Spectre fix impact on sorting performance
One of the most famous stackoverflow questions is why sorting a sorted array is so fast; and the answer is because of branch prediction.
Will the application of Intel's and Microsoft spectre fixes effectively nullify the answer given in this question on the affected processors (older generation Intel processors, AMD Ryzen, and ARM)?
No, the key to Spectre is forcing mis-prediction of indirect branches, because they can jump to any address. It's non-trivial to find a sequence of instructions that loads secret data you want, and then makes another data-dependent load with the secret as an array index.
To attack a regular taken / not-taken conditional branch (like you'd find in a sort function, or that conditional in the loop over a sorted or not-sorted array), you'd need to find a case where executing the "wrong" side of a branch (maybe the wrong side of an if/else in the source) would do something useful when it runs with the "wrong" values in registers. It's plausible1, but unlikely, so most defenses against Spectre will only worry about indirect branches.
Hardware fixes for Spectre will have to be more subtle than "turn off branch prediction" (i.e. stall the pipeline at every conditional branch). That would probably reduce performance by an order of magnitude in a lot of code, and is far too high to be an acceptable defense against a local information leak (which can lead to privilege escalation).
Even turning off prediction for only indirect branches (but not regular conditional branches) may be too expensive for most user-space code, because every shared library / DLL function call goes through an indirect branch in the normal software ecosystem on mainstream OSes (Linux, OS X, Windows).
The Linux kernel is experimenting with a retpoline to defeat indirect-branch prediction for indirect branches inside the kernel. I'm not sure it's enabled by default, though, even in kernels that enable the Meltdown workaround (KPTI).
- Sometimes the wrong
switchcould do something totally inappropriate (e.g. in an interpreter), and if the
switchwas compiled with nested branches rather than a single indirect branch then you might be able to attack it. (Compilers often use a table of branch targets for
switch, but when the cases are sparse it's not always possible. e.g.
defaultwould need a 990-entry array with only 3 used values.)
- Sometimes the wrong