Most web sites out there tell you to use the concurrency primitives provided by your OS because this stuff is hard to understand. That's just not useful advice for RAMCloud, since we care so much about performance (and we're willing to go through as much pain as necessary to get it).
We assume the processor may reorder instructions and delay stores indefinitely unless told otherwise.
We assume the compiler may reorder instructions or remove them altogether for efficiency unless told otherwise.
A correct concurrency primitive must account for both of these issues.
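As a rough illustration of why both issues matter, here is a minimal sketch of a one-shot hand-off between a writer and a reader. It uses the standard C++11 fences from <atomic> rather than hand-written assembly, and the names payload, ready, produce and consume are hypothetical, chosen only for this example. Each fence constrains both the compiler and the processor.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical names for illustration only; this is not RAMCloud code.
static int payload = 0;                 // plain (non-atomic) data
static std::atomic<bool> ready{false};  // flag that announces the data

// Writer: store the data first, then publish the flag.
static void produce(int value) {
    assert(!ready.load(std::memory_order_relaxed));  // one-shot sketch
    payload = value;
    // Release fence: neither the compiler nor the processor may let the
    // payload store become visible after the flag store that follows.
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
}

// Reader: wait for the flag, then read the data.
static int consume() {
    while (!ready.load(std::memory_order_relaxed)) {
        // Spin until the writer publishes the flag.
    }
    // Acquire fence: the payload load may not be moved above the flag load.
    std::atomic_thread_fence(std::memory_order_acquire);
    return payload;
}
```

In real use, produce and consume would run on different threads. Without the two fences, the reader could see the flag set and still read a stale payload.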
Memory fences: Mfence
GCC accepts two spellings of the inline-assembly keyword, asm and __asm__, and the two behave the same. The keyword asm is not available in ISO C programs, so if you want compatibility with those, you should use the alternate keyword __asm__. See Alternate Keywords in the GCC manual for details.
Since RAMCloud is written in C++, we should use asm rather than the uglier __asm__.
This otherwise useful howto (GCC-Inline-Assembly-HOWTO) claims the following:
However, this contradicts the GCC manual, which clearly states that the volatile keyword on asm statements will not stop the compiler from moving the asm instructions, including across jump instructions (see Extended Asm in the GCC manual).
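Since volatile alone does not pin an asm statement in place, the usual way to order it against surrounding memory accesses is the "memory" clobber, which the GCC manual describes as forming a read/write memory barrier for the compiler. The sketch below is an assumption about how such helpers might look, not code taken from RAMCloud. The names compilerBarrier, fullFence, publish, sharedData and sharedFlag are hypothetical, and the non-x86 branch falls back to GCC's __sync_synchronize builtin.

```cpp
#include <cassert>

// Compiler-only barrier: emits no instruction, but the "memory" clobber
// tells GCC not to cache memory values in registers across this point
// and not to move memory accesses across it.
static inline void compilerBarrier() {
    asm volatile("" : : : "memory");
}

// Full fence: constrains the compiler (via the clobber) and the
// processor (via the mfence instruction on x86).
static inline void fullFence() {
#if defined(__x86_64__) || defined(__i386__)
    asm volatile("mfence" : : : "memory");
#else
    __sync_synchronize();  // GCC builtin full barrier, used as a fallback
#endif
}

// Hypothetical shared variables, for illustration only.
static int sharedData = 0;
static int sharedFlag = 0;

// Store the data, fence, then raise the flag, so the data store is
// ordered before the flag store.
static void publish(int value) {
    assert(sharedFlag == 0);  // this sketch publishes only once
    sharedData = value;
    fullFence();
    sharedFlag = 1;
}
```

This only shows where the fence goes on the writing side. A reader on another thread would need a matching barrier between loading sharedFlag and loading sharedData, and in standard C++ the flag would normally be a std::atomic to avoid a data race.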