Most web sites tell you to use the concurrency primitives provided by your OS because this stuff is hard to understand. That advice is not useful for RAMCloud: we care so much about performance that we're willing to go through as much pain as necessary to get it.

Main Issues

We assume the processor may reorder instructions and delay stores indefinitely unless told otherwise.

We assume the compiler may reorder instructions or remove them altogether for efficiency unless told otherwise.

A correct concurrency primitive must account for both of these issues.

Processor Tools

Memory fences: mfence
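As a rough illustration of the processor-side tool, the sketch below wraps mfence in a C++ function using GCC inline assembly. The names fullFence, announceAndCheck, flagA, and flagB are hypothetical and do not come from the RAMCloud sources, and the non-x86 fallback is an assumption added only so the sketch compiles elsewhere. The "memory" clobber it uses is covered under Compiler Tools.

```cpp
// Hypothetical helper (not from the RAMCloud sources): issues a full
// memory fence and keeps GCC from moving memory accesses across it.
static inline void
fullFence()
{
#if defined(__x86_64__) || defined(__i386__)
    // mfence orders all earlier loads and stores before all later ones.
    asm volatile("mfence" ::: "memory");
#else
    // Assumption: on other architectures, use GCC's builtin full barrier.
    __sync_synchronize();
#endif
}

// Illustrative shared flags for a store-then-load handshake.
int flagA = 0;
int flagB = 0;

// Announces our intent, then checks the other side's flag. x86 may let a
// load complete before an older store to a different address is visible,
// so this is the pattern where a full fence is needed.
int
announceAndCheck()
{
    flagA = 1;      // store: announce
    fullFence();    // the store above is ordered before the load below
    return flagB;   // load: check the other side
}
```

In a single thread the fence has no observable effect; its purpose only shows up when another thread runs the mirror-image sequence on flagB and flagA.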

Compiler Tools

Inline assembly

asm vs __asm__

The two keywords behave the same. The keyword asm is not available in ISO C programs, so if you want compatibility with those, you should use the alternate keyword __asm__. See Alternate Keywords in the GCC manual for details.

Since RAMCloud is written in C++, we should use asm rather than the uglier __asm__.
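A small sketch of the two spellings side by side, assuming GCC (or a compatible compiler). Both functions are hypothetical; each passes a value through a register using an empty assembly template, so the only difference between them is the keyword.

```cpp
// Hypothetical example: copies `value` through a register with an empty
// asm template. The "0" constraint places the input in the same register
// as output operand 0, so `out` ends up equal to `value`.
int
viaAsm(int value)
{
    int out;
    asm("" : "=r"(out) : "0"(value));
    return out;
}

// Same statement written with the alternate keyword.
int
viaAlternateKeyword(int value)
{
    int out;
    __asm__("" : "=r"(out) : "0"(value));
    return out;
}
```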

volatile vs __volatile__

As with asm and __asm__, the two spellings behave the same. The volatile keyword on an asm statement keeps gcc from deleting that statement as useless code during its optimization passes. You must inform gcc of all side effects of your asm statements. If an asm statement has important side effects that can't be expressed as output operands (or you don't use the output operands but still want the instructions to execute), you should use the volatile keyword. From Extended Asm in the GCC manual:

The volatile keyword indicates that the instruction has important side-effects. GCC will not delete a volatile asm if it is reachable. (The instruction can still be deleted if GCC can prove that control-flow will never reach the location of the instruction.)

Volatile does not, however, prevent GCC from moving the asm instruction. From Extended Asm in the GCC manual:

Note that even a volatile asm instruction can be moved relative to other code, including across jump instructions.

The gcc manual is still unsatisfying in this regard. What can asm volatile statements be reordered with? If all the compiler knows is that running the asm is important, what does it assume about where the asm must run? The manual suggests it's not entirely conservative.
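To make the deletion half of this concrete, here is a sketch built around the x86 rdtsc instruction, whose result changes on every execution even though the asm statement has no inputs. The function names are hypothetical, and the non-x86 branch is a placeholder assumed only so the sketch compiles elsewhere. Without volatile, gcc would be free to treat two such statements as producing the same value and keep only one of them.

```cpp
#include <cstdint>

// Hypothetical helper: reads the x86 timestamp counter.
static inline std::uint64_t
readTimestampCounter()
{
#if defined(__x86_64__) || defined(__i386__)
    std::uint32_t lo, hi;
    // volatile: every execution matters, even though the operands look
    // identical from one statement to the next.
    asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
#else
    // Assumption: placeholder for non-x86 builds; not a real counter.
    return 0;
#endif
}

// Two back-to-back reads. volatile keeps both asm statements alive, but,
// as the manual warns, it does not pin them in place relative to
// surrounding code.
std::uint64_t
elapsedCycles()
{
    std::uint64_t start = readTimestampCounter();
    std::uint64_t stop = readTimestampCounter();
    return stop - start;
}
```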

Warning: Misinformation in Sandeep.S's GCC-Inline-Assembly-HOWTO

This otherwise useful howto (GCC-Inline-Assembly-HOWTO) claims the following:

If our assembly statement must execute where we put it, (i.e. must not be moved out of a loop as an optimization), put the keyword volatile after asm and before the ()'s. So to keep it from moving, deleting and all...

However, this contradicts the gcc manual, which clearly states that the volatile keyword on asm statements will not stop the compiler from moving the asm instructions, including across jump instructions (see Extended Asm in the GCC manual).

Clobbers memory

From Extended Asm in the GCC manual:

If your assembler instructions access memory in an unpredictable fashion, add `memory' to the list of clobbered registers. This will cause GCC to not keep memory values cached in registers across the assembler instruction and not optimize stores or loads to that memory. You will also want to add the volatile keyword if the memory affected is not listed in the inputs or outputs of the asm, as the `memory' clobber does not count as a side-effect of the asm.

In a 1996 mailing list post, Re: Is clobber "memory" in include/asm-i386/system.h necessary?, Linus Torvalds said:

Essentially, think of the "memory" thing as a serializing flag rather than as a "this instruction changes memory" flag. It is extremely important that gcc not re-order memory instructions across these serializing instructions, because otherwise you might find that gcc moves a memory load over the serializing instruction and then you lose...
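Read this way, an asm statement with an empty template and a "memory" clobber acts as a pure compiler barrier. The sketch below uses one in a publish/consume pattern; all of the names are hypothetical. It constrains only the compiler: it emits no instruction, so it says nothing to the processor, and whether that is enough depends on the target's memory model.

```cpp
// Hypothetical helper: emits no instruction, but gcc may not cache memory
// values in registers across it or move loads and stores over it.
static inline void
compilerBarrier()
{
    asm volatile("" ::: "memory");
}

// Illustrative shared state: a payload guarded by a ready flag.
int payload = 0;
int ready = 0;

// Writes the payload, then raises the flag. The barrier keeps gcc from
// emitting the store to `ready` ahead of the store to `payload`.
void
publish(int value)
{
    payload = value;
    compilerBarrier();
    ready = 1;
}

// Returns true and copies the payload if it has been published. The
// barrier keeps gcc from hoisting the payload load above the flag check.
bool
tryConsume(int* out)
{
    if (!ready)
        return false;
    compilerBarrier();
    *out = payload;
    return true;
}
```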

Volatile keyword elsewhere

On data
On pointers

Survey of Existing Concurrency Primitives

Linux

The various memory fences are defined in arch/x86/include/asm/system.h as follows:

#define mb()    asm volatile("mfence":::"memory")
#define rmb()   asm volatile("lfence":::"memory")
#define wmb()   asm volatile("sfence" ::: "memory")

FreeBSD

The various memory fences are defined in amd64/include/atomic.h as follows:

#define mb()    __asm __volatile("mfence;" : : : "memory")
#define wmb()   __asm __volatile("sfence;" : : : "memory")
#define rmb()   __asm __volatile("lfence;" : : : "memory")

where __volatile is defined as volatile in sys/cdefs.h. I couldn't find a definition for __asm, which is expected: gcc itself accepts __asm (like __asm__) as an alternate spelling of asm.

References

Extended Asm in the GCC manual describes how to use inline assembly in gcc. Unfortunately, it is not very precise or thorough. When describing the '+' constraint modifier, it says you should not use '+m', but Linus claims this is not true (see Re: [PATCH 3/8] i386: bitops: Rectify bogus "+m" constraints).

Sandeep.S's GCC-Inline-Assembly-HOWTO is a nice introduction to writing inline assembly in gcc. Unfortunately, it does not mention the + (plus) constraint modifier, and it is wrong about the semantics of asm volatile (see Warning box above).

http://kernel.org/doc/Documentation/volatile-considered-harmful.txt is the Linux kernel document arguing against using the volatile keyword on data in kernel code.

A collection of posts by Linus on inline assembly in gcc that Norman Yarvin found interesting. The ones from 2007 are quite relevant to this document (some are also cited individually here).

A 2007 mailing list post by Linus, Re: [PATCH 3/8] i386: bitops: Rectify bogus "+m" constraints, explains that "+m" is acceptable as an asm output constraint. This conflicts with Extended Asm in the GCC manual, which Linus claims needs to be updated.
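For readers who have not seen the "+m" constraint discussed in these references, a minimal sketch follows. The function atomicIncrement is hypothetical, and the non-x86 fallback is an assumption added only so the sketch compiles elsewhere. "+m" tells gcc that the memory operand is both read and written by the instruction.

```cpp
// Hypothetical helper: atomically adds one to *counter.
static inline void
atomicIncrement(int* counter)
{
#if defined(__x86_64__) || defined(__i386__)
    // "+m": *counter is a read-write memory operand of the instruction.
    // "cc": the instruction modifies the condition flags.
    asm volatile("lock; incl %0" : "+m"(*counter) : : "cc");
#else
    // Assumption: on other architectures, use GCC's atomic builtin.
    __sync_fetch_and_add(counter, 1);
#endif
}
```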
