RAMCloud needs to serialize data when it transfers messages or makes an RPC across machines. Protocol buffers, thrift etc, perform this function. They encode data in a standard format which can then be decoded on various target machines.

Protocol buffers and Thrift both define a description language, in which we can define our message format. Then, a compiler is used to generate bindings for various languages. These contain methods to serialize/deserialize the message, as well as other utility functions.

There are various advantages and disadvantages to using these libraries in RAMCloud:

Advantages:
1. We do not have to worry about different memory representations on different machines (little vs. big endian).
2. Versioning - We can add or remove fields to a message, and machines which do not understand this new field will simply ignore it.
3. Numbers are encoded using varint style encoding - saves space.

Disadvantages:
1. It takes a non-negligible amount of time to serialize and deserialize messages.
2. Size of the message may be slightly larger than normal if we are sending strings or blobs.

If we decide to have servers understand the data objects, PB or Thrift will be a good way to implement this, since we can create complex structures.

Performance measurements:

Measured on the Nehalem system.

Using a message roughly 100 bytes in size, with 2 integers and 3 strings.

Operation

Protocol Buffers

Thrift

Create a message object

0.717 us

0.695 us

Create and populate message with random data

4.87 us

5.42 us

Serialize one object

4.24 us

4.04 us

Deserialize one object

2.65 us

4.26 us

Using a message consisting of 8 uint64_t's

Operation

Protocol Buffers

Thrift

Create a message object

0.1694 us

0.66 us

Create and populate message with random data

1.16 us

1.32 us

Serialize one object

1.8 us

2.012 us

Deserialize one object

1.3 us

1.86 us

To see how much time a really simple serialization process would take, I measured how much time it would take to simply copy out numbers and strings without performing any encoding/decoding.

For the string based message, it took 338 ns to deserialize a message.
For the numbers based message, it took only 250 ns to deserialize the message.

Two cache misses are experienced for every iteration. Both the source and the target buffer miss in the cache. I allocate a huge chunk of memory (1 GB), and then copy randomly from/to that.