This document specifies the protocol and packet formats used by RAMCloud clients
and servers to communicate with each other. All communication in RAMCloud
happens in the form of RPCs.

RAMCloud RPCs use the standard Ethernet, IP and UDP headers without any
optional fields. On top of this, they use the RAMCloud RPC header, specified
below.

Sessions and Channels

A session encapsulates the state of communication between a particular client
and a particular server. At the cost of a session open handshake (during which
the server authenticates the client and allocates state for the client's
session), sessions allow the client to open new channels for free. A channel is
a connection within an established session over which a sequence of RPCs travels.

RPCs in a channel are performed sequentially. That is, only one RPC at a time
may be active on a particular channel. If a client wants to perform multiple
RPCs in parallel within a session, it must use multiple channels.

RAMCloud RPC Header format

The RAMCloud RPC header consists of the following fields. The usage of these
fields is explained below.

      <---------------32 bits -------------->
      +-------------------------------------+
      |            sessionToken             |
      +-------------------------------------+
      |        sessionToken (cont.)         |
      +-------------------------------------+
      |                rpcId                |
      +-------------------------------------+
      |         clientSessionHint           |
      +-------------------------------------+
      |         serverSessionHint           |
      +------------------+------------------+
      |    fragNumber    |    totalFrags    |
      +---------+--------+------------------+
      |channelId|  flags |
      +---------+--------+

The high four bits of the flags byte are the payloadType.

The low four bits of the flags byte are the following flags:
flags.direction is the first (lowest) bit
flags.requestAck is the second bit
flags.pleaseDrop is the third bit
flags.reserved1 is the fourth bit

Everything is encoded in little-endian, not network byte order.
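
As a rough illustration of the layout described above, the following Python sketch packs and unpacks a header. The field widths (64-bit sessionToken, 32-bit rpcId and session hints, 16-bit fragNumber and totalFrags, 8-bit channelId and flags) are assumptions read off the diagram, and the fields are assumed to appear on the wire in the order drawn, with no padding. The names Header, pack_header and unpack_header are hypothetical and are not taken from the RAMCloud sources.

```python
import struct
from dataclasses import dataclass

# Assumed layout, little-endian with no padding:
# sessionToken (u64), rpcId (u32), clientSessionHint (u32),
# serverSessionHint (u32), fragNumber (u16), totalFrags (u16),
# channelId (u8), flags (u8).
HEADER_FORMAT = "<QIIIHHBB"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)

# Low four bits of the flags byte.
FLAG_DIRECTION = 0x01    # first (lowest) bit
FLAG_REQUEST_ACK = 0x02  # second bit
FLAG_PLEASE_DROP = 0x04  # third bit
FLAG_RESERVED1 = 0x08    # fourth bit


@dataclass
class Header:
    session_token: int
    rpc_id: int
    client_session_hint: int
    server_session_hint: int
    frag_number: int
    total_frags: int
    channel_id: int
    payload_type: int  # high four bits of the flags byte
    flags: int         # low four bits of the flags byte


def pack_header(h: Header) -> bytes:
    """Serialize a Header into its assumed wire form."""
    flags_byte = ((h.payload_type & 0x0F) << 4) | (h.flags & 0x0F)
    return struct.pack(HEADER_FORMAT, h.session_token, h.rpc_id,
                       h.client_session_hint, h.server_session_hint,
                       h.frag_number, h.total_frags, h.channel_id,
                       flags_byte)


def unpack_header(packet: bytes) -> Header:
    """Parse the header found at the start of a packet."""
    (token, rpc_id, client_hint, server_hint, frag_number, total_frags,
     channel_id, flags_byte) = struct.unpack_from(HEADER_FORMAT, packet)
    return Header(token, rpc_id, client_hint, server_hint, frag_number,
                  total_frags, channel_id, flags_byte >> 4,
                  flags_byte & 0x0F)
```

Under these assumptions the header occupies 26 bytes, and the payload of a packet starts immediately after it.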

Every packet sent over this protocol contains all of the above fields. This is
because every packet is equally likely to be lost or dropped. Hence, if only the
first packet contained a field (e.g., the total number of fragments) and it was
dropped, the receiver would not know how many fragments to expect from the sender.

Connection ID

The "Connection ID" field identifies the connection that exists between a pair of
machines. This connection is one-way only. That is, the connection identifies
the pipe that flows from machine 1 to machine 2. The pipe from machine 2 to
machine 1 is represented by another, different, Connection ID.

RPCs in a connection are performed sequentially. That is, only one RPC at a time
is performed using a particular connection. If a client wants to perform
multiple RPCs in parallel, it must open multiple connections.

RPC ID

The "RPC ID" field uniquely identifies an RPC belonging to a particular
connection. Together, the "Connection ID" and the "RPC ID" uniquely identify
any RPC in the RAMCloud system.

When a new connection is established, an RPC ID will be chosen by the client at
random. Every new RPC made on that connection will have an ID that is one
greater than the previous RPC ID.

Fragment ID

The "Fragment ID" field is used to identify a particular Ethernet-frame-sized
fragment belonging to an RPC. RPCs that are large enough to span multiple
Ethernet frames will thus have more than one fragment. The Fragment ID starts
at 0.

Total Fragments

This field specifies the total number of fragments that this RPC contains.

Flag Byte

Each bit of this byte represents a particular flag:

  1. ACK Request - the sender requests an ACK; the ACK reply is not piggybacked
  2. Request - this packet is an RPC request packet
  3. Reply - this packet is an RPC reply packet
  4. Control Bit - if set, the first byte of the payload is an opcode which specifies
    whether the packet is an ACK reply or an ERROR packet.

...

Note that not all of the header fields are relevant to every packet.

payloadType

There are currently four payload types defined. The rest are reserved.

PT_DATA = 0

A regular data fragment. The payload is a binary blob.

PT_ACK = 1

An acknowledgement response. The format for the payload is defined below.

PT_SESSION_OPEN = 2

A request to the server to open a new session or a response from the server for
such a request. The payload must be empty for a session open request (for now)
and is defined below for a session open response.

PT_BAD_SESSION = 4

A response from the server that the session specified is not valid. The payload
must be empty.
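
The payload types above could be represented as follows. This is only a sketch: the helper name payload_type_of is hypothetical, and raising an error for the reserved values is an assumption, since this document does not say how a receiver should treat them.

```python
from enum import IntEnum


class PayloadType(IntEnum):
    PT_DATA = 0
    PT_ACK = 1
    PT_SESSION_OPEN = 2
    PT_BAD_SESSION = 4


def payload_type_of(flags_byte: int) -> PayloadType:
    """Extract the payloadType from the high four bits of the flags byte.

    Raises ValueError for the reserved values (an assumed policy).
    """
    return PayloadType((flags_byte >> 4) & 0x0F)
```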

sessionToken

The session token serves to identify the session. It is large enough that it
can be generated randomly with a very low probability of collisions.

The server generates the session token upon receipt of a session open request
and sends it back in the sessionToken field of the session open response. The
headers for all subsequent packets on this new session must have sessionToken
set to this value.

TODO(ongaro): Is a session token assumed to be globally unique or only unique
to the client and server pair?

clientSessionHint

The value for this field is selected by the client and must be the same for
all packets on the session (including the session open request). Its value is
opaque to the server.

This may be used by the client, for example, to quickly find state for a
session upon receipt of a packet.

serverSessionHint

This field is analogous to clientSessionHint.

The value for this field is selected by the server and must be the same for
all packets on the session (except for the session open request but including
the session open response). Its value is opaque to the client.

This may be used by the server, for example, to quickly find state for a
session upon receipt of a packet.

channelId

The channel ID identifies the channel within the session and must be within the
bounds given by the server in the session open response.

rpcId

The RPC ID serves to ensure that old packets on a channel that is still valid
are ignored. The client must start the RPC ID with 0 on a new channel and must
increment it on every new RPC that it sends over the channel.
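
One way a client might track this per-channel state is sketched below. The class ClientChannel and its methods are hypothetical. The sketch assumes that the RPC ID is incremented by exactly one per RPC and that a received packet is only accepted when its rpcId matches the RPC currently active on the channel. Wraparound of the rpcId field is not handled.

```python
class ClientChannel:
    """Client-side state for one channel (hypothetical helper)."""

    def __init__(self, channel_id: int):
        self.channel_id = channel_id
        self.next_rpc_id = 0       # the first RPC on a new channel uses rpcId 0
        self.active_rpc_id = None  # only one RPC may be active at a time

    def start_rpc(self) -> int:
        """Begin a new RPC and return the rpcId to put in its packets."""
        if self.active_rpc_id is not None:
            raise RuntimeError("channel already has an active RPC")
        self.active_rpc_id = self.next_rpc_id
        self.next_rpc_id += 1
        return self.active_rpc_id

    def accepts(self, rpc_id: int) -> bool:
        """True if a received packet belongs to the active RPC."""
        return self.active_rpc_id is not None and rpc_id == self.active_rpc_id

    def finish_rpc(self) -> None:
        """Mark the active RPC as complete so the channel can be reused."""
        self.active_rpc_id = None
```

A client that wants several RPCs in flight at once would hold one such object per channel.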

fragNumber

The fragment number serves to identify the fragment within the RPC request or
response, which may consist of multiple fragments. The numbering starts at 0.

totalFrags

The total number of fragments that make up the RPC request or response.
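
The relationship between fragNumber and totalFrags can be illustrated with a small sketch. The functions fragment and reassemble and the max_payload parameter are hypothetical. The sketch also assumes that an empty message is still sent as a single fragment, which this document does not state.

```python
def fragment(message: bytes, max_payload: int):
    """Split a message into (fragNumber, totalFrags, payload) tuples."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    # Ceiling division; an empty message is assumed to use one fragment.
    total = max(1, -(-len(message) // max_payload))
    return [(i, total, message[i * max_payload:(i + 1) * max_payload])
            for i in range(total)]


def reassemble(fragments):
    """Rebuild the message, or return None if any fragment is missing.

    Duplicate fragments and fragments with an inconsistent totalFrags
    are ignored.
    """
    if not fragments:
        return None
    total = fragments[0][1]
    payloads = {number: payload
                for number, frag_total, payload in fragments
                if frag_total == total and 0 <= number < total}
    if len(payloads) < total:
        return None
    return b"".join(payloads[i] for i in range(total))
```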

flags.direction

This flag is set when the server is sending a packet to the client.

This is useful for servers that also act as clients to easily distinguish
received packets intended for their server role as opposed to their client
role.

flags.requestAck

This flag is set when the sender wants to request an ACK from the receiver when
this fragment arrives. The fact that an ACK is being requested is conveyed with
a flag so that it can easily be piggy-backed with the transmission of a normal
data packet.

The sender might want an ACK for a fragment when: TODO(aravindn).

Request flag

The Request flag is set when the packet is part of an RPC request. That is, the
client sends the packet to the server requesting an RPC.

Reply flag

The Reply flag is set when the packet is part of an RPC reply. That is, the server
sends the packet to the client as a reply for an RPC request that it received.

The Request and Reply flags are necessary because different code paths are
executed depending on whether a packet is part of a request or reply. These
flags help to identify that fact easily. TODO(aravindn): Explain clearly.

Control flag

When the control flag of a packet is set, the first byte of the payload will be
an opcode which specifies what kind of control packet it is.

Opcodes:
0x01 - ERROR packet
0x02 - ACK Reply

ERROR packet

This means something went wrong in the RPC system.

ACK Reply

When the sender requests an ACK, the receiver sends back an "ACK reply
packet". This packet has the following structure:

  1. The control bit will be set.
  2. The opcode (first byte of the payload) will be 0x02.
  3. The rest of the payload will be a bit map representing the status of all the
    fragments of the payload. If a bit is 0, it means the corresponding fragment was
    not received, and if a bit is 1, it means the corresponding fragment was
    received with no errors.

The ACK reply flag has not been given its own bit in the Flag Byte because it
will not be piggy-backed on normal data packets.

Use cases:

...

flags.pleaseDrop

This flag indicates that the sender wants the receiver to drop the packet on
arrival. It is used only for testing purposes to simulate errors in the
network.

flags.reserved1

This flag is reserved.

Session Open Response Payload Format

      <----8 bits---->
      +--------------+
      | maxChannelId |
      +--------------+

maxChannelId

The value of maxChannelId is the largest channel ID that the client may use for
the session, chosen at the server's discretion. That is, all channel IDs ever
used on the session must be less than or equal to this maxChannelId.
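
A client might read maxChannelId and derive its usable channel IDs as in the sketch below. The function names are hypothetical, and the sketch assumes that channel IDs start at 0.

```python
import struct


def parse_session_open_response(payload: bytes) -> int:
    """Return maxChannelId from a session open response payload."""
    (max_channel_id,) = struct.unpack_from("<B", payload)
    return max_channel_id


def usable_channel_ids(max_channel_id: int) -> range:
    """Channel IDs the client may use: 0 through maxChannelId inclusive."""
    return range(max_channel_id + 1)
```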

Acknowledgement Response Payload Format

      <---------------32 bits -------------->
                       +--------------------+
                       |  firstMissingFrag  |
      +-------------------------------------+
      |            stagingVector            |
      +-------------------------------------+

firstMissingFrag

The number before which all fragments have been received (between 0 and
totalFrags, inclusive). Note that the fragment whose number is firstMissingFrag
has not been received by definition.

stagingVector

A bit vector where the bit numbered i (counting from 0) corresponds to whether
the fragment whose number is firstMissingFrag + 1 + i has been received. Note
that the fragment whose number is firstMissingFrag has not been received by
definition.
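
The two fields can be computed from the set of fragments received so far, as in this sketch. It assumes a 16-bit firstMissingFrag followed by a 32-bit stagingVector (widths read off the diagram above) and assumes that bit 0 of the vector is the least significant bit. The function names build_ack and missing_from_ack are hypothetical.

```python
import struct

ACK_FORMAT = "<HI"  # firstMissingFrag (u16), stagingVector (u32); assumed widths
STAGING_BITS = 32


def build_ack(received: set, total_frags: int) -> bytes:
    """Build an acknowledgement payload from the received fragment numbers."""
    first_missing = 0
    while first_missing < total_frags and first_missing in received:
        first_missing += 1
    staging = 0
    for i in range(STAGING_BITS):
        # Bit i covers the fragment numbered firstMissingFrag + 1 + i.
        if first_missing + 1 + i in received:
            staging |= 1 << i
    return struct.pack(ACK_FORMAT, first_missing, staging)


def missing_from_ack(payload: bytes, total_frags: int) -> list:
    """List the fragments the ACK reports as not received.

    Only fragments inside the window covered by the staging vector are
    reported; later fragments are unknown to the sender.
    """
    first_missing, staging = struct.unpack_from(ACK_FORMAT, payload)
    missing = []
    if first_missing < total_frags:
        missing.append(first_missing)
    for i in range(STAGING_BITS):
        frag = first_missing + 1 + i
        if frag >= total_frags:
            break
        if not staging & (1 << i):
            missing.append(frag)
    return missing
```

For example, if fragments 0, 1, 3, 4 and 6 of an 8-fragment message have arrived, firstMissingFrag is 2 and the staging vector has bits 0, 1 and 3 set.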

...

Old:

Use cases:

Single packet request is lost

When an RPC request that consists of only a single packet is lost, the client
will time out while waiting for the server to reply. When it times out, the
client will resend the entire request. The client will perform a fixed number
(X) of these resends before giving up and throwing an exception.
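
The resend behaviour described above could look roughly like the following. The callables send_request and wait_for_reply are hypothetical stand-ins for the transport, with wait_for_reply assumed to return None on a timeout, and max_resends plays the role of X.

```python
def call_with_retries(send_request, wait_for_reply, max_resends: int):
    """Send a request, resending after each timeout up to max_resends times."""
    for _ in range(1 + max_resends):
        send_request()
        reply = wait_for_reply()  # assumed to return None on timeout
        if reply is not None:
            return reply
    raise TimeoutError("no reply after %d resends" % max_resends)
```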

...

When an RPC response that consists of only a single packet is lost, the client
will time out and resend the request, just like in the previous case where a
single packet request is lost.

For each connection, the server maintains the most recent RPC response it
generated in memory.

When the resent request is received by the server, it will simply take the RPC
response it has already computed from its history list and send it back to the
client. Thus, it need not perform the computation required by the RPC once again.
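
A minimal sketch of such a per-connection history follows. The class ResponseCache and the handler callable are hypothetical, and matching a resent request by its RPC ID is an assumption.

```python
class ResponseCache:
    """Keeps the most recent RPC response for each connection."""

    def __init__(self, handler):
        self.handler = handler  # computes a response from a request
        self.last = {}          # connection id -> (rpc id, response)

    def handle(self, connection_id, rpc_id, request):
        cached = self.last.get(connection_id)
        if cached is not None and cached[0] == rpc_id:
            # Resent request: reply from the history, do not recompute.
            return cached[1]
        response = self.handler(request)
        self.last[connection_id] = (rpc_id, response)
        return response
```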

...

When all fragments of a multi-fragment response are lost, this case becomes
similar to the case where a single packet response is lost, and the same steps
are followed.

When only some fragments are lost, the client times out and sends an "ACK
Reply" packet, even though it has not received an "ACK Request" packet. When
the server receives this "ACK Reply" packet, it resends all the fragments that
were lost. The client will now have a full RPC response, as long as none of the
resent packets were lost as well.

...