...

This document specifies the protocol and packet formats used by RAMCloud clients
and servers to communicate with each other. All communication in RAMCloud
happens in the form of RPCs.

RAMCloud RPCs use the standard Ethernet, IP and UDP headers without any
optional fields. On top of this, they use the RAMCloud RPC header, specified
below.

Sessions and Channels

A session encapsulates the state of communication between a particular client
and a particular server. At the cost of a session open handshake (during which
the server authenticates the client and allocates state for the client's
session), sessions allow the client to open new channels for free. A channel is
a connection within an established Session on which a sequence of RPCs travel.

RPCs in a channel are performed sequentially. That is, only one RPC at a time
may be active on a particular channel. If a client wants to perform multiple
RPCs in parallel within a session, it must use multiple channels.

RAMCloud RPC Header format

The RAMCloud RPC header consists of the following fields. The usage of these
fields is explained below.

      <---------------32 bits -------------->
      +-------------------------------------+
      |            sessionToken             |
      +-------------------------------------+
      |        sessionToken (cont.)         |
      +-------------------------------------+
      |                rpcId                |
      +-------------------------------------+
      |         clientSessionHint           |
      +-------------------------------------+
      |         serverSessionHint           |
      +------------------+------------------+
      |    fragNumber    |    totalFrags    |
      +-----------+------+-+----------------+
      | channelId | flags  |
      +-----------+--------+

...

The high four bits of the flags byte are the payloadType.

The low four bits of the flags byte are the following flags:
flags.direction is the first (lowest) bit
flags.requestAck is the second bit
flags.pleaseDrop is the third bit
flags.reserved1 is the fourth bit
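
The bit layout above can be sketched in Python as follows. This is only an illustration: the function names encode_flags and decode_flags are hypothetical and do not come from the RAMCloud sources; the bit positions are taken from the list above.

```python
# Illustrative sketch; helper names are hypothetical, bit positions follow the text.
DIRECTION = 0x01    # first (lowest) bit
REQUEST_ACK = 0x02  # second bit
PLEASE_DROP = 0x04  # third bit
RESERVED1 = 0x08    # fourth bit


def encode_flags(payload_type, direction=False, request_ack=False,
                 please_drop=False):
    """Build the flags byte: payloadType in the high nibble, flags in the low."""
    if not 0 <= payload_type <= 0xF:
        raise ValueError("payloadType must fit in four bits")
    low = 0
    if direction:
        low |= DIRECTION
    if request_ack:
        low |= REQUEST_ACK
    if please_drop:
        low |= PLEASE_DROP
    return (payload_type << 4) | low


def decode_flags(byte):
    """Split a flags byte back into payloadType and the individual flags."""
    return {
        "payloadType": byte >> 4,
        "direction": bool(byte & DIRECTION),
        "requestAck": bool(byte & REQUEST_ACK),
        "pleaseDrop": bool(byte & PLEASE_DROP),
        "reserved1": bool(byte & RESERVED1),
    }
```

For example, a session open response sent by the server (payloadType 2, direction set) would encode to 0x21 under this layout.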

Everything is encoded in little-endian, not network byte order.

Every packet sent over this protocol contains all of the above fields (but not
all of them are always relevant).
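
As a rough sketch of how a packet header could be serialized, the following Python uses the standard struct module. The field widths are assumptions read off the diagram (sessionToken 64 bits, rpcId and the two hints 32 bits each, fragNumber and totalFrags 16 bits each, channelId and flags 8 bits each); the names HEADER, Header, pack_header and unpack_header are hypothetical.

```python
import struct
from collections import namedtuple

# "<" selects little-endian with no padding, matching the text above.
# Widths are assumptions taken from the diagram: Q=64, I=32, H=16, B=8 bits.
HEADER = struct.Struct("<QIIIHHBB")

Header = namedtuple(
    "Header",
    "sessionToken rpcId clientSessionHint serverSessionHint "
    "fragNumber totalFrags channelId flags",
)


def pack_header(header):
    """Serialize a Header tuple into its on-the-wire bytes."""
    return HEADER.pack(*header)


def unpack_header(packet):
    """Return (Header, payload) for a received packet."""
    if len(packet) < HEADER.size:
        raise ValueError("packet shorter than RPC header")
    header = Header._make(HEADER.unpack_from(packet))
    return header, packet[HEADER.size:]
```

Under these assumed widths the header occupies 26 bytes, and the least significant byte of sessionToken is the first byte on the wire.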

Connection ID

The ``Connection ID'' field identifies the connection that exists between a
pair of machines. RPC requests can travel in one direction only through a
connection. That is, machine 1 can do an RPC to machine 2 on one connection,
but if machine 2 then wants to do an RPC to machine 1, it would need another,
different, Connection ID.

RPCs in a connection are performed sequentially. That is, only one RPC at a time
is performed using a particular connection. If a client wants to perform
multiple RPCs in parallel, it must open multiple connections.

RPC ID

The ``RPC ID'' field uniquely identifies an RPC belonging to a particular
connection. Together, the server, the ``Connection ID'' and the ``RPC ID''
uniquely identify any RPC in the RAMCloud system.

When a new connection is established, an RPC ID will be chosen by the client at
random. Every new RPC made on that connection will have an ID that is one
greater than the previous RPC ID.

Fragment ID

The ``Fragment ID'' field is used to identify a particular Ethernet frame sized
fragment belonging to an RPC. RPCs that are large enough to consist of multiple
Ethernet frames will thus have more than one fragment. The Fragment ID starts
at 0.

Total Fragments

This field specifies the total number of fragments that this RPC contains.

Flag Byte

Each bit of this byte represents a particular flag:

  1. ACK Request - ACK reply not piggybacked
  2. Request - this packet is an RPC request packet
  3. Reply - this packet is an RPC reply packet
  4. Control Bit - if set, the first byte of the payload is an opcode which specifies
    whether the packet is an ACK reply or an ERROR packet.

...

payloadType

There are currently four payload types defined. The rest are reserved.

PT_DATA = 0

A regular data fragment. The payload is a binary blob.

PT_ACK = 1

An acknowledgement response. The format for the payload is defined below.

PT_SESSION_OPEN = 2

A request to the server to open a new session or a response from the server for
such a request. The payload must be empty for a session open request (for now)
and is defined below for a session open response.

PT_BAD_SESSION = 4

A response from the server that the session specified is not valid. The payload
must be empty.

sessionToken

The session token serves to identify the session. It is large enough that it
can be generated randomly with a very low probability of collisions.

The server generates the session token upon receipt of a session open request
and sends it back in the sessionToken field of the session open response. The
headers for all subsequent packets on this new session must have sessionToken
set to this value.
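
A minimal sketch of server-side token generation, assuming a 64-bit token (the width suggested by the header diagram) and using Python's secrets module as the random source. The function name and the active_tokens set are hypothetical; the collision check is an extra safeguard that the text does not require.

```python
import secrets

TOKEN_BITS = 64  # assumption: width read off the header diagram


def new_session_token(active_tokens):
    """Pick a random token that is not already in use and record it."""
    while True:
        token = secrets.randbits(TOKEN_BITS)
        if token not in active_tokens:
            active_tokens.add(token)
            return token
```

With 64 random bits a collision is very unlikely, so the loop is expected to run once in practice.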

TODO(ongaro): Is a session token assumed to be globally unique or only unique
to the client and server pair?

clientSessionHint

The value for this field is selected by the client and must be the same for
all packets on the session (including the session open request). Its value is
opaque to the server.

This may be used by the client, for example, to quickly find state for a
session upon receipt of a packet.

serverSessionHint

This field is analogous to clientSessionHint.

The value for this field is selected by the server and must be the same for
all packets on the session (except for the session open request but including
the session open response). Its value is opaque to the client.

This may be used by the server, for example, to quickly find state for a
session upon receipt of a packet.
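
One possible use of the hint, sketched below, is as an index into a session table, with the sessionToken checked afterwards to confirm that the slot still belongs to the same session. This is an assumption about how an implementation might use the field, not something the specification mandates; ServerSession and find_session are hypothetical names.

```python
class ServerSession:
    """Hypothetical per-session state kept by the server."""

    def __init__(self, token):
        self.token = token


def find_session(sessions, server_session_hint, session_token):
    """Look up a session by hint, then verify it with the session token.

    Returns the session, or None if the packet does not match a valid
    session (in which case the server could answer with PT_BAD_SESSION).
    """
    if 0 <= server_session_hint < len(sessions):
        session = sessions[server_session_hint]
        if session is not None and session.token == session_token:
            return session
    return None
```

A client could use clientSessionHint in the same way for packets arriving from the server.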

channelId

The channel ID identifies the channel within the session and must be within the
bounds given by the server in the session open response.

rpcId

The RPC ID serves to ensure that old packets on a channel that is still valid
are ignored. The client must start the RPC ID with 0 on a new channel and must
increment it on every new RPC that it sends over the channel.
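
The following sketch shows how a receiver might use the RPC ID to discard old packets on a channel. The Channel class and its accept method are hypothetical. The sketch treats any larger RPC ID as the start of a new RPC and ignores wraparound; both are simplifying assumptions.

```python
class Channel:
    """Hypothetical receiver-side view of one channel."""

    def __init__(self):
        self.rpc_id = None  # no RPC has been seen on this channel yet

    def accept(self, packet_rpc_id):
        """Classify an incoming packet as 'new', 'current' or 'stale'."""
        if self.rpc_id is None or packet_rpc_id > self.rpc_id:
            # A larger ID starts a new RPC; earlier IDs are now obsolete.
            self.rpc_id = packet_rpc_id
            return "new"
        if packet_rpc_id == self.rpc_id:
            return "current"
        return "stale"  # belongs to an earlier RPC and should be ignored
```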

fragNumber

The fragment number serves to identify the fragment within the RPC request or
response, which may consist of multiple fragments. The numbering starts at 0.

totalFrags

The total number of fragments that make up the RPC request or response.
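
As an illustration of how fragNumber and totalFrags relate, the sketch below splits a request or response into numbered fragments. The maximum number of data bytes per fragment is a parameter here because the text does not fix it; sending an empty payload as a single empty fragment is also an assumption, and the function name is hypothetical.

```python
def fragment(payload, max_data_per_fragment):
    """Split payload into (fragNumber, totalFrags, data) tuples."""
    if max_data_per_fragment <= 0:
        raise ValueError("max_data_per_fragment must be positive")
    # Ceiling division; an empty payload still yields one (empty) fragment.
    total = max(1, -(-len(payload) // max_data_per_fragment))
    return [
        (i, total,
         payload[i * max_data_per_fragment:(i + 1) * max_data_per_fragment])
        for i in range(total)
    ]
```

For instance, a 7-byte payload with at most 3 data bytes per fragment yields fragments numbered 0, 1 and 2, each carrying totalFrags = 3.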

flags.direction

This flag is set when the server is sending a packet to the client.

This is useful for servers that also act as clients to easily distinguish
received packets intended for their server role as opposed to their client
role.

flags.requestAck

This flag is set when the sender wants to request an ACK from the receiver when
this fragment arrives. The fact that an ACK is being requested is conveyed with
a flag so that it can easily be piggy-backed with the transmission of a normal
data packet.

The sender might want an ACK for a fragment when: TODO(aravindn).

Request flag

The Request flag is set when the packet is part of an RPC request from the
client.

Reply flag

The Reply flag is set when the packet is part of an RPC reply. That is, the server
sends the packet to the client as a reply for an RPC request that it received.

The Request and Reply flags are necessary because different code paths are
executed depending on whether a packet is part of a request or reply. These
flags help to identify that fact easily. TODO(aravindn): Explain clearly.

Control flag

When the control flag of a packet is set, the first byte of the payload will be
an opcode which specifies what kind of control packet it is.

Opcodes:
0x01 - ERROR packet
0x02 - ACK Reply

ERROR packet

This means something went wrong in the RPC system.

ACK Reply

When the sender requests an ACK, the receiver sends back an ``ACK reply
packet''. This packet has the following structure:

...

flags.pleaseDrop

This flag indicates that the sender wants the receiver to drop the packet on
arrival. It is used only for testing purposes to simulate errors in the
network.

flags.reserved1

This flag is reserved.

Session Open Response Payload Format

      <----8 bits---->
      +--------------+
      | maxChannelId |
      +--------------+

maxChannelId

The value of maxChannelId is the largest channel ID that the client may use for
the session, chosen at the server's discretion. That is, all channel IDs ever
used on the session must be less than or equal to this maxChannelId.
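
A small sketch of how a client might read this payload and check channel IDs against it. It assumes the payload is the single byte shown in the diagram; both function names are hypothetical.

```python
import struct


def parse_session_open_response(payload):
    """Extract maxChannelId from a session open response payload."""
    if len(payload) < 1:
        raise ValueError("session open response payload is empty")
    (max_channel_id,) = struct.unpack_from("<B", payload)
    return max_channel_id


def channel_id_is_valid(channel_id, max_channel_id):
    """Channel IDs run from 0 up to and including maxChannelId."""
    return 0 <= channel_id <= max_channel_id
```

A response carrying maxChannelId = 7 therefore allows eight channels, numbered 0 through 7.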

Acknowledgement Response Payload Format

      <---------------32 bits -------------->
                       +--------------------+
                       |  firstMissingFrag  |
      +-------------------------------------+
      |            stagingVector            |
      +-------------------------------------+

firstMissingFrag

The number before which all fragments have been received (between 0 and
totalFrags, inclusive). Note that the fragment whose number is firstMissingFrag
has not been received by definition.

stagingVector

A bit vector where the bit numbered i (counting from 0) corresponds to whether
the fragment whose number is firstMissingFrag + 1 + i has been received. Note
that the fragment whose number is firstMissingFrag has not been received by
definition.
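
The two fields can be sketched together as follows. The widths (16 bits for firstMissingFrag, 32 bits for stagingVector) are assumptions read off the diagram, and build_ack and parse_ack are hypothetical names. The receiver is assumed to track the set of fragment numbers it has received so far.

```python
import struct

ACK = struct.Struct("<HI")  # assumed widths: 16-bit firstMissingFrag, 32-bit vector
STAGING_BITS = 32


def build_ack(received, total_frags):
    """Build an ACK payload from the set of received fragment numbers."""
    first_missing = 0
    while first_missing < total_frags and first_missing in received:
        first_missing += 1
    vector = 0
    for i in range(STAGING_BITS):
        # Bit i describes fragment firstMissingFrag + 1 + i.
        if first_missing + 1 + i in received:
            vector |= 1 << i
    return ACK.pack(first_missing, vector)


def parse_ack(payload):
    """Return (firstMissingFrag, fragments marked received in the vector)."""
    first_missing, vector = ACK.unpack_from(payload)
    staged = [first_missing + 1 + i
              for i in range(STAGING_BITS) if (vector >> i) & 1]
    return first_missing, staged
```

For example, if fragments 0, 1, 3 and 5 of an 8-fragment RPC have arrived, firstMissingFrag is 2 and bits 0 and 2 of stagingVector are set (for fragments 3 and 5).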

...

Old:

Use cases:

Single packet request is lost

...