RPC Protocol

This document specifies the protocol and packet formats used by RAMCloud clients
and servers to communicate with each other. All communication in RAMCloud
happens in the form of RPCs.

RAMCloud RPCs use standard Ethernet, IP and UDP headers without any optional
fields. On top of this, they use the RAMCloud RPC header, specified below.

Sessions and Channels

A session encapsulates the state of communication between a particular client
and a particular server. At the cost of a session open handshake (during which
the server authenticates the client and allocates state for the client's
session), sessions allow the client to open new channels for free. A channel is
a connection within an established session on which a sequence of RPCs travels.

RPCs in a channel are performed sequentially. That is, only one RPC at a time
may be active on a particular channel. If a client wants to perform multiple
RPCs in parallel within a session, it must use multiple channels.

RAMCloud RPC Header format

The RAMCloud RPC header consists of the following fields. The usage of these
fields is explained below.

      <---------------32 bits -------------->
      +-------------------------------------+
      |            sessionToken             |
      +-------------------------------------+
      |        sessionToken (cont.)         |
      +-------------------------------------+
      |                rpcId                |
      +-------------------------------------+
      |         clientSessionHint           |
      +-------------------------------------+
      |         serverSessionHint           |
      +-------------------------------------+
      |     fragNumber    |  totalFrags     |
      +-------------------------------------+
      | channelId | flags |
      +-------------------+

The high four bits of the flags byte are the payloadType.

The low four bits of the flags byte are the following flags:
flags.direction is the first (lowest) bit
flags.requestAck is the second bit
flags.pleaseDrop is the third bit
flags.reserved1 is the fourth bit
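
As an illustration of the bit layout just described, the following is a minimal
Python sketch that builds and splits the flags byte. The payload type values
are the ones listed under payloadType in this document. The helper names
encode_flags and decode_flags and the mask constant names are assumptions made
for this example and are not part of the protocol.

```python
# Payload type values as listed in this document.
PT_DATA = 0
PT_ACK = 1
PT_SESSION_OPEN = 2
PT_BAD_SESSION = 4

# Bit masks for the low four bits of the flags byte.
DIRECTION = 1 << 0    # first (lowest) bit
REQUEST_ACK = 1 << 1  # second bit
PLEASE_DROP = 1 << 2  # third bit
RESERVED1 = 1 << 3    # fourth bit, reserved


def encode_flags(payload_type, direction=False, request_ack=False,
                 please_drop=False):
    """Build the flags byte (hypothetical helper, not from the spec)."""
    if not 0 <= payload_type <= 0xF:
        raise ValueError("payloadType must fit in four bits")
    low = 0
    if direction:
        low |= DIRECTION
    if request_ack:
        low |= REQUEST_ACK
    if please_drop:
        low |= PLEASE_DROP
    # payloadType occupies the high four bits.
    return (payload_type << 4) | low


def decode_flags(flags):
    """Split a flags byte back into its parts."""
    return {
        "payload_type": (flags >> 4) & 0xF,
        "direction": bool(flags & DIRECTION),
        "request_ack": bool(flags & REQUEST_ACK),
        "please_drop": bool(flags & PLEASE_DROP),
    }
```

For example, under this sketch an ACK sent from the server to the client would
carry encode_flags(PT_ACK, direction=True).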

Everything is encoded in little-endian, not network byte order.

Every packet sent over this protocol contains all of the above fields (but not
all of them are always relevant).
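
The header layout can be sketched with Python's struct module. This sketch
assumes that the fields appear on the wire in the order shown in the diagram
(top to bottom, left to right) with no padding between them, which gives a
26-byte header. The names HEADER, Header, pack_header and unpack_header are
hypothetical and exist only for this example.

```python
import struct
from collections import namedtuple

# "<" selects little-endian with no padding, as the document requires.
# Q = 64-bit sessionToken, I = 32-bit rpcId, clientSessionHint and
# serverSessionHint, H = 16-bit fragNumber and totalFrags,
# B = 8-bit channelId and flags.
HEADER = struct.Struct("<QIIIHHBB")

Header = namedtuple("Header", [
    "session_token", "rpc_id", "client_session_hint",
    "server_session_hint", "frag_number", "total_frags",
    "channel_id", "flags",
])


def pack_header(header):
    """Serialize a Header into its 26-byte wire form."""
    return HEADER.pack(*header)


def unpack_header(packet):
    """Return (Header, payload) for a packet that starts with a header."""
    if len(packet) < HEADER.size:
        raise ValueError("packet is shorter than the RPC header")
    header = Header(*HEADER.unpack_from(packet))
    return header, packet[HEADER.size:]
```

Because the format string fixes the byte order, a sessionToken of
0x0102030405060708 is sent with the byte 0x08 first.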

payloadType

There are currently four payload types defined. The rest are reserved.

PT_DATA = 0

A regular data fragment. The payload is a binary blob.

PT_ACK = 1

An acknowledgement response. The format for the payload is defined below.

PT_SESSION_OPEN = 2

A request to the server to open a new session or a response from the server for
such a request. The payload must be empty for a session open request (for now)
and is defined below for a session open response.

PT_BAD_SESSION = 4

A response from the server that the session specified is not valid. The payload
must be empty.

sessionToken

The session token serves to identify the session. It is large enough that it
can be generated randomly with a very low probability of collisions.

The server generates the session token upon receipt of a session open request
and sends it back in the sessionToken field of the session open response. The
headers for all subsequent packets on this new session must have sessionToken
set to this value.

TODO(ongaro): Is a session token assumed to be globally unique or only unique
to the client and server pair?

clientSessionHint

The value for this field is selected by the client and must be the same for
all packets on the session (including the session open request). Its value is
opaque to the server.

This may be used by the client, for example, to quickly find state for a
session upon receipt of a packet.

serverSessionHint

This field is analogous to clientSessionHint.

The value for this field is selected by the server and must be the same for
all packets on the session (except for the session open request but including
the session open response). Its value is opaque to the client.

This may be used by the server, for example, to quickly find state for a
session upon receipt of a packet.

channelId

The channel ID identifies the channel within the session and must be within the
bounds given by the server in the session open response.

rpcId

The RPC ID serves to ensure that old packets on a channel that is still valid
are ignored. The client must start the RPC ID at 0 on a new channel and must
increment it for every new RPC that it sends over the channel.
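
A minimal sketch of the client side of this rule follows. The class
ClientChannel and its methods are hypothetical. Wrapping the counter at 32 bits
is an assumption based on the width of the rpcId field; the document does not
say what happens when the counter overflows.

```python
class ClientChannel:
    """Tracks the RPC ID for one channel (hypothetical helper)."""

    def __init__(self):
        self.next_rpc_id = 0        # a new channel starts at RPC ID 0
        self.current_rpc_id = None  # no RPC is active yet

    def begin_rpc(self):
        """Start a new RPC and return the RPC ID to put in its packets."""
        self.current_rpc_id = self.next_rpc_id
        # Assumption: wrap at 32 bits, the width of the rpcId field.
        self.next_rpc_id = (self.next_rpc_id + 1) & 0xFFFFFFFF
        return self.current_rpc_id

    def accepts(self, packet_rpc_id):
        """Packets that belong to an older RPC on this channel are ignored."""
        return packet_rpc_id == self.current_rpc_id
```

With this sketch, a late response fragment for RPC 0 that arrives after the
client has started RPC 1 is rejected by accepts().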

fragNumber

The fragment number serves to identify the fragment within the RPC request or
response, which may consist of multiple fragments. The numbering starts at 0.

totalFrags

The total number of fragments that make up the RPC request or response.
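
To illustrate how fragNumber and totalFrags relate, here is a small sketch that
splits a message into numbered fragments. The function name
split_into_fragments and the max_payload parameter are hypothetical; this
document does not specify a maximum fragment size. The sketch also assumes that
an empty message is still sent as a single empty fragment.

```python
def split_into_fragments(message, max_payload):
    """Return a list of (fragNumber, totalFrags, chunk) tuples."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    # Ceiling division; an empty message still uses one fragment
    # (an assumption, see the text above).
    total_frags = max(1, -(-len(message) // max_payload))
    if total_frags > 0xFFFF:
        raise ValueError("totalFrags must fit in 16 bits")
    return [
        (i, total_frags, message[i * max_payload:(i + 1) * max_payload])
        for i in range(total_frags)
    ]
```

Fragment numbers run from 0 to totalFrags - 1, and every fragment of the same
request or response carries the same totalFrags value.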

flags.direction

This flag is set when the server is sending a packet to the client.

This is useful for servers that also act as clients to easily distinguish
received packets intended for their server role as opposed to their client
role.

flags.requestAck

This flag is set when the sender wants to request an ACK from the receiver when
this fragment arrives. The fact that an ACK is being requested is conveyed with
a flag so that it can easily be piggy-backed with the transmission of a normal
data packet.

flags.pleaseDrop

This flag indicates that the sender wants the receiver to drop the packet on
arrival. It is used only for testing purposes to simulate errors in the
network.

flags.reserved1

This flag is reserved.

Session Open Response Payload Format

      <----8 bits---->
      +--------------+
      | maxChannelId |
      +--------------+

maxChannelId

The value of maxChannelId is the largest channel ID that the client may use for
the session, chosen at the server's discretion. That is, all channel IDs ever
used on the session must be less than or equal to this maxChannelId.
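
A sketch of how a client might read this payload and check a channel ID against
it is shown below. Both function names are hypothetical, and the lower bound of
0 for channel IDs is an assumption; the document only states the upper bound.

```python
def parse_session_open_response(payload):
    """Return maxChannelId from a session open response payload."""
    if len(payload) < 1:
        raise ValueError("session open response payload is empty")
    # maxChannelId is a single byte, so byte order does not matter here.
    return payload[0]


def channel_id_is_valid(channel_id, max_channel_id):
    """Channel IDs must be less than or equal to maxChannelId."""
    # Assumption: channel IDs start at 0.
    return 0 <= channel_id <= max_channel_id
```

Since the bound is inclusive, a maxChannelId of 7 allows the eight channel IDs
0 through 7 under the assumption stated above.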

Acknowledgement Response Payload Format

      <---------------32 bits -------------->
                       +--------------------+
                       |  firstMissingFrag  |
      +-------------------------------------+
      |            stagingVector            |
      +-------------------------------------+

firstMissingFrag

The number before which all fragments have been received (between 0 and
totalFrags, inclusive). Note that the fragment whose number is firstMissingFrag
has not been received by definition.

stagingVector

A bit vector where the bit numbered i (counting from 0) corresponds to whether
the fragment whose number is firstMissingFrag + 1 + i has been received. The
vector starts at firstMissingFrag + 1 because the fragment whose number is
firstMissingFrag has not been received by definition.
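
The two fields can be illustrated with a short sketch that builds an ACK
payload from the set of fragments received so far and decodes it again. It
assumes that firstMissingFrag (16 bits) precedes stagingVector (32 bits) on the
wire with no padding, following the diagram, and that both are little-endian
like the header. The names ACK, build_ack and parse_ack are hypothetical.

```python
import struct

# "<" = little-endian, no padding; H = firstMissingFrag, I = stagingVector.
ACK = struct.Struct("<HI")


def build_ack(received, total_frags):
    """Build an ACK payload from a set of received fragment numbers."""
    first_missing = 0
    while first_missing < total_frags and first_missing in received:
        first_missing += 1
    staging_vector = 0
    for i in range(32):
        # Bit i reports on fragment firstMissingFrag + 1 + i.
        if first_missing + 1 + i in received:
            staging_vector |= 1 << i
    return ACK.pack(first_missing, staging_vector)


def parse_ack(payload):
    """Return (firstMissingFrag, fragments received beyond it)."""
    first_missing, staging_vector = ACK.unpack_from(payload)
    beyond = [first_missing + 1 + i for i in range(32)
              if staging_vector & (1 << i)]
    return first_missing, beyond
```

For example, if fragments 0, 1, 3 and 5 of a six-fragment message have arrived,
firstMissingFrag is 2 and bits 0 and 2 of stagingVector are set, which report
fragments 3 and 5.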


Old:

Use cases:

Single packet request is lost

When an RPC request that consists of only a single packet is lost, the client
will time out while waiting for the server to reply. When it times out, the
client will resend the entire request. The client will perform a fixed number
(X) of these resends before giving up and throwing an exception.
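
The resend behavior described above can be sketched as a simple loop. The
callables send_request and wait_for_reply, the exception RpcTimeout and the
parameter max_resends (the X in the text) are all hypothetical; wait_for_reply
is assumed to return None when the timeout expires.

```python
class RpcTimeout(Exception):
    """Raised when no reply arrives after all resends (hypothetical)."""


def call_with_resends(send_request, wait_for_reply, max_resends):
    """Send a request, resending it up to max_resends times on timeout."""
    # One initial send plus max_resends resends.
    for _ in range(1 + max_resends):
        send_request()
        reply = wait_for_reply()  # assumed to return None on timeout
        if reply is not None:
            return reply
    raise RpcTimeout("no reply after %d resends" % max_resends)
```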

Single packet response is lost

When an RPC response that consists of only a single packet is lost, the client
will time out, and resend the request just like in the previous case where a
single packet request is lost.

For each connection, the server maintains the most recent RPC response it
generated in memory.

When the server receives the resent request, it simply sends back the RPC
response it has already computed and saved. Thus, it need not perform the
computation required by the RPC again.

Multi-packet request is lost

When some (or all) fragments of a multi-packet request are lost, the following
happens.

The client times out while waiting for a reply from the server. It then sends
a packet with the ``Request ACK'' flag set. The payload of the packet will be
empty (TODO(aravindn): either the control bit is set, or the fragment id field
is set to one past the total number of fragments). On receipt of this packet,
the server will send back a packet with the ``Control bit'' set, and the opcode
equal to 0x02 ``ACK Reply''. The packet will contain a bitmap which details the
status of the fragments that the RPC consists of. The client resends the missing
fragments and waits for a reply from the server.

The above process is repeated a fixed (X) number of times, before the client
gives up and throws an exception.

Multi-packet response is lost

When all fragments of a multi-fragment response are lost, the case is handled
like the case where a single packet response is lost, and the same steps are
followed.

When only some fragments are lost, the client times out and sends an ``ACK
Reply'' packet, even though it has not received an ``ACK Request'' packet. When
the server receives this ``ACK Reply'' packet, it resends all the fragments that
were lost. The client will now have a full RPC response, as long as none of the
resent packets were lost as well.

The above process is repeated a fixed (X) number of times, before the client
gives up and throws an exception.

Connection Setup

When machine 1 wants to open a connection to machine 2, it sends a packet with
the connection id set to 0, and the ``Request'' flag set. Machine 2 sends a
packet back to machine 1 with the connection id again set to 0, with the
``Reply'' flag set, and the new connection id in the first 3 bytes of the
payload.

Connections are automatically closed by a machine when it detects that the
connection has been idle for a long time. Error packets are generated when
packets are received containing the connection id of a connection that has been
closed.