This page documents design decisions made regarding the RPC API.
It was decided that we should support "two-phase send/wait synchronous RPCs" - a process sends an RPC, does other work, and then waits for the reply to the RPC.
Current RPC API:
It was decided to instantiate an RPCObj object for every RPC. The RPCObj class will contain information pertaining to that RPC, such as address information for the sender and receiver, error codes, etc.
getReply throws an Exception if the RPC returned an error code.
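The two-phase client API might look like the following sketch. The names RPCObj and getReply come from this page; the send() method, the in-memory "transport", and all signatures are assumptions for illustration only.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical sketch of the two-phase client API. RPCObj and
// getReply are named in this document; everything else is assumed.
class RPCObj {
  public:
    // Phase 1: issue the request and return immediately so the
    // caller can do other work while the RPC is in flight.
    void send(const std::string& request) {
        // A real implementation would hand the request to the
        // transport layer; here we fake an immediate reply.
        reply_ = "echo:" + request;
        errorCode_ = 0;
        sent_ = true;
    }

    // Phase 2: block until the reply arrives; throw if the RPC
    // returned an error code.
    std::string getReply() {
        if (!sent_ || errorCode_ != 0)
            throw std::runtime_error("RPC failed");
        return reply_;
    }

  private:
    bool sent_ = false;
    int errorCode_ = 0;
    std::string reply_;
};
```

A caller would invoke send(), perform unrelated work, and only then call getReply(), paying the wait cost as late as possible.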
On the server, we use a slightly different API. The server only has work to do when it receives an RPC, hence the API will be:
The getRequest call will block until an RPC request is received. If multiple threads have called getRequest at the same time, the driver thread can load-balance among them. The server can also use client-like RPCObj objects to communicate with its Backup servers.
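The server-side behavior described above could be sketched as a blocking queue: a driver thread enqueues incoming requests, and whichever worker is waiting in getRequest picks one up. RPCServerObj and getRequest are named on this page; the queue-based dispatch and all signatures are assumptions.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>

// Hypothetical sketch of the server-side API.
class RPCServerObj {
  public:
    // Called by the driver thread when a request arrives off the wire.
    void deliver(const std::string& request) {
        std::lock_guard<std::mutex> lock(mutex_);
        requests_.push(request);
        cv_.notify_one();  // wake exactly one waiting worker
    }

    // Called by worker threads; blocks until a request is available.
    // With several workers blocked here, the driver effectively
    // load-balances by waking one of them per request.
    std::string getRequest() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !requests_.empty(); });
        std::string request = std::move(requests_.front());
        requests_.pop();
        return request;
    }

  private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::string> requests_;
};
```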
The RPCObj and RPCServerObj will deal with machine identification by using AddrType objects. This is so that we do not have to deal with naming and addressing issues when we switch from dev implementations (using UDP or TCP) to fastRPC implementations (our own protocol?). The AddrType objects will expose methods to obtain the physical/hardware address of a particular machine given what type of RPC system is running.
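One way to realize this is an abstract AddrType interface with one concrete subclass per transport. Only the name AddrType comes from this page; the virtual interface and the UDP subclass below are assumptions.

```cpp
#include <string>

// Hypothetical sketch of the AddrType abstraction: callers ask for a
// physical address without knowing which RPC system is running.
class AddrType {
  public:
    virtual ~AddrType() = default;
    // Return the physical/hardware address of the machine in the
    // form appropriate for the active transport.
    virtual std::string physicalAddress() const = 0;
};

// Dev implementation: for the UDP transport, the "physical" address
// is an IP:port pair. A fastRPC implementation could return a MAC
// address or similar without changing any calling code.
class UdpAddr : public AddrType {
  public:
    UdpAddr(std::string ip, int port) : ip_(std::move(ip)), port_(port) {}
    std::string physicalAddress() const override {
        return ip_ + ":" + std::to_string(port_);
    }

  private:
    std::string ip_;
    int port_;
};
```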
Server Threading model
RAMCloud servers maintain a pool of worker threads. These threads call getRequest when they are ready to process new requests. When a thread is allocated a new RPC, it processes only this RPC and sees it through to completion. It sends RPC requests to the Backup servers and waits for the replies before continuing; worker threads do not process any other RPC requests while they are waiting.
There may be certain times when all the threads on a server are busy with other RPC requests, or waiting for replies from other machines. When this is the case, and a new RPC arrives, we may spawn a new worker thread to handle it.
Large RPC packets
The RPC subsystem will handle large RPCs. That means RPC size is limited only by the size of the largest object in RAMCloud (which was arbitrarily set at 1 MB for RAMCloud v1), and the layer underneath the RPC system will handle RPC fragmentation and coalescing. Since Ethernet frames are limited to 1.5K (or 16K for jumbo frames on 1-gigabit NICs), this layer will split large RPCs into multiple Ethernet frames at the sender and join those frames back into one RPC packet on the receiver side.
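The fragmentation/coalescing layer reduces to splitting a byte buffer into frame-sized pieces and concatenating them on the other side. The function names and the use of std::string as a byte buffer below are assumptions; a real implementation would also carry sequence numbers for reordering and loss detection.

```cpp
#include <string>
#include <vector>

// Sender side: split one large RPC into frame-sized fragments.
std::vector<std::string> fragment(const std::string& rpc,
                                  size_t frameSize) {
    std::vector<std::string> frames;
    for (size_t off = 0; off < rpc.size(); off += frameSize)
        frames.push_back(rpc.substr(off, frameSize));
    return frames;
}

// Receiver side: join the fragments back into one RPC packet.
std::string reassemble(const std::vector<std::string>& frames) {
    std::string rpc;
    for (const auto& f : frames)
        rpc += f;
    return rpc;
}
```

For example, a 4000-byte RPC over 1500-byte frames yields three fragments (1500 + 1500 + 1000 bytes) that reassemble to the original payload.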
Given that a transaction can consist of many object manipulations in a single RPC, we may need to set a limit for the RPC size that is larger than the maximum object size.