Random Notes on Solarflare

Ring Buffer:
--The receive descriptor ring holds 512 packet buffers by default and is
configurable using EF_RXQ_SIZE. A larger ring reduces the possibility of packet
drops but also reduces efficiency.
--The transmit descriptor ring holds 512 packet buffers and is configurable using

EF_VI Notes:

A layer 2 API that sends and receives raw Ethernet frames. It provides direct
access to the Solarflare datapath and delivers the lowest latency. It supports a
zero-copy interface because it gives direct access to the memory buffers, but it
is unreliable, so users must provide their own higher-level protocols.

Compiling notes are in the file README.ef_vi.

Using EF_VI:
Allocate a virtual interface (ef_vi type) that includes a receive descriptor
ring, a transmit descriptor ring, and an event queue for notifications from the
adapter.

To transmit a packet: the application writes the packet content into a packet
buffer and calls ef_vi_transmit(). Descriptors that describe the packet are
queued in the TX ring, and the doorbell is rung to inform the adapter that the
transmit ring is non-empty.

To receive a packet: descriptors are queued in the receive ring (refer to
ef_vi_receive_init() and ef_vi_receive_post()), and arriving packets are written
into the buffers those descriptors point to.

The event queue is a channel from hardware to software that notifies the
software when a packet has arrived or a transmit has completed (so that the
buffer can be reused or freed). Events are retrieved by calling ef_eventq_poll().

Buffers used for packets must be pinned so that they can't be paged out, and
they also need to be registered for DMA with the network card. The type
ef_iobufset encapsulates a buffer set that is suitable for DMA. The type ef_addr
is a special address space used to identify these buffers.

Protection Domain: a collection of memory regions and VIs that are tied to a
user interface. Memory regions can be assigned to different prot. domains, which
is useful for zero copying (how?). Function: ef_pd_alloc()

Virtual Interface: consists of a TX ring, an RX ring, and an event queue. It can
have all of them or only some of them. Function: ef_vi_alloc_from_pd()

Memory Region: ef_memreg_alloc() registers memory for a VI within a prot. domain.
Performance improves when the memory is aligned to 4MB boundaries. First
allocate the desired chunk of memory, then pass it to this function. The
function takes two driver handles: one for the memory region being registered
and one for the pd it is registered with. ef_memreg_dma_addr() finds the DMA
address of the memory for use by ef_vi_receive_init(), and ef_memreg_free()
releases the registration.
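
Sketch of the "allocate first, then register" step, assuming the 4MB alignment
mentioned above. It only covers the allocation side using plain POSIX calls. The
names REGION_ALIGN, region_round_up() and alloc_region() are made up for this
note and are not part of ef_vi; the returned pointer and length are what would
then be handed to ef_memreg_alloc().

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed alignment, taken from the 4MB figure in the note above. */
#define REGION_ALIGN ((size_t)4 << 20)

/* Round len up to a multiple of align. align must be a power of two. */
size_t region_round_up(size_t len, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (len + align - 1) & ~(align - 1);
}

/* Allocate a zeroed buffer whose start is aligned to REGION_ALIGN and whose
 * length is padded to a multiple of REGION_ALIGN. Returns NULL on failure.
 * On success *out_len holds the padded length. Release with free(). */
void *alloc_region(size_t len, size_t *out_len)
{
    void *p = NULL;
    size_t padded = region_round_up(len, REGION_ALIGN);

    if (posix_memalign(&p, REGION_ALIGN, padded) != 0)
        return NULL;
    memset(p, 0, padded);   /* touch the pages so they are backed by memory */
    *out_len = padded;
    return p;
}
```

Padding the length as well as aligning the start keeps the whole region on
aligned boundaries, so packet buffers carved out of it at fixed offsets never
straddle the end of the allocation.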

Filters: a VI can have multiple filters set. Each filter can be removed by its
cookie, and an error occurs if you set a filter that already exists. A filter
must first be initialized, then set up, then added to the VI. The order in which
the adapter checks filters is: TCP/UDP full match (local and remote IP),
listening socket (local IP and port but any remote IP/port), destination MAC
address/VLAN tag, anything else. The default filter sends everything to the
kernel. On most adapter models a filter belongs to one VI and can't be set up
for multiple VIs (except on 7xxx models). Note: filters can be specified on VLAN
tags, but be careful: if you also have an IP/port filter, the VLAN tag filters
are ignored.
ef_filter_spec_init() prepares a blank filter, then a set of functions is used
to set up the actual filter: ef_filter_spec_set_{ip4_full, ip4_local, eth_local,
vlan, unicast_all, multicast_all}(). Refer to the ef_vi manual for a complete
explanation. Lastly, ef_vi_filter_add() must be called to add the filter to the
adapter.
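
Toy model of the filter check order described above (full IP match first, then
local IP/port, then destination MAC, otherwise the kernel). This is only a
software illustration of the priority rule. None of the toy_* names exist in
ef_vi and the adapter does not work by scanning a list like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Lower value = checked earlier by the (modelled) adapter. */
enum toy_kind { TOY_FULL = 0, TOY_LOCAL = 1, TOY_ETH = 2 };

struct toy_pkt {
    uint32_t lip, rip;       /* local and remote IPv4 address */
    uint16_t lport, rport;   /* local and remote port */
    uint8_t  dmac[6];        /* destination MAC */
};

struct toy_filter {
    enum toy_kind kind;
    uint32_t lip, rip;
    uint16_t lport, rport;
    uint8_t  dmac[6];
    int      vi;             /* hypothetical id of the VI owning the filter */
};

static int toy_matches(const struct toy_filter *f, const struct toy_pkt *p)
{
    switch (f->kind) {
    case TOY_FULL:
        return f->lip == p->lip && f->lport == p->lport &&
               f->rip == p->rip && f->rport == p->rport;
    case TOY_LOCAL:
        return f->lip == p->lip && f->lport == p->lport;
    case TOY_ETH:
        return memcmp(f->dmac, p->dmac, sizeof f->dmac) == 0;
    }
    return 0;
}

/* Return the VI of the highest-priority matching filter, or -1 when nothing
 * matches and the packet would go to the kernel (the default filter). */
int toy_classify(const struct toy_filter *fs, size_t n, const struct toy_pkt *p)
{
    int best_vi = -1;
    int best_kind = TOY_ETH + 1;

    assert(p != NULL && (fs != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        if ((int)fs[i].kind < best_kind && toy_matches(&fs[i], p)) {
            best_kind = (int)fs[i].kind;
            best_vi = fs[i].vi;
        }
    }
    return best_vi;
}
```

With a full-match filter, a local filter and a MAC filter that all match the
same packet, toy_classify() picks the full-match one regardless of the order in
which the filters were added.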

ef_driver_open() fills its argument with an opaque handle needed for the other
calls.
ef_pd_alloc() allocates a pd that specifies how memory should be allocated for
our VI. Use if_nametoindex() to find the NIC index (e.g. eth1). The flags
EF_PD_DEFAULT, EF_PD_VF, and EF_PD_PHYS_MODE are passed to this function and
determine the addressing type the pd should use (refer to the note on Adapter
Address Space).
ef_vi_alloc_from_pd() allocates a VI: it allocates the RX ring, TX ring, and
event queue, sets up the timer, and fills out the opaque structure needed to
access them in software. The ring sizes can be specified here or through the env
vars EF_VI_RXQ_SIZE and EF_VI_TXQ_SIZE; if neither is present, the default size
is 512. There are a number of flags, e.g. for ignoring checksum checking or for
sending only TCP/UDP frames and not raw frames. Refer to the manual for all of
them.
ef_vi_free(), ef_pd_free(), ef_driver_close() should be called when you are
done.

Receive Packet Buffer: call ef_vi_receive_prefix_len() to find the length of the
metadata prefix in each packet, and ef_vi_receive_buffer_len() to find the
length of a receive buffer. The basic receive path is: push empty packet buffers
to the RX ring, poll the event queue to see when they have been filled, and
handle them.

ef_vi_receive_post() wraps two functions. ef_vi_receive_init() takes the DMA
address of the memory buffer (retrieved from ef_memreg_dma_addr()) that will
receive a packet, plus an id; the id is arbitrary and is used to keep track of
your buffers. ef_vi_receive_push() pushes the initialized descriptors into the
RX ring. This function is relatively slow, so ideally it is called to submit
buffers in batches, and by a thread other than your critical path (what is the
overhead if we use second thread?). A small batch keeps the ring fuller but a
large batch is more efficient; good values are 8, 16, or 32.

Then call ef_eventq_poll(), which pulls events out of the event queue. The
events correspond to completed transmits or arrived packets. This is the
absolute latency-critical function, so it should be called as often as possible.
You can specify the max number of events it will return, but it doesn't wait for
that many: if there are no more events it returns immediately.

EF_EVENT_TYPE identifies the event type. EF_EVENT_TYPE_RX is for a received
packet. EF_EVENT_TYPE_RX_DISCARD indicates that a packet arrived but has a bad
checksum, is not addressed here, or has some other error. Note: unicast packets
sent to a MAC address other than the interface MAC are always tagged as DISCARD.
The type EF_RX_DISCARD_OTHER is for unicast packets sent to a MAC other than the
interface MAC, and the type EF_RX_DISCARD_MISMATCH is for multicast packets sent
to a group that our machine is not currently subscribed to. EF_EVENT_RX_BYTES
tells you how many bytes arrived in the packet. RX_PKT_PTR returns a pointer to
the start of the payload including the headers.
EF_EVENT_RX_RQ_ID/EF_EVENT_RX_DISCARD_RQ_ID return the packet ID as specified
when ef_vi_receive_init() was called.

Adapter Address Space: 3 modes are possible: buffer table, SR-IOV, and no
translation. The buffer table is a piece of memory that keeps the mapping from
buffer ID to the physical address of the buffer. No matter how many NICs you
have, there are 120000 buffers in the system (why?). SR-IOV uses the IOMMU, so
it lifts the 120k limitation; env var EF_VI_PD_FLAGS=vf. With no translation the
adapter reads directly from physical addresses; env var EF_VI_PD_FLAGS=phys.
These flags are passed to the function ef_pd_alloc().

Transmitting Packets: The packet memory buffers must be registered with a
protection domain. If the transmit queue is empty when the doorbell is rung,
TX_PUSH is used, which can cause ef_vi to poll for events to check whether any
packet has been transmitted (latency vs throughput trade-off). The regular path
is: construct packets with correct headers, post them on the transmit ring by
calling ef_vi_transmit_init() (pass in the DMA address of the buffer, which can
be retrieved by calling ef_memreg_dma_addr()), then ring the doorbell by calling
ef_vi_transmit_push() so that the card sends them. Then poll the event queue to
find out when the transmission is complete, and reclaim the packet buffer for
reuse. The event types for transmit are EF_EVENT_TYPE_TX and
EF_EVENT_TYPE_TX_ERROR, which indicate that the transmit has completed or
failed, and EF_EVENT_TX_Q_ID can be used to find the id of the packet that has
been sent. After that you can call ef_vi_transmit_unbundle() to take the
completed packet buffers off the TX ring so that they can be reused, freed, or
put back on the transmit ring.

Event queue status: ef_vi_receive_space() returns the free space in the RX ring,
ef_vi_receive_fill_level() returns the number of filled slots in the RX ring,
and ef_vi_receive_capacity() returns the total size of the ring. You always want
to keep the fill level high. Similar functions are available for the transmit
ring, but its fill level is usually zero unless you are transmitting a large
number of packets.
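
The relationship between space, fill level and capacity can be modelled with two
running counters. This is a toy ring written for these notes, assuming that
space is simply capacity minus fill level. The toy_ring type and its functions
are not the ef_vi structures.

```c
#include <assert.h>
#include <stdint.h>

/* Toy descriptor ring: only the bookkeeping, no buffers. */
struct toy_ring {
    uint32_t capacity;  /* number of usable slots */
    uint32_t added;     /* total descriptors ever posted */
    uint32_t removed;   /* total descriptors ever completed */
};

/* Descriptors currently sitting in the ring. */
uint32_t toy_fill_level(const struct toy_ring *r)
{
    return r->added - r->removed;
}

/* Free slots left in the ring. */
uint32_t toy_space(const struct toy_ring *r)
{
    return r->capacity - toy_fill_level(r);
}

/* Post n descriptors. Returns 0 on success, -1 if they do not fit. */
int toy_post(struct toy_ring *r, uint32_t n)
{
    if (n > toy_space(r))
        return -1;
    r->added += n;
    return 0;
}

/* Mark n descriptors as completed (packet arrived or transmit finished). */
void toy_complete(struct toy_ring *r, uint32_t n)
{
    assert(n <= toy_fill_level(r));
    r->removed += n;
}
```

For an RX ring the goal is to keep toy_fill_level() close to capacity, so that
the adapter always has a buffer to receive into. For a TX ring the fill level
drops back to zero as soon as the posted packets have been sent.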

Notes on handling events: receive descriptors should be posted to the receive
ring in multiples of 8 (why?). If you push 10, ef_vi will push 8, and if you
push fewer than 8, ef_vi will ignore them. If the RX ring is empty and you push
fewer than 8 descriptors to it before blocking on the event queue, the
application will remain blocked: there is no descriptor to receive into, so
nothing gets posted to the event queue.
The batch size for polling should be greater than the batch size for refilling,
to detect when the queue is going empty. (what does this mean?)
The adapter is cut-through, so errors are delivered along with the packet. The
software should detect the errors and recycle the associated buffers.
Two functions, handle_poll() and refill_rx_ring(), from the Onload User Guide
are related to this topic. Look at them for more details.
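
The multiple-of-8 posting rule noted above can be written down as plain
arithmetic. This is a sketch under the assumption that the rule works exactly as
the note describes (anything short of a full group of 8 is held back).
POST_GRANULE, descriptors_pushed() and refill_count() are names invented here,
not ef_vi calls.

```c
#include <assert.h>
#include <stdint.h>

/* Posting granularity taken from the note above. */
#define POST_GRANULE 8u

/* Of `queued` descriptors, how many reach the adapter when only whole groups
 * of POST_GRANULE are pushed. */
uint32_t descriptors_pushed(uint32_t queued)
{
    return queued & ~(POST_GRANULE - 1u);
}

/* Refill decision for an RX ring: post one whole batch when it fits in the
 * free space, otherwise post nothing and try again after the next poll.
 * batch must be a non-zero multiple of POST_GRANULE. */
uint32_t refill_count(uint32_t space, uint32_t batch)
{
    assert(batch != 0 && batch % POST_GRANULE == 0);
    return space >= batch ? batch : 0;
}
```

Choosing a refill batch of 8, 16 or 32 means every refill is a whole number of
groups, so no descriptor is left waiting and the blocked-forever case from the
note cannot be caused by a short batch.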