Comment: Woops - ticks, not usec.

...

  • RTT of 26 us for a simple ping client/server with 10 byte payload. 38 us for 100 bytes.
  • How is this achieved?
    • OS bypass: expose NIC functions directly to the user-level program
    • Proprietary protocol
    • Polling instead of interrupts: continually poll the NIC rather than have it generate interrupts
    • Eliminate all copies on the server side
      • Process the packet while it's still in the ring buffer.
      • This might need a large ring buffer, which might result in increased latency.
      • Solution: multiple server threads processing in parallel.
      • This needs a locking mechanism, which might increase overhead?
    • Using the GAMMA code as the base
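The polling and zero-copy points above can be sketched as a toy simulation. This is not the GAMMA code or any real NIC interface: `RxRing`, `nic_deliver`, `poll`, and `FRAME_SIZE` are hypothetical names, and a `ready` flag per slot stands in for a descriptor "done" bit. The handler receives a `memoryview` slice, so the payload is read where it sits in the ring rather than copied out first.

```python
FRAME_SIZE = 128  # hypothetical fixed slot size in bytes


class RxRing:
    """Toy receive ring: the simulated NIC fills slots, the server polls them."""

    def __init__(self, slots):
        self.slots = slots
        self.buf = bytearray(slots * FRAME_SIZE)  # backing store shared by all slots
        self.length = [0] * slots                 # payload length per slot
        self.ready = [False] * slots              # stands in for a descriptor 'done' bit
        self.head = 0                             # next slot the NIC writes
        self.tail = 0                             # next slot the server reads

    def nic_deliver(self, payload):
        """Simulate the NIC placing one frame; returns False if the ring is full."""
        if len(payload) > FRAME_SIZE:
            raise ValueError("payload larger than a ring slot")
        if self.ready[self.head]:
            return False  # server has not consumed this slot yet
        start = self.head * FRAME_SIZE
        self.buf[start:start + len(payload)] = payload
        self.length[self.head] = len(payload)
        self.ready[self.head] = True
        self.head = (self.head + 1) % self.slots
        return True

    def poll(self, handler):
        """Drain every ready slot, passing the handler a zero-copy view of it."""
        handled = 0
        view = memoryview(self.buf)
        while self.ready[self.tail]:
            start = self.tail * FRAME_SIZE
            handler(view[start:start + self.length[self.tail]])
            self.ready[self.tail] = False  # hand the slot back to the NIC
            self.tail = (self.tail + 1) % self.slots
            handled += 1
        return handled


ring = RxRing(slots=4)
ring.nic_deliver(b"ping-1")
ring.nic_deliver(b"ping-2")

seen = []
# bytes(pkt) copies only so the demo can keep the payload after the slot is reused.
handled = ring.poll(lambda pkt: seen.append(bytes(pkt)))
```

A slot stays occupied until its handler returns, which is why slow in-place processing pushes toward a larger ring or several consumer threads, as the notes above suggest. With several threads, `tail` and the `ready` flags would need the locking the notes worry about.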
  • RTT may be improved with some more NIC tuning
    • Claimed latency of 12-13 us with this mechanism.
    • Maybe use a doorbell register of some sort to reduce transmit latency further?
  • HiStar results (all using 100 byte frames [inc. ethernet header, minus CRC], kernel mode, interrupts disabled, no network stack, timed with TSC):
    • All numbers had very low variance - shared, lightly loaded 10/100/1000 copper switch
    • Intel e1000 gigabit NIC (i82573E)
      • Unclear whether it was running in 100 Mbit or 1000 Mbit mode: our switch lies, but the PHY claimed gigabit.
    • 36usec RTT kernel-to-kernel ping, polled, no interrupts
      • => user-mode drivers may have little overhead
    • transmit delay for 1 packet (time from putting on xmit ring to nic claiming xmit done):
      • 'xmit done' is ill-defined; the docs seem to imply it is the time to move the buffer into the xmit FIFO (as we configured the NIC)
      • IRQ assertion: 25-26usec
      • ring descriptor update: 23.5usec
      • => ring buffer update to IRQ assertion delay is ~2-3usec
    • transmit delay for n sequential packets:
      • 1: 23.5k ticks (9.8 usec/pkt)
      • 2: 34.5k ticks (7.2 usec/pkt)
      • 10: 136.5k ticks (5.6 usec/pkt)
    • => DMA engine latency in startup? could account for 30% of RTT overhead
      • NICs don't seem optimised for awesome latency when inactive
      • Lots of room for improvement if hardware designers had low latency concerns?
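The per-packet figures for sequential transmits can be checked with a short calculation. This sketch assumes the totals are 23.5k, 34.5k, and 136.5k TSC ticks for 1, 2, and 10 packets, and it assumes a 2.4 GHz TSC. The notes do not state the clock rate; 2.4 GHz is inferred from 23.5k ticks being quoted as 9.8 usec. The split into a fixed startup cost and a marginal per-packet cost is a simple two-point linear fit, offered only as a rough way to look at the "DMA engine latency in startup?" question.

```python
TSC_HZ = 2.4e9  # assumption: inferred from 23.5k ticks ~ 9.8 usec, not stated in the notes


def ticks_to_usec(ticks, tsc_hz=TSC_HZ):
    """Convert a TSC tick count to microseconds."""
    return ticks / tsc_hz * 1e6


def per_packet_usec(total_ticks, n_packets, tsc_hz=TSC_HZ):
    """Average transmit delay per packet for a burst of n_packets."""
    return ticks_to_usec(total_ticks, tsc_hz) / n_packets


# Totals as read from the notes: packets in burst -> total ticks (assumed reading).
measurements = {1: 23_500, 2: 34_500, 10: 136_500}

per_pkt = {n: per_packet_usec(t, n) for n, t in measurements.items()}

# Two-point linear fit through the n=1 and n=10 totals:
#   total_ticks = startup_ticks + n * marginal_ticks
marginal_ticks = (measurements[10] - measurements[1]) / (10 - 1)
startup_ticks = measurements[1] - marginal_ticks

marginal_usec = ticks_to_usec(marginal_ticks)
startup_usec = ticks_to_usec(startup_ticks)

for n in sorted(per_pkt):
    print(f"{n:>2} packets: {per_pkt[n]:.2f} usec/pkt")
print(f"fit: startup ~{startup_usec:.2f} usec, marginal ~{marginal_usec:.2f} usec/pkt")
```

Under these assumptions the averages land close to the quoted 9.8, 7.2, and 5.6 usec/pkt. The fit puts a few microseconds into a one-time startup term, though the n=2 point does not sit exactly on that line, so the split is only indicative.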

...