Network Substrate

Random Concepts and Terminology

...

  • congestion results in:
    • packet loss if buffers overflow
    • else, increased latency from waiting in line
    • which is worse? (rough numbers in the sketch after this list)
      • if RAMCloud fast enough, occasional packet loss may not be horrible
      • buffering may cause undesired latencies/variability in latency
  • even if no oversubscription, congestion is still an issue
    • e.g. any time multiple flows funnel from several ports to one port (within the network) or host (at the leaves)
      • conceivable in RAMCloud, as we expect to communicate with many different systems
        • e.g.: could be a problem if client issues enough sufficiently large requests to a large set of servers
  • UDP has no congestion control mechanisms
    • connectionless, unreliable protocol probably essential for latency and throughput goals
    • need to avoid congestion how?
      • rely on user to stagger queries/reduce parallelism? [cf. Facebook] (see the windowing sketch after this list)
      • if we're sufficiently fast, will we run into these problems anyhow?
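
To put rough numbers on the loss-vs-latency trade-off above, here is a back-of-the-envelope sketch in Python. All figures (link rate, buffer size, RPC budget) are illustrative assumptions, not taken from these notes.

    # Back-of-the-envelope numbers for the loss-vs-latency trade-off.
    # All figures (link rate, buffer size, RPC budget) are illustrative assumptions.

    LINK_RATE_BPS = 10e9            # 10 Gb/s switch port
    PORT_BUFFER_BYTES = 128 * 1024  # assumed per-port output buffer
    RPC_BUDGET_S = 10e-6            # assumed RAMCloud-style end-to-end RPC target (~10 us)

    def queueing_delay(queued_bytes, rate_bps=LINK_RATE_BPS):
        """Time a packet waits behind `queued_bytes` already sitting in the buffer."""
        return queued_bytes * 8 / rate_bps

    def incast_overflow(servers, reply_bytes, buffer_bytes=PORT_BUFFER_BYTES):
        """Bytes that do not fit when `servers` replies converge on one port at once."""
        return max(0, servers * reply_bytes - buffer_bytes)

    if __name__ == "__main__":
        # A full 128 KiB buffer adds roughly 105 us of queueing, about 10x the RPC budget.
        d = queueing_delay(PORT_BUFFER_BYTES)
        print(f"full-buffer queueing delay: {d * 1e6:.1f} us (budget {RPC_BUDGET_S * 1e6:.0f} us)")

        # 100 servers each returning an 8 KB reply to one client port overflow the buffer.
        print(f"incast overflow: {incast_overflow(100, 8 * 1024)} bytes dropped or paused")

Whether the occasional drop or the buffering delay hurts more then comes down to how cheap a retry is at RAMCloud's timescales, which is exactly the open question in the bullets above.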
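
Since plain UDP offers no congestion control, "stagger queries / reduce parallelism" can be done with a simple application-level window that caps the number of outstanding requests per client. The sketch below is a hypothetical illustration, not RAMCloud's actual transport; the addressing and framing are assumptions.

    import socket

    class WindowedUdpClient:
        """Cap outstanding UDP requests so one client cannot burst into its own downlink."""

        def __init__(self, window=8, timeout_s=0.001):
            self.window = window                      # max requests in flight
            self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            self.sock.settimeout(timeout_s)           # crude loss detection, no backoff

        def scatter(self, servers, payload):
            """Send `payload` to every (host, port) in `servers`, at most `window` in flight."""
            replies, pending, inflight = {}, list(servers), []
            while pending or inflight:
                # Fill the window.
                while pending and len(inflight) < self.window:
                    addr = pending.pop()
                    self.sock.sendto(payload, addr)
                    inflight.append(addr)
                try:
                    data, addr = self.sock.recvfrom(65535)
                    if addr in inflight:
                        inflight.remove(addr)
                        replies[addr] = data
                except socket.timeout:
                    # Re-queue everything still outstanding; a real client would
                    # back off and cap retries rather than loop forever.
                    pending.extend(inflight)
                    inflight.clear()
            return replies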

"Data Center Ethernet"

  • Cisco: "collection of standardsbalaji's points: buffers don't scale with bandwidth increases
    • simply can't get 2x buffers with similar increase in bandwidth at high end
    • further, adding more bandwidth and keeping spare capacity in reserve for temporary congestion is better than adding buffers
      • especially for RAMCloud - reduces latency
      • is this an argument against commodity (at least, against a pure commodity fat-tree)?
  • ECN - Explicit Congestion Notification
    • already done by switches - set bit in IP TOS header if nearing congestion, with greater probability as we approach saturation (see the marking sketch after this list)
    • mostly for sustained flow traffic
      • RAMCloud expects lots of small datagrams, rather than flows
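
The marking behaviour described above (set the bit with increasing probability as the queue approaches saturation) is essentially RED-style ECN marking. A minimal sketch, with made-up thresholds rather than any particular switch's defaults:

    import random

    # RED/ECN-style marking: below `min_th` never mark, above `max_th` always mark,
    # and in between mark with probability rising linearly toward `max_p`.
    # All thresholds here are illustrative assumptions.

    def ecn_mark(avg_queue, min_th=0.3, max_th=0.9, max_p=0.1):
        """Decide whether to set the Congestion Experienced bit.

        `avg_queue` is the averaged queue occupancy as a fraction of the buffer."""
        if avg_queue < min_th:
            return False
        if avg_queue >= max_th:
            return True
        p = max_p * (avg_queue - min_th) / (max_th - min_th)
        return random.random() < p

For RAMCloud-style traffic of many small, independent datagrams there may be no long-lived flow for a sender to slow down in response to the mark, which is the concern in the last bullet.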

"Data Center Ethernet"

  • Cisco: "collection of standards-based extensions to classical Ethernet that allows data center architects to create a data center transport layer that is:"
    • stable
    • lossless
    • efficient
  • Purpose is apparently to buck the trend of building multiple application-specific networks (IP, SAN, InfiniBand, etc)
    • how? better multi-tenancy (traffic class isolation/prioritisation), guaranteed delivery (lossless transmission), layer-2 multipath (higher bisection bandwidth)
  • A series of additional standards:
    • "Class-based flow control" (CBFC)
      • for multi-tenancy
    • Enhanced transmission selection (ETS)
      • for multi-tenancy
    • Data center bridging capability exchange protocol (DCBCXP)
    • Lossless Ethernet
      • for guaranteed delivery
    • Congestion notification
      • end-to-end congestion management to avoid dropped frames (i.e. work around TCP congestion collapse, retrofit non-congestion-aware protocols to not cause trouble?)
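
To make the class-based flow control / lossless Ethernet idea concrete, here is a toy sketch of the receive-side decision: when one traffic class's buffer crosses a high-water mark, pause only that class and resume once it drains. Thresholds, units, and the print-based "PAUSE" signalling are invented for illustration.

    # Toy per-class flow control: pause a single traffic class when its ingress
    # buffer crosses a high-water mark, resume when it drains below a low one.
    # Thresholds and units are illustrative assumptions.

    PAUSE_THRESHOLD_KB = 80    # buffered KB in a class before pausing it
    RESUME_THRESHOLD_KB = 20   # buffered KB at which the class is un-paused

    class ClassBuffers:
        def __init__(self, num_classes=8):
            self.buffered_kb = [0] * num_classes
            self.paused = [False] * num_classes

        def enqueue(self, traffic_class, kb):
            """A frame arrives; possibly ask the link partner to pause this class only."""
            self.buffered_kb[traffic_class] += kb
            if self.buffered_kb[traffic_class] >= PAUSE_THRESHOLD_KB and not self.paused[traffic_class]:
                self.paused[traffic_class] = True
                print(f"send PAUSE for class {traffic_class}")   # other classes keep flowing

        def dequeue(self, traffic_class, kb):
            """A frame is forwarded onward; possibly un-pause the class."""
            self.buffered_kb[traffic_class] = max(0, self.buffered_kb[traffic_class] - kb)
            if self.buffered_kb[traffic_class] <= RESUME_THRESHOLD_KB and self.paused[traffic_class]:
                self.paused[traffic_class] = False
                print(f"send RESUME for class {traffic_class}")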

In the field

  • Google
    • Use long-lived TCP connections
      • pre-established and left open to avoid handshake overhead (see the connection-pool sketch below)
      • unclear how TCP has been tweaked for low-latency environment (retransmit timeouts, etc)
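
A minimal sketch of the pre-established, long-lived connection pattern: open a TCP connection to each server once and reuse it for every request, so there is no per-request handshake. Server addresses and the fixed-size reply framing are assumptions; how to tune TCP itself (retransmit timeouts, etc.) is exactly the open question above.

    import socket

    class ConnectionPool:
        """Long-lived TCP connections, opened once at startup and reused per request."""

        def __init__(self, servers):
            self.conns = {}
            for addr in servers:                      # addr = (host, port), assumed
                s = socket.create_connection(addr)
                s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # don't delay small RPCs
                self.conns[addr] = s

        def request(self, addr, payload, reply_len):
            """Send one request on the already-open connection and read up to `reply_len` bytes.

            For simplicity this assumes the whole reply arrives in one recv()."""
            s = self.conns[addr]
            s.sendall(payload)
            return s.recv(reply_len)

        def close(self):
            for s in self.conns.values():
                s.close()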

Misc. Thoughts

  • If networking costs only small part of total DC cost, why is there oversubscription currently?
    • it's possible to pay more and reduce oversubscription - cost doesn't seem the major factor
    • but people argue that oversubscription leads to significant bottlenecks in real DCs
      • but, then, why aren't they reducing oversubscription from the get-go?