Potential Clusters for Experiment (WIP)
This page lists clusters that are potentially suitable for evaluating Homa Transport. More specifically, we are looking for low-latency Ethernet NICs and switches with (1) at least 8 data queues and (2) strict-priority (SP) queuing support for 802.1p-tagged traffic.
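For context, the 802.1p priority is the 3-bit PCP field inside the 802.1Q VLAN tag, which gives exactly 8 priority levels; that is where the "at least 8 data queues" requirement comes from (ideally one SP queue per level). A minimal Python sketch of the tag's 16-bit TCI field, just to make the encoding concrete:

    import struct

    def vlan_tci(pcp, dei, vid):
        """Build the 16-bit Tag Control Information of an 802.1Q header:
        3-bit PCP (the 802.1p priority), 1-bit DEI, 12-bit VLAN ID."""
        assert 0 <= pcp <= 7 and dei in (0, 1) and 0 <= vid <= 4095
        return struct.pack("!H", (pcp << 13) | (dei << 12) | vid)

    # e.g., priority-6 traffic on VLAN 100 -> TCI bytes c0 64
    print(vlan_tci(6, 0, 100).hex())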
Candidate 1: d430 nodes at Emulab Utah
Summary
- 160 nodes (4 racks x 40 nodes/rack) in total
- 1 dual- or quad-port Intel X710 10GbE PCI-Express NIC [1]
- probably supports strict priority for 802.1p-tagged traffic: "High priority packets are processed before lower priority packets" [2]
- 1 Dell Z9500 132-port 40Gb switch (latency 600 ns to 2 us, 8 QoS data queues) connects all 10Gb interfaces
QoS Configuration
NIC
Switch
If we had complete control over the switch (not very likely, since it connects all 160 nodes), the global QoS configuration commands (documented in Chapter 44 of the Command-Line Reference Guide) would suffice. Otherwise, we would have to look further into policy-based QoS configuration, whose overall paradigm/pipeline seems quite complex.
Useful Links
[1] Intel X710 10GbE product brief: http://www.intel.com/content/dam/www/public/us/en/documents/product-briefs/ethernet-x710-brief.pdf
[2] User Guides for Intel Ethernet Adapters: https://downloadcenter.intel.com/download/11848/User-Guides-for-Intel-Ethernet-Adapters?product=75021
Utah Emulab hardware in general: https://wiki.emulab.net/wiki/Emulab/wiki/UtahHardware
Dell R430 (aka "d430") nodes: https://wiki.emulab.net/Emulab/wiki/d430
Emulab data plane layout: http://www.emulab.net/doc/docwrapper.php3?docname=topo.html
Dell Z9500 spec sheet: http://www.netsolutionworks.com/datasheets/Dell_Networking_Z9500_SpecSheet.pdf
Dell Z9500 Configuration Guide & Command-Line Reference Guide: http://www.dell.com/support/home/us/en/04/product-support/product/force10-z9500/manuals
Candidate 2: m510 nodes at CloudLab Utah
Summary
- 270 nodes (6 chassis x 45 nodes/chassis) in total
- Dual-port Mellanox ConnectX-3 10 Gb NIC (PCIe v3.0, 8 lanes)
- meaning? (By the way, I couldn't find enough info about the number of queues or whether it supports SP queuing.)
- 2 HP Moonshot 45XGc switches (45x10Gb ports, 4x40Gb ports for uplink to the core, latency < 1us, cut-through, 8 data queues) per chassis
- 1 large HP FlexFabric 12910 switch (96x40Gb ports, full bisection bandwidth internally, latency 6 to 16 us) connecting all chassis
- They "have plans to enable some users to allocate entire chassis ... to have complete administrator control over the switches"
QoS Configuration
NIC
Switch
Useful Links
CloudLab Utah cluster hardware specs:
http://docs.cloudlab.us/hardware.html#%28part._cloudlab-utah%29
https://www.cloudlab.us/hardware.php#utah
ConnectX-3 Dual-Port 10 GbE Adapters w/ PCI Express 3.0: http://www.mellanox.com/page/products_dyn?product_family=127
HP Moonshot 45XGc switch QuickSpecs: https://www.hpe.com/h20195/v2/getpdf.aspx/c04384058.pdf
HP Moonshot-45XGc Switch Module Manuals: http://h20565.www2.hpe.com/portal/site/hpsc/public/psi/home/?sp4ts.oid=7398915#manuals
Candidate 3: ?
TODO
Verify Priority Actually Works
vconfig + iperf? (i.e., tag flows with different 802.1p priorities via a VLAN interface's egress QoS map, then check that high-priority traffic keeps low latency while low-priority traffic congests the link; see the sketch below)
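Below is a rough Python sketch of such a test, in case we want something more controllable than iperf for the latency side. It assumes a VLAN interface (say eth0.100) already exists on both hosts and that its egress QoS map copies the socket priority into the 802.1p PCP field (e.g. set up with vconfig set_egress_map or ip link ... egress-qos-map); the script name, port, and probe counts are arbitrary placeholders, and SO_PRIORITY is Linux-specific.

    #!/usr/bin/env python3
    # Priority-vs-latency probe (sketch). Run "probe.py server" on one node,
    # then "probe.py client <server-ip> <prio>" on another, once with a low
    # and once with a high priority while background traffic (e.g. iperf at
    # a low priority) congests the link; compare the reported RTTs.
    import socket, struct, sys, time

    PORT = 9999          # arbitrary UDP port
    NPROBES = 1000       # probes per run

    def server():
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.bind(("", PORT))
        while True:
            data, addr = s.recvfrom(2048)
            s.sendto(data, addr)              # echo each probe back

    def client(server_ip, prio):
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        # sk_priority -> VLAN egress QoS map -> 802.1p PCP on the wire
        s.setsockopt(socket.SOL_SOCKET, socket.SO_PRIORITY, prio)
        s.settimeout(1.0)
        rtts = []
        for i in range(NPROBES):
            t0 = time.monotonic()
            s.sendto(struct.pack("!Id", i, t0), (server_ip, PORT))
            try:
                s.recvfrom(2048)
            except socket.timeout:
                continue                      # probe lost or badly delayed
            rtts.append(time.monotonic() - t0)
            time.sleep(0.01)                  # don't self-congest
        rtts.sort()
        print("prio %d: median RTT %.1f us, p99 %.1f us (%d/%d probes answered)"
              % (prio, 1e6 * rtts[len(rtts) // 2],
                 1e6 * rtts[int(len(rtts) * 0.99)], len(rtts), NPROBES))

    if __name__ == "__main__":
        if sys.argv[1] == "server":
            server()
        else:
            client(sys.argv[2], int(sys.argv[3]))

If the NIC and switch really do strict priority, the high-priority run should keep a low median and tail RTT even when low-priority background traffic saturates the link, while the low-priority run should degrade noticeably.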
Miscellaneous
Relationship between CloudLab and Emulab/AptLab/etc.: https://www.cloudlab.us/cluster-graphs.php