...
- Size is defined by a factor k, the number of ports per identical switch in the network
- 3-level hierarchy:
- core level ((k/2)^2 = k^2/4 switches)
- each core switch uses all k ports, one per pod, to connect to the upper layer of each of the k pods
- pod level (k pods)
- each pod has 2 internal layers with k/2 switches/layer => k switches/pod
- upper level switches (k/2 of them) connect k/2 of their ports to core level switches
- other k/2 ports connect to each of the k/2 lower pod level switches
- lower level switches (k/2 of them) connect to k/2 hosts each
- end host level (k^3/4 total hosts)
| k   | # hosts | # switches | # ports   | host:switch ratio | host:port ratio |
|-----|---------|------------|-----------|-------------------|-----------------|
| 4   | 16      | 20         | 80        | 0.8               | 0.2             |
| 8   | 128     | 80         | 640       | 1.6               | 0.2             |
| 16  | 1,024   | 320        | 5,120     | 3.2               | 0.2             |
| 32  | 8,192   | 1,280      | 40,960    | 6.4               | 0.2             |
| 48  | 27,648  | 2,880      | 138,240   | 9.6               | 0.2             |
| 64  | 65,536  | 5,120      | 327,680   | 12.8              | 0.2             |
| 96  | 221,184 | 11,520     | 1,105,920 | 19.2              | 0.2             |
| 128 | 524,288 | 20,480     | 2,621,440 | 25.6              | 0.2             |
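Every column in the table above follows directly from k; a quick sketch that reproduces them from the structure described earlier:

```python
# Derive the fat-tree sizing table from k alone:
#   core level:  (k/2)^2 switches
#   pod level:   k pods * k switches/pod
#   host level:  k^3/4 hosts
def fat_tree_sizes(k: int) -> dict:
    """Host/switch/port counts for a fat-tree built from k-port switches."""
    core_switches = (k // 2) ** 2            # (k/2)^2
    pod_switches = k * k                     # k pods, k switches each
    switches = core_switches + pod_switches  # 5k^2/4 total
    hosts = k ** 3 // 4                      # k^3/4
    ports = k * switches                     # every switch has k ports
    return {
        "hosts": hosts,
        "switches": switches,
        "ports": ports,
        "host:switch": hosts / switches,     # = k/5, grows linearly with k
        "host:port": hosts / ports,          # always 0.2: 5 ports per host
    }

for k in (4, 8, 16, 48):
    print(k, fat_tree_sizes(k))
```

The constant 0.2 host:port ratio is the table's main takeaway: a fat-tree always spends 5 switch ports per attached host, regardless of scale.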
- Fat-tree will have no oversubscription if network resources can be properly exploited ("rearrangeably non-blocking")
- i.e. for a network of 1GigE switches, there will always be 1Gbit available between two arbitrary hosts if the interconnects between them can be properly scheduled
- ways of handling this include recomputation of routes based on load, randomizing core switch hops, etc
- take away: can max out all ports, but only if we're smart
- Al-Fares's SIGCOMM '08 paper shows > 80% utilisation under worst-case conditions
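As a concrete illustration of the "randomizing core switch hops" idea, here is a minimal ECMP-style sketch: hash each flow's 5-tuple to pick a core switch, so distinct flows spread across the core while packets within one flow stay on a single path (avoiding reordering). The hash choice and tuple layout are illustrative assumptions, not the actual scheme from the paper.

```python
import zlib

def pick_core_switch(flow, num_core_switches):
    """Hash a flow's 5-tuple to a core switch index.

    Per-flow (not per-packet) hashing keeps each flow on one path
    while spreading distinct flows across all (k/2)^2 core switches.
    """
    key = "|".join(map(str, flow)).encode()
    return zlib.crc32(key) % num_core_switches

# A k = 48 fat-tree has (48/2)^2 = 576 core switches
flow = ("10.0.1.2", 12345, "10.4.1.2", 80, "tcp")
print(pick_core_switch(flow, 576))
```

Static hashing like this can still collide hot flows onto one core switch, which is why the notes also mention recomputing routes based on load.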
Fat-Tree vs. Hierarchical
...
- Hierarchical limited by fastest core switch speed, Fat-tree is not so limited
- => Cannot get 10GigE bi-section bandwidth with hierarchical today; need a Fat-tree with 10GigE switches
- Example: MSR's Monsoon vs. UCSD's Fat-Tree commodity system
- Want network connecting many 1GigE nodes with no oversubscription
- MSR uses a hierarchical configuration (10GigE aggregation and core switches, 1GigE TOR switches)
- UCSD's uses identical, 24-port, commodity 1GigE switches (i.e. k = 48)
- Both theoretically capable of 1:1 oversubscription (i.e. no oversubscription)
|                | Hierarchical                                                      | Fat-tree              |
|----------------|-------------------------------------------------------------------|-----------------------|
| # hosts        | 25,920                                                            | 27,648                |
| # switches     | 108 x 144-port 10GigE + 1,296 x 20-port 1GigE w/ 2x10GigE uplinks | 2,880 x 48-port 1GigE |
| # wires        | 57,024 (~91% GigE, ~9% 10GigE)                                    | 82,944                |
| # unique paths | 144 (36 via core with 2x dual uplinks in each subtree)            | 572                   |
- Notes:
- 48-port 1GigE switches cost ~$2.5-3k
- 2,880 * $2500 = $7M
- 20-port 1GigE switches w/ 10GigE uplinks probably cost about the same (~$2.5-3k) [uplinks not commodity]
- 1,296 * $2500 = $3.24M
- 144-port 10GigE switches advertised as $1500/port ($216k/switch) in mid-2007
- to be competitive with fat-tree on per-port cost, price per port must drop 6.25x to $241.76 ($34.8k/switch)
- 6.25x drop seems pretty close to the common 2-year drop in price
- If we were to make a 10GigE Fat-tree similar to the above, today it would cost (MSRP) about $20k/switch x 2,880 switches = $57.6M
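The cost arithmetic above, spelled out (all prices are the rough figures quoted in these notes, not current market data):

```python
# Back-of-envelope switch costs for the two designs above.

# Fat-tree: 2,880 commodity 48-port 1GigE switches
fat_tree = 2_880 * 2_500                 # ~$7M
# Hierarchical edge: 1,296 x 20-port 1GigE switches w/ 10GigE uplinks
hier_edge = 1_296 * 2_500                # $3.24M
# Hierarchical core: 108 x 144-port 10GigE switches at $1,500/port
hier_core = 108 * (144 * 1_500)          # ~$23.3M
# Hypothetical 10GigE fat-tree at ~$20k/switch
fat_tree_10g = 2_880 * 20_000            # $57.6M

for name, cost in [("1GigE fat-tree", fat_tree),
                   ("hierarchical edge", hier_edge),
                   ("hierarchical core", hier_core),
                   ("10GigE fat-tree", fat_tree_10g)]:
    print(f"{name:18s} ${cost / 1e6:5.2f}M")
```

The 10GigE core switches dominate the hierarchical total, which is why the break-even analysis above focuses on how far their per-port price must fall.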
Alternative Network Topologies
...
- latency
- Arista 48-port 10GigE switches advertise a minimum of 600nsec latency (no idea what the distribution looks like)
- across 6 hops, that's > 3.6 usec
- Woven Systems' 144-port 10GigE switches advertise 1.6usec port-to-port latency (~2.7x Arista's minimum)
- => > 3.2usec in first two levels of hierarchy
- take away: sub-5usec is probably not currently possible
- bandwidth
- 128 bytes / object * 1.0e6 objects/second = 122MBytes/sec (not including any packet overhead)
- this is gigabit range... 10GigE vs. GigE may be a significant question:
- Arista 48-port 10GigE's not commodity (~$20k/switch, vs. $2-3k/switch of commodity 1GigE)
- But what if we have much bigger, hot objects on a machine?
- Do we want to assume a single machine can always handle requests?
- e.g. 10KByte object => max. ~12,500 requests/sec on gigabit
- Going beyond gigabit is still very costly
- a ~25k-host cluster with full 10GigE bi-section bandwidth would be ~$57M for switches (~7x the cost of gigabit)
- if 10GigE not needed, but gigabit not enough, may be cheaper to dual-home machines and increase total # of ports
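The bandwidth arithmetic above can be sketched as follows (ignoring packet/protocol overhead, as the notes do):

```python
def max_requests_per_sec(object_bytes: int, link_bits_per_sec: float) -> float:
    """Upper bound on requests/sec one NIC can serve at a given object size."""
    return link_bits_per_sec / 8 / object_bytes

GIGE = 1e9
# 10KByte objects on gigabit: ~12,500 requests/sec
print(max_requests_per_sec(10_000, GIGE))
# 128-byte objects at 1M requests/sec consume ~122 MBytes/sec of the link
print(128 * 1e6 / 2**20)
```

Small objects leave gigabit headroom, but a single hot 10KB object caps out a gigabit-attached machine quickly, motivating the dual-homing option above.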
Misc. Thoughts
- If networking costs only small part of total DC cost, why is there oversubscription currently?
- it's possible to pay more and reduce oversubscription, so cost doesn't seem to be the major factor
- but people argue that oversubscription leads to significant bottlenecks in real DCs
- but, then, why aren't they reducing oversubscription from the get-go?