Node architecture
Goals
From the RAMCloud Wiki Home:
- Commodity servers, 32-64GB/server (no custom chip or board design)
- High throughput (i.e. 1M requests/second)
These goals are specified in terms of individual servers. However, we are essentially designing a cluster and should specify our goals in terms of the whole cluster.
That is, observing that
- RAMCloud should scale to 10,000s of servers
- End-user performance will be observed in terms of aggregate throughput
we should instead think in terms of total cluster storage capacity and total RAMCloud request throughput, rather than individual server capacity and throughput.
Revised goals:
- 1PB of storage?
- 1B requests/second?
- Affordable/Realizable/Likely-to-see-the-light-of-day
We meet these goals by choosing appropriate hardware and replicating it as many times as necessary.
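As a rough illustration of what "replicating it as many times as necessary" implies, the sketch below computes how many nodes each cluster target would demand. It assumes 1PB means 10^6 GB, that the per-server figures from the original goals (32-64GB, 1M requests/second) hold, and that no extra copies are kept for durability; the function name `nodes_required` is made up for this example.

```python
import math

# Cluster-level targets from the revised goals (both are still open questions).
TARGET_CAPACITY_GB = 1_000_000            # 1PB, taken as 10^6 GB (assumption)
TARGET_REQUESTS_PER_SEC = 1_000_000_000   # 1B requests/second


def nodes_required(gb_per_node: float, requests_per_sec_per_node: float) -> dict:
    """Return the node count needed for each target and the larger of the two."""
    for_capacity = math.ceil(TARGET_CAPACITY_GB / gb_per_node)
    for_throughput = math.ceil(TARGET_REQUESTS_PER_SEC / requests_per_sec_per_node)
    return {
        "for_capacity": for_capacity,
        "for_throughput": for_throughput,
        "nodes": max(for_capacity, for_throughput),
    }


if __name__ == "__main__":
    # Per-server figures from the original goals: 32-64GB, 1M requests/second.
    for gb in (32, 64):
        print(f"{gb}GB/node:", nodes_required(gb, 1_000_000))
```

Under these assumptions capacity, not throughput, sets the node count: roughly 15,000-31,000 nodes for storage versus 1,000 for request rate.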
Questions
- What metrics are individual nodes optimized for?
- Capacity-centric
- GB / node? GB / RU?
- $ / GB? (including chassis, etc.)
- Performance-centric
- transactions / second / node?
- transactions / second / GB?
- GB / GHz?
- Energy-centric
- Watts / GB?
- Joules / request?
- What form-factor should nodes take? 1U? 4U? Blade? Other?
- How well are computational, storage, and I/O resources balanced on each node?
- How dependable are nodes? Is ECC memory an absolute requirement?
- Should nodes be composed of commodity components or also incorporate exotic ones?
Exposition by Example
Dual-socket 1U server: Dell PowerEdge 1950 III
- Dell PowerEdge 1950 III (configured on-line)
- Dual-socket, dual-core Xeon E5205 @ 1.86 GHz
- 1GB: $1349
- 32GB: $2480 (8x 4GB), additional cost of $1131
- 64GB: $9501 (8x 8GB), additional cost of $8152
64GB / node? Not bad! $9501 for the privilege? Yikes...
$148.45/GB @ 64GB
$77.50/GB @ 32GB
Clearly, don't maximize GB/node. $/GB is somewhat more useful.
Question: Are there practical limits on the total number of nodes that have to be considered?
Power consumption: 143W-244W http://www.spec.org/power_ssj2008/results/res2008q1/power_ssj2008-20080212-00035.html
7.625W/GB (peak power, 32GB)
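The figures above can be reproduced with a few lines of Python. The sketch also computes the marginal cost of each memory step, which is not in the original notes but follows directly from the quoted prices; the helper names are made up for this example.

```python
# Quoted prices for the PowerEdge 1950 III, keyed by installed memory in GB.
PRICE_BY_GB = {1: 1349, 32: 2480, 64: 9501}
PEAK_WATTS = 244  # upper end of the 143W-244W range cited above


def cost_per_gb(gb: int) -> float:
    """Whole-system price divided by installed memory."""
    return PRICE_BY_GB[gb] / gb


def marginal_cost_per_gb(from_gb: int, to_gb: int) -> float:
    """Price of each additional GB when stepping between two configurations."""
    return (PRICE_BY_GB[to_gb] - PRICE_BY_GB[from_gb]) / (to_gb - from_gb)


if __name__ == "__main__":
    for gb in (32, 64):
        print(f"{gb}GB: ${cost_per_gb(gb):.2f}/GB")
    print(f"1GB -> 32GB: ${marginal_cost_per_gb(1, 32):.2f} per added GB")
    print(f"32GB -> 64GB: ${marginal_cost_per_gb(32, 64):.2f} per added GB")
    print(f"Peak power at 32GB: {PEAK_WATTS / 32:.3f}W/GB")
```

The step from 32GB to 64GB costs about six times as much per added GB as the step from 1GB to 32GB, which is the case against maximizing GB/node.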
Dual-socket 1U server: HP ProLiant DL165 G5
- Dual-socket, quad-core Opteron 2344HE @ 1.7 GHz
- 32GB: $2375 (8x 4G) = $74/GB
- 64GB: $8375 (8x 8G) = $131/GB
Similar to PowerEdge 1950
Dual-socket 1U server: HP ProLiant DL165 G5p
- Dual-socket, quad-core Opteron 2376 @ 2.3 GHz
- 64GB: $4001 (16x 4G) = $62.50/GB
- 128GB: $15681 (16x 8G) = $122.51/GB
300W peak? That would be about 4.69W/GB @ 64GB
Quad-socket 4U point of comparison: Dell PowerEdge R905
- Dell PowerEdge R905
- Quad-socket, quad-core Opteron 8346HE @ 1.8 GHz
- 4GB: $4728
- 128GB: $8090 (32x 4GB) = $63.20/GB
- 256GB: $26,359 (32x 8GB) = $102.96/GB
Power consumption of similar system (HP DL580G5): 271W-387W http://www.spec.org/power_ssj2008/results/res2007q4/power_ssj2008-20071207-00024.html
3.023W/GB (peak power, 128GB)
Scale down: Dell PowerEdge R300
- Dell PowerEdge R300
- Single-socket, Core 2 Duo E6305 @ 1.86 GHz
- 1GB: $699
- 24GB: $1385 (6x 4GB DDR2)
24GB / node. Lower density, but $57.71/GB.
Note, however, this node has more GB/GHz than the PE 1950.
24 / (2x1.86) = 6.45GB/GHz
32 / (4x1.86) = 4.30GB/GHz
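The GB/GHz figure here is memory divided by aggregate clock rate (sockets x cores x GHz). A minimal sketch, with a made-up helper name, that reproduces the two numbers above:

```python
def gb_per_ghz(memory_gb: float, sockets: int, cores_per_socket: int,
               clock_ghz: float) -> float:
    """Memory per unit of aggregate clock rate; a crude balance metric."""
    return memory_gb / (sockets * cores_per_socket * clock_ghz)


if __name__ == "__main__":
    # PowerEdge R300: one dual-core socket at 1.86 GHz, 24GB.
    print(f"R300:    {gb_per_ghz(24, 1, 2, 1.86):.2f}GB/GHz")
    # PowerEdge 1950 III: two dual-core sockets at 1.86 GHz, 32GB.
    print(f"PE 1950: {gb_per_ghz(32, 2, 2, 1.86):.2f}GB/GHz")
```

Aggregate GHz ignores differences in microarchitecture and memory bandwidth, so this is only a first-order comparison.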
Question: Will server load scale with memory capacity in practice? Asked another way, does increased memory capacity tend to result in the deployment of fewer servers?
Power consumption: 75W-117W? http://www.spec.org/power_ssj2008/results/res2008q1/power_ssj2008-20080311-00042.txt
4.875W/GB (peak power, 24GB)
Scale way down... Quiet PC
- Shuttle SP35P2V2 ($240 kit)
- Single-socket, single-core Celeron 430 @ 1.8 GHz ($40)
- 8GB: $348 total, including kit and CPU (4x 2GB @ $17/ea.)
$43.5/GB!
8 / 1.8 = 4.44GB/GHz
Not traditionally rack-mountable. More Ethernet ports/GB!
Question: What impact does low GB/node have on networking infrastructure?
Power consumption: 50W-83W?? http://www.bit-tech.net/hardware/cases/2008/01/03/shuttle_sn68ptg5/9
10.375W/GB (peak power, 8GB)! Worst yet.
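To see the trade-off across the examples so far, the sketch below ranks the four nodes that have a cited power measurement by $/GB and by peak W/GB. Prices and wattages are the ones quoted above (the R905 wattage comes from the similar HP DL580G5 system); the `Node` type and helper names are made up for this example.

```python
from typing import Callable, List, NamedTuple


class Node(NamedTuple):
    name: str
    memory_gb: int
    price_usd: int
    peak_watts: int


# Figures as quoted in the sections above; the power numbers are rough.
NODES = [
    Node("PowerEdge 1950 III", 32, 2480, 244),
    Node("PowerEdge R905", 128, 8090, 387),
    Node("PowerEdge R300", 24, 1385, 117),
    Node("Shuttle quiet PC", 8, 348, 83),
]


def usd_per_gb(node: Node) -> float:
    return node.price_usd / node.memory_gb


def watts_per_gb(node: Node) -> float:
    return node.peak_watts / node.memory_gb


def ranked(metric: Callable[[Node], float]) -> List[str]:
    """Node names sorted from best (lowest) to worst on the given metric."""
    return [n.name for n in sorted(NODES, key=metric)]


if __name__ == "__main__":
    print("By $/GB:", ranked(usd_per_gb))
    print("By W/GB:", ranked(watts_per_gb))
```

The two rankings disagree: the quiet PC is cheapest per GB but worst on W/GB, while the R905 is the reverse on power.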
Exotic: PowerPC 460EX SoC
- Specs
- 1 GHz
- On-chip DDR2 controller, supports four 4GB banks for 16GB total (really?)
- On-chip SATA II
- Two on-chip Gigabit Ethernet MACs
Question: Can you find these in the wild? How much do they cost?
Less Exotic: Marvell Sheeva (PXA168) SoC
- Specs
- 1.2 GHz ARM
- On-chip DDR2 controller - 4GB theoretical limit, no clue what SoC supports
- On-chip NAND, NOR flash controllers, 10/100 ethernet mac, PCIe bus, USB 2.0
- purportedly superscalar - dual issue, OOO
- SheevaPlug - ARM SoC in a wall wart
- http://www.globalscaletechnologies.com/p-22-sheevaplug-dev-kit.aspx
- PXA168 + gigabit phy, 512MB ram, 512MB flash
- 5W power draw
- includes AC adapter
- dev kit: $99 USD in quantity of 1 (volume pricing undoubtedly much lower)
- If a similar SoC could be had with 4GB for $150, we'd have:
- $37.50/GB
- 0.3GHz/GB of cpu (specious, I know)
- 250Mbit/GB of network
- 1.5-2W/GB guess based on 5W number above
- Compared with "Follow-up, 32-slot OEM board" below...
- $39.73/GB
- 0.225GHz/GB of cpu (for what it's worth)
- 32Mbit/GB of network (assuming 4 gigE phys)
- 1.7W/GB (taking into account only 4 x 55W processors; significantly higher in reality)
- Summary:
- similar $/GB, potentially similar CPU capacity/GB
- better network throughput/GB with SoC
- likely better W/GB with SoC
- memory bandwidth/GB may be interesting
- is one 4-socket, quad-core Opteron node 32x better than one SoC, as it must be to break even (128GB vs. 4GB per node)? probably
- can a SoC saturate gigE?
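The per-GB comparison between the hypothetical 4GB SoC node and the 32-slot Opteron build can be sketched as follows. The SoC price and memory size are the "if" from the list above, not real parts, and the W/GB figures are left out because the text treats them as guesses. The helper name `per_gb` is made up for this example.

```python
def per_gb(memory_gb: float, price_usd: float, total_ghz: float,
           network_mbit: float) -> dict:
    """Normalize price, aggregate clock rate and network bandwidth by memory."""
    return {
        "usd": price_usd / memory_gb,
        "ghz": total_ghz / memory_gb,
        "mbit": network_mbit / memory_gb,
    }


# Hypothetical SoC node: 4GB, $150, one 1.2 GHz core, one gigabit port.
SOC = per_gb(4, 150, 1.2, 1000)

# 32-slot Opteron build: 128GB, $5085, 4 sockets x 4 cores x 1.8 GHz,
# assuming 4 gigabit ports.
OPTERON = per_gb(128, 5085, 4 * 4 * 1.8, 4 * 1000)

# Number of SoC nodes needed to match one Opteron node's memory.
SOCS_PER_OPTERON_NODE = 128 // 4

if __name__ == "__main__":
    print("SoC:    ", {k: round(v, 3) for k, v in SOC.items()})
    print("Opteron:", {k: round(v, 3) for k, v in OPTERON.items()})
    print("SoC nodes per Opteron node:", SOCS_PER_OPTERON_NODE)
```

With decimal megabits the Opteron figure comes out at 31.25Mbit/GB, which the list rounds to 32.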
Reliability Considerations
Is ECC necessary, sufficient, or neither? If we incorporate some other end-to-end data-integrity mechanism (e.g. store a CRC with the data), is ECC redundant?
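To make the "store a CRC with the data" option concrete, here is a minimal sketch using CRC-32 from Python's standard `zlib` module. The record layout (value followed by a 4-byte little-endian checksum) and the names `seal` and `unseal` are assumptions for illustration, not a proposed RAMCloud format.

```python
import struct
import zlib


def seal(value: bytes) -> bytes:
    """Append a CRC-32 of the value so corruption can be detected on read."""
    return value + struct.pack("<I", zlib.crc32(value))


def unseal(record: bytes) -> bytes:
    """Return the value if its stored CRC matches; raise ValueError otherwise."""
    if len(record) < 4:
        raise ValueError("record too short to contain a CRC")
    value = record[:-4]
    (stored,) = struct.unpack("<I", record[-4:])
    if zlib.crc32(value) != stored:
        raise ValueError("CRC mismatch: value is corrupt")
    return value


if __name__ == "__main__":
    record = seal(b"some object")
    print(unseal(record))

    # Flip one bit to simulate a memory error; the check should reject it.
    corrupted = bytes([record[0] ^ 0x01]) + record[1:]
    try:
        unseal(corrupted)
    except ValueError as err:
        print("detected:", err)
```

Note that a CRC only detects corruption, whereas ECC memory can also correct single-bit errors in place, so a detected error still needs a good copy from somewhere else.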
Other
Memory prices
Sampled somewhat randomly from buy.com, newegg.com, ewiz.com, crucial.com.
Capacity | Type | Cost | Cost/GB |
---|---|---|---|
2GB | DDR2 UDIMM | $17 | $8.5/GB |
2GB | DDR2 RDIMM | $33 | $16.5/GB |
2GB | DDR2 FBDIMM | $38 | $19/GB |
2GB | DDR3 UDIMM | $29 | $14.5/GB |
2GB | DDR3 RDIMM | $62 | $31/GB |
- | - | - | - |
4GB | DDR2 RDIMM | $50 | $12.5/GB |
4GB | DDR2 FBDIMM | $72 | $18/GB |
4GB | DDR3 UDIMM | $490 | |
4GB | DDR3 RDIMM | $110 | $27.5/GB |
- | - | - | - |
8GB | DDR2 RDIMM (Meta) | $593 | $74/GB |
8GB | DDR2 FBDIMM | $690 | $86.25/GB |
8GB | DDR3 RDIMM | $1102 | $137.75/GB |
Like most things in life, the cheapest option is to hug the commodity curve (Unregistered DDR2 DIMMs).
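A short sketch over the sampled prices shows both the cheapest part and how steep the premium for denser DIMMs is. The 4GB DDR3 UDIMM row is left out because its Cost/GB cell is blank in the table; the helper names are made up for this example.

```python
# Sampled DIMM prices in USD, keyed by (type, capacity in GB).
PRICES = {
    ("DDR2 UDIMM", 2): 17,
    ("DDR2 RDIMM", 2): 33,
    ("DDR2 FBDIMM", 2): 38,
    ("DDR3 UDIMM", 2): 29,
    ("DDR3 RDIMM", 2): 62,
    ("DDR2 RDIMM", 4): 50,
    ("DDR2 FBDIMM", 4): 72,
    ("DDR3 RDIMM", 4): 110,
    ("DDR2 RDIMM", 8): 593,
    ("DDR2 FBDIMM", 8): 690,
    ("DDR3 RDIMM", 8): 1102,
}


def usd_per_gb(dimm_type: str, capacity_gb: int) -> float:
    return PRICES[(dimm_type, capacity_gb)] / capacity_gb


def cheapest() -> tuple:
    """The (type, capacity) pair with the lowest cost per GB."""
    return min(PRICES, key=lambda key: PRICES[key] / key[1])


def density_premium(dimm_type: str, small_gb: int, large_gb: int) -> float:
    """How many times more each GB costs on the larger DIMM."""
    return usd_per_gb(dimm_type, large_gb) / usd_per_gb(dimm_type, small_gb)


if __name__ == "__main__":
    print("Cheapest per GB:", cheapest())
    premium = density_premium("DDR2 RDIMM", 4, 8)
    print(f"8GB vs 4GB DDR2 RDIMM: {premium:.2f}x per GB")
```

On these samples an 8GB DDR2 RDIMM costs nearly six times as much per GB as a 4GB one, which is consistent with the jump in the 64GB and 128GB server quotes.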
Other links
- Tyan n6550EX (S4989-SI) 4-socket, 32 DIMM slot Opteron 8300 board, $885 http://www.8anet.com/merchant.ihtml?pid=6437&step=4
- Tyan n6650EX (S4992) 4-socket, 32 DIMM slot Opteron 8300 board, $949 http://www.8anet.com/merchant.ihtml?pid=6863&step=4
- ftp://ftp.tyan.com/img_mobo/S4992_2D.jpg
- http://www.tyan.com/catalog/TYAN_AMD_2009_Q1_DM.pdf
- Opteron 8346, $389 http://www.esaitech.com/commerce/catalog/product.jsp?product_id=51069
Follow-up, 32-slot OEM board
I finally found an OEM board with 32 DIMM slots. It's a four-socket Opteron 8300 platform and uses DDR2 registered DIMMs.
The board runs $949 (http://www.8anet.com/merchant.ihtml?pid=6863&step=4).
Since it requires Opteron 8300 chips, the cheapest processor I could find for it is the 1.8 GHz Opteron 8346HE, which runs $389 (http://www.esaitech.com/commerce/catalog/product.jsp?product_id=51069).
4GB DDR2 R-DIMMs run about $65 (http://www.ewiz.com/detail.php?name=D2667R4G4H).
One problem is that this motherboard is inconveniently large (13"x17"), so I'm having trouble finding cases that will fit it. I'll assume we can find one for $500.
So 949 + 4 * 389 + 32 * 65 + 500 = $5085
$39.73/GB
Note that the memory itself runs $16.25/GB, so we're spending $23.48/GB in overhead.
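The arithmetic above can be kept as a small bill of materials so it is easy to rerun when prices change. Quantities and prices are the ones quoted in this section, and the $500 case is the assumption stated above; the function names are made up for this example.

```python
# (item, quantity, unit price in USD), as quoted in this section.
BOM = [
    ("Tyan 32-slot motherboard", 1, 949),
    ("Opteron 8346HE", 4, 389),
    ("4GB DDR2 RDIMM", 32, 65),
    ("case (assumed price)", 1, 500),
]
MEMORY_GB = 32 * 4


def total_cost() -> int:
    return sum(quantity * price for _, quantity, price in BOM)


def memory_cost() -> int:
    return sum(quantity * price for name, quantity, price in BOM
               if "DIMM" in name)


if __name__ == "__main__":
    total_per_gb = total_cost() / MEMORY_GB
    memory_per_gb = memory_cost() / MEMORY_GB
    print(f"Total: ${total_cost()} = ${total_per_gb:.2f}/GB")
    print(f"Memory only: ${memory_per_gb:.2f}/GB")
    print(f"Overhead: ${total_per_gb - memory_per_gb:.2f}/GB")
```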