Node architecture
Goals
From the RAMCloud Wiki Home:
Commodity servers, 32-64GB/server (no custom chip or board design)
High throughput (i.e. 1M requests/second)
These goals are specified in terms of individual servers. However, we are essentially designing a cluster and should specify our goals in terms of the whole cluster.
That is, observing that
RAMCloud should be scalable to tens of thousands of servers
End-user performance will be observed in terms of aggregate throughput
we should instead think in terms of total cluster storage capacity and total RAMCloud request throughput, rather than individual server capacity and throughput.
Revised goals:
1PB of storage?
1B requests/second?
Affordable/Realizable/Likely-to-see-the-light-of-day
We meet these goals by choosing appropriate hardware and replicating it as many times as necessary.
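As a rough illustration of what "replicating it as many times as necessary" implies, the sketch below estimates how many nodes the revised goals would take. The function name `nodes_needed`, the decimal reading of 1PB as 10^6 GB, and the per-node figures (borrowed from the per-server goals quoted above) are assumptions made for illustration only.

```python
import math

# Assumption: 1PB is read as 10**6 GB (decimal units).
PB_IN_GB = 10**6


def nodes_needed(total_gb, total_rps, gb_per_node, rps_per_node):
    """Return (nodes for capacity, nodes for throughput, nodes required).

    The cluster must satisfy both goals, so the requirement is the larger
    of the two counts.
    """
    for_capacity = math.ceil(total_gb / gb_per_node)
    for_throughput = math.ceil(total_rps / rps_per_node)
    return for_capacity, for_throughput, max(for_capacity, for_throughput)


# Hypothetical node: 64GB and 1M requests/second, per the per-server goals.
capacity_nodes, throughput_nodes, required = nodes_needed(
    total_gb=PB_IN_GB,
    total_rps=10**9,
    gb_per_node=64,
    rps_per_node=10**6,
)
print(capacity_nodes, throughput_nodes, required)
```

Under these assumptions capacity, not throughput, determines the node count, and the result lands in the tens of thousands of servers.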
Questions
What metrics are individual nodes optimized for?
Capacity-centric
GB / node? GB / RU?
$ / GB? (including chassis, etc.)
Performance-centric
transactions / second / node?
transactions / second / GB?
GB / GHz?
Energy-centric
Watts / GB?
Joules / request?
What form-factor should nodes take? 1U? 4U? Blade? Other?
How well are computational, storage, and I/O resources balanced on each node?
How dependable are nodes? Is ECC memory summarily required?
Should nodes be composed of commodity components or also incorporate exotic ones?
Exposition by Example
Dual-socket 1U server: Dell PowerEdge 1950 III
Dell PowerEdge 1950 III (configured on-line)
Dual-socket, dual-core Xeon E5205 @ 1.86 GHz
1GB: $1349
32GB: $2480 (8x 4GB), additional cost of $1131
64GB: $9501 (8x 8GB), additional cost of $8152
64GB / node? Not bad! $9501 for the privilege? Yikes...
$148.45/GB @ 64GB
$77.50/GB @ 32GB
Clearly, don't maximize GB/node. $/GB is somewhat more useful.
Question: Are there practical limits on the total number of nodes that have to be considered?
Power consumption: 143W-244W http://www.spec.org/power_ssj2008/results/res2008q1/power_ssj2008-20080212-00035.html
7.625W/GB (peak power, 32GB)
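To make the pricing cliff concrete, the sketch below computes both the average $/GB of each PowerEdge 1950 III configuration and the marginal cost of each additional GB between configurations. The prices are the ones quoted above; the function names are hypothetical.

```python
# Dell PowerEdge 1950 III prices quoted above: memory (GB) -> system price ($).
PE1950_PRICES = {1: 1349, 32: 2480, 64: 9501}


def average_cost_per_gb(prices, gb):
    """Whole-system price divided by installed memory."""
    return prices[gb] / gb


def marginal_cost_per_gb(prices, gb_from, gb_to):
    """Price of each additional GB when stepping between two configurations."""
    return (prices[gb_to] - prices[gb_from]) / (gb_to - gb_from)


for gb in (32, 64):
    print(f"{gb}GB: ${average_cost_per_gb(PE1950_PRICES, gb):.2f}/GB average")

print(f"1GB -> 32GB: ${marginal_cost_per_gb(PE1950_PRICES, 1, 32):.2f} per added GB")
print(f"32GB -> 64GB: ${marginal_cost_per_gb(PE1950_PRICES, 32, 64):.2f} per added GB")
```

Each GB added between 32GB and 64GB costs several times more than each GB added between 1GB and 32GB, which is the arithmetic behind not maximizing GB/node.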
Dual-socket 1U server: HP ProLiant DL165 G5
Dual-socket, quad-core Opteron 2344HE @ 1.7 GHz
32GB: $2375 (8x 4G) = $74/GB
64GB: $8375 (8x 8G) = $131/GB
Similar to PowerEdge 1950
Dual-socket 1U server: HP ProLiant DL165 G5p
Dual-socket, quad-core Opteron 2376 @ 2.3 GHz
64GB: $4001 (16x 4G) = $62.50/GB
128GB: $15681 (16x 8G) = $122.51/GB
300W peak? 4.6875W/GB @ 64GB
Quad-socket 4U point of comparison: Dell PowerEdge R905
Dell PowerEdge R905
Quad-socket, quad-core Opteron 8346HE @ 1.8 GHz
4GB: $4728
128GB: $8090 (32x 4GB) = $63.20/GB
256GB: $26,359 (32x 8GB) = $102.96/GB
Power consumption of similar system (HP DL580G5): 271W-387W http://www.spec.org/power_ssj2008/results/res2007q4/power_ssj2008-20071207-00024.html
3.023W/GB (peak power, 128GB)
Scale down: Dell PowerEdge R300
Dell PowerEdge R300
Single-socket, Core 2 Duo E6305 @ 1.86 GHz
1GB: $699
24GB: $1385 (6x 4GB DDR2)
24GB / node. Lower density, but $57.71/GB.
Note, however, this node has more GB/GHz than the PE 1950.
R300: 24 / (2x1.86) = 6.45GB/GHz
PE 1950: 32 / (4x1.86) = 4.30GB/GHz
Question: Will server load scale with memory capacity in practice? Asked another way, does increased memory capacity tend to result in the deployment of fewer servers?
Power consumption: 75W-117W? http://www.spec.org/power_ssj2008/results/res2008q1/power_ssj2008-20080311-00042.txt
4.875W/GB (peak power, 24GB)
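The GB/GHz comparison above treats "GHz" as the sum of all core clocks in the node, which is a crude stand-in for compute capacity. A minimal sketch of that calculation, with a hypothetical helper name:

```python
def gb_per_ghz(memory_gb, sockets, cores_per_socket, clock_ghz):
    """Memory per aggregate GHz, counting every core at its nominal clock."""
    return memory_gb / (sockets * cores_per_socket * clock_ghz)


# PowerEdge R300: one dual-core socket at 1.86 GHz, 24GB.
r300 = gb_per_ghz(24, sockets=1, cores_per_socket=2, clock_ghz=1.86)

# PowerEdge 1950 III: two dual-core sockets at 1.86 GHz, 32GB.
pe1950 = gb_per_ghz(32, sockets=2, cores_per_socket=2, clock_ghz=1.86)

print(f"R300: {r300:.2f} GB/GHz, PE 1950: {pe1950:.2f} GB/GHz")
```

Summing clocks ignores differences in per-cycle performance and memory bandwidth between processors, so the ratio is only a first-order indicator.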
Scale way down... Quiet PC
Shuttle SP35P2V2 ($240 kit)
Single-socket, single-core Celeron 430 @ 1.8 GHz ($40)
8GB: $348 (4x 2GB @ $17/ea.)
$43.5/GB!
8/1.8 = 4.44GB/GHz
Not traditionally rack-mountable. More Ethernet ports/GB!
Power consumption: 50W-83W?? http://www.bit-tech.net/hardware/cases/2008/01/03/shuttle_sn68ptg5/9
10.375W/GB (peak power, 8GB)! Worst yet.
Question: What impact does low GB/node have on networking infrastructure?
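The commodity configurations priced above can be lined up on $/GB and peak W/GB in a few lines. The class and field names below are hypothetical, and the wattages carry the same caveats as in the text (the DL165 G5p figure is a guess and the R905 figure comes from a similar system).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeConfig:
    name: str
    memory_gb: int
    price_usd: float
    peak_watts: float  # some values are guesses or borrowed from similar systems

    @property
    def usd_per_gb(self) -> float:
        return self.price_usd / self.memory_gb

    @property
    def watts_per_gb(self) -> float:
        return self.peak_watts / self.memory_gb


CONFIGS = [
    NodeConfig("PowerEdge 1950 III", 32, 2480, 244),
    NodeConfig("ProLiant DL165 G5p", 64, 4001, 300),
    NodeConfig("PowerEdge R905", 128, 8090, 387),
    NodeConfig("PowerEdge R300", 24, 1385, 117),
    NodeConfig("Shuttle SP35P2V2", 8, 348, 83),
]

cheapest = min(CONFIGS, key=lambda c: c.usd_per_gb)
most_efficient = min(CONFIGS, key=lambda c: c.watts_per_gb)

# Print the configurations from cheapest to most expensive per GB.
for c in sorted(CONFIGS, key=lambda c: c.usd_per_gb):
    print(f"{c.name:20s} {c.usd_per_gb:7.2f} $/GB {c.watts_per_gb:6.2f} W/GB")
```

On these numbers the lowest $/GB and the lowest peak W/GB belong to different machines, which is the tension behind the metric questions raised earlier.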
Exotic: PowerPC 460EX SoC
Specs
1 GHz
On-chip DDR2 controller, supports four 4GB banks for 16GB total (really?)
On-chip SATA II
Two on-chip Gigabit Ethernet MACs
Question: Can you find these in the wild? How much do they cost?
Less Exotic: Marvell Sheeva (PXA168) SoC
Specs
1.2 GHz ARM
On-chip DDR2 controller - 4GB theoretical limit, no clue what SoC supports
On-chip NAND, NOR flash controllers, 10/100 ethernet mac, PCIe bus, USB 2.0
purportedly superscalar - dual issue, OOO
SheevaPlug - ARM SoC in a wall wart
http://www.globalscaletechnologies.com/p-22-sheevaplug-dev-kit.aspx
PXA168 + gigabit phy, 512MB ram, 512MB flash
5W power draw
includes AC adapter
dev kit: $99 USD in quantity of 1 (volume pricing undoubtedly much lower)
If a similar SoC could be had with 4GB for $150, we'd have:
$37.50/GB
0.3GHz/GB of cpu (specious, I know)
250Mbit/GB of network
1.5-2W/GB guess based on 5W number above
Compared with "Follow-up, 32-slot OEM board" below...
$39.73/GB
0.225GHz/GB of cpu (for what it's worth)
31.25Mbit/GB of network (assuming 4 gigE phys)
1.7W/GB (taking into account only 4 x 55W processors; significantly higher in reality)
Summary:
similar $/GB, potentially similar CPU capacity/GB
better network throughput/GB with SoC
likely better W/GB with SoC
memory bandwidth/GB may be interesting
are 4 quad-core opterons 32x better than one SoC, as they must be to break even (one 128GB node replaces 32 4GB SoC nodes)? probably
can a SoC saturate gigE?
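The per-GB figures in this comparison can be reproduced with the sketch below. The hypothetical 4GB, $150 SoC node and the 128GB four-socket Opteron build are the ones described in the text; the helper name and dictionary keys are made up for illustration, and power is left out because the SoC wattage is only a guess.

```python
def per_gb(memory_gb, price_usd, total_ghz, network_mbit):
    """Normalize a node's cost, aggregate clock, and network rate by its memory."""
    return {
        "usd_per_gb": price_usd / memory_gb,
        "ghz_per_gb": total_ghz / memory_gb,
        "mbit_per_gb": network_mbit / memory_gb,
    }


# Hypothetical SoC node: 4GB, $150, one 1.2 GHz core, one gigabit port.
soc = per_gb(memory_gb=4, price_usd=150, total_ghz=1.2, network_mbit=1000)

# 32-slot OEM board build: 128GB, $5085, four quad-core 1.8 GHz Opterons,
# and an assumed four gigabit ports.
opteron = per_gb(
    memory_gb=128, price_usd=5085, total_ghz=4 * 4 * 1.8, network_mbit=4 * 1000
)

# Number of SoC nodes whose memory one Opteron node replaces.
soc_nodes_per_opteron_node = 128 // 4

for key in soc:
    print(f"{key}: SoC {soc[key]:.3f} vs Opteron {opteron[key]:.3f}")
print("SoC nodes replaced by one Opteron node:", soc_nodes_per_opteron_node)
```

The last figure is the break-even factor: one Opteron node has to serve the load that would otherwise be spread across that many SoC nodes.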
Reliability Considerations
Is ECC necessary, sufficient, or neither? If we incorporate some other end-to-end data-integrity mechanism (e.g. store a CRC with the data), is ECC redundant?
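For concreteness, here is a minimal sketch of what storing a CRC with the data could look like. The record layout (payload followed by a 4-byte big-endian CRC-32) and the names `seal` and `unseal` are assumptions for illustration, not a proposed RAMCloud format.

```python
import struct
import zlib

CRC_SIZE = 4  # bytes used by the trailing CRC-32


def seal(payload: bytes) -> bytes:
    """Return the payload with its CRC-32 appended (big-endian)."""
    return payload + struct.pack(">I", zlib.crc32(payload))


def unseal(record: bytes) -> bytes:
    """Return the payload, or raise ValueError if the stored CRC does not match."""
    if len(record) < CRC_SIZE:
        raise ValueError("record too short to contain a CRC")
    payload = record[:-CRC_SIZE]
    (stored,) = struct.unpack(">I", record[-CRC_SIZE:])
    if zlib.crc32(payload) != stored:
        raise ValueError("CRC mismatch: data corrupted")
    return payload


record = seal(b"some object value")
assert unseal(record) == b"some object value"

# Simulate a single flipped bit in memory; unseal() now raises ValueError.
corrupted = bytes([record[0] ^ 0x01]) + record[1:]
```

A check like this only detects corruption when the record is read back; it does not correct it, which is one respect in which it differs from ECC.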
Other
Memory prices
Sampled somewhat randomly from buy.com, newegg.com, ewiz.com, crucial.com.
| Capacity | Type | Cost | Cost/GB |
|---|---|---|---|
| 2GB | DDR2 UDIMM | $17 | $8.5/GB |
| 2GB | DDR2 RDIMM | $33 | $16.5/GB |
| 2GB | DDR2 FBDIMM | $38 | $19/GB |
| 2GB | DDR3 UDIMM | $29 | $14.5/GB |
| 2GB | DDR3 RDIMM | $62 | $31/GB |
| - | - | - | - |
| 4GB | DDR2 RDIMM | $50 | $12.5/GB |
| 4GB | DDR2 FBDIMM | $72 | $18/GB |
| 4GB | DDR3 UDIMM | $490 | |
| 4GB | DDR3 RDIMM | $110 | $27.5/GB |
| - | - | - | - |
| 8GB | DDR2 RDIMM (Meta) | $593 | $74/GB |
| 8GB | DDR2 FBDIMM | $690 | $86.25/GB |
| 8GB | DDR3 RDIMM | $1102 | $137.75/GB |
Like most things in life, the cheapest option is to hug the commodity curve (unregistered DDR2 DIMMs).
Other links
Tyan n6550EX (S4989-SI) 4-socket, 32 DIMM slot Opteron 8300 board, $885 http://www.8anet.com/merchant.ihtml?pid=6437&step=4
Tyan n6650EX (S4992) 4-socket, 32 DIMM slot Opteron 8300 board, $949 http://www.8anet.com/merchant.ihtml?pid=6863&step=4
Opteron 8346, $389 http://www.esaitech.com/commerce/catalog/product.jsp?product_id=51069
Follow-up, 32-slot OEM board
I finally found an OEM board with 32 DIMM slots. It's a four-socket Opteron 8300 platform and uses DDR2 registered DIMMs.
The board runs $949 (http://www.8anet.com/merchant.ihtml?pid=6863&step=4).
Since it requires Opteron 8300 chips, the cheapest processor I could find for it is the 1.8 GHz Opteron 8346HE, which runs $389 (http://www.esaitech.com/commerce/catalog/product.jsp?product_id=51069).
4GB DDR2 R-DIMMs run about $65 (http://www.ewiz.com/detail.php?name=D2667R4G4H).
One problem is that this motherboard is inconveniently large (13"x17"), so I'm having trouble finding cases that will fit it. I'll assume we can find one for $500.
So 949 + 4 * 389 + 32 * 65 + 500 = $5085
$39.73/GB
Note that the memory itself runs $16.25/GB, so we're spending $23.48/GB in overhead.
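The bill of materials above can be checked, and re-run with different part prices, using a short sketch like the following. The quantities and prices are the ones quoted in this section; the $500 case is the assumed figure from the text, and the variable names are hypothetical.

```python
# (part, quantity, unit price in $) for the 32-slot OEM build described above.
BOM = [
    ("4-socket, 32-slot Opteron 8300 board", 1, 949),
    ("Opteron 8346HE", 4, 389),
    ("4GB DDR2 R-DIMM", 32, 65),
    ("case (assumed price)", 1, 500),
]

DIMM_COUNT, DIMM_GB, DIMM_PRICE = 32, 4, 65

total_cost = sum(quantity * price for _, quantity, price in BOM)
memory_gb = DIMM_COUNT * DIMM_GB
memory_cost = DIMM_COUNT * DIMM_PRICE

total_per_gb = total_cost / memory_gb
memory_per_gb = memory_cost / memory_gb
# Everything that is not DRAM: board, processors, and case.
overhead_per_gb = total_per_gb - memory_per_gb

print(f"total: ${total_cost} for {memory_gb}GB")
print(f"${total_per_gb:.2f}/GB = ${memory_per_gb:.2f}/GB memory "
      f"+ ${overhead_per_gb:.2f}/GB overhead")
```

Splitting the cost this way shows how much of the $/GB figure is fixed platform cost rather than DRAM, which is what a cheaper board or processor would reduce.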