Node architecture

Goals

From the RAMCloud Wiki Home:

  • Commodity servers, 32-64GB/server (no custom chip or board design)
  • High throughput (i.e. 1M requests/second)

These goals are specified in terms of individual servers. However, we are essentially designing a cluster and should specify our goals in terms of the whole cluster.

That is, observing that

  1. RAMCloud should be scalable to tens of thousands of servers
  2. End-user performance will be observed in terms of aggregate throughput

we should instead think in terms of total cluster storage capacity and total RAMCloud request throughput, rather than individual server capacity and throughput.

Revised goals:

  • 1PB of storage?
  • 1B requests/second?
  • Affordable/Realizable/Likely-to-see-the-light-of-day

We meet these goals by choosing appropriate hardware and replicating it as many times as necessary.
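
As a rough check on what "as many times as necessary" implies, the sketch below sizes a cluster for the tentative revised goals using the per-server figures quoted from the Wiki Home. It assumes 1PB = 1,000,000GB (decimal units) and ignores replication and metadata overhead; `nodes_needed` is a hypothetical helper, not part of RAMCloud.

```python
import math

def nodes_needed(total_gb, total_rps, gb_per_node, rps_per_node):
    """Node count that satisfies both the capacity and the throughput goal."""
    for_capacity = math.ceil(total_gb / gb_per_node)
    for_throughput = math.ceil(total_rps / rps_per_node)
    return max(for_capacity, for_throughput)

PB_IN_GB = 1_000_000           # assumption: decimal units
TARGET_RPS = 1_000_000_000     # 1B requests/second (tentative goal)
RPS_PER_NODE = 1_000_000       # 1M requests/second per server

for gb_per_node in (32, 64):
    n = nodes_needed(PB_IN_GB, TARGET_RPS, gb_per_node, RPS_PER_NODE)
    print(f"{gb_per_node}GB/node: {n} nodes")
```

Under these assumptions capacity, not throughput, sets the node count: 1B requests/second needs only 1,000 nodes at 1M requests/second each, while 1PB needs 31,250 nodes at 32GB or 15,625 at 64GB.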

Questions

  • What metrics are individual nodes optimized for?
    • Capacity-centric
      • GB / node? GB / RU?
      • $ / GB? (including chassis, etc.)
    • Performance-centric
      • transactions / second / node?
      • transactions / second / GB?
      • GB / GHz?
    • Energy-centric
      • Watts / GB?
      • Joules / request?
  • What form-factor should nodes take? 1U? 4U? Blade? Other?
  • How well are computational, storage, and I/O resources balanced on each node?
  • How dependable are nodes? Is ECC memory strictly required?
  • Should nodes be composed of commodity components or also incorporate exotic ones?

Exposition by Example

Dual-socket 1U server: Dell PowerEdge 1950 III

  • Dell PowerEdge 1950 III (configured on-line)
    • Dual-socket, dual-core Xeon E5205 @ 1.86 GHz
    • 1GB: $1349
    • 32GB: $2480 (8x 4GB), additional cost of $1131
    • 64GB: $9501 (8x 8GB), additional cost of $8152

64GB / node? Not bad! $9501 for the privilege? Yikes...

$148.45/GB @ 64GB
$77.50/GB @ 32GB

Clearly, we shouldn't simply maximize GB/node. $/GB is a more useful metric.
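
One way to see why maximizing GB/node is expensive is to look at the marginal price of each upgrade step rather than the average. The sketch below uses only the PowerEdge 1950 III prices listed above; `marginal_usd_per_gb` is a hypothetical helper name.

```python
def marginal_usd_per_gb(price_a, gb_a, price_b, gb_b):
    """Extra dollars paid per extra GB when moving from config A to config B."""
    return (price_b - price_a) / (gb_b - gb_a)

# (memory in GB, configured price in USD) for the PowerEdge 1950 III
configs = [(1, 1349), (32, 2480), (64, 9501)]

for (gb_a, price_a), (gb_b, price_b) in zip(configs, configs[1:]):
    step = marginal_usd_per_gb(price_a, gb_a, price_b, gb_b)
    print(f"{gb_a}GB -> {gb_b}GB: ${step:.2f} per additional GB")
```

The last 32GB cost roughly six times as much per GB as the first 31GB (about $219 versus $36), which reflects the premium on 8GB DIMMs.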

Question: Are there practical limits on the total number of nodes that have to be considered?

Power consumption: 143W-244W http://www.spec.org/power_ssj2008/results/res2008q1/power_ssj2008-20080212-00035.html

7.625W/GB (peak power, 32GB)

Dual-socket 1U server: HP ProLiant DL165 G5
  • Dual-socket, quad-core Opteron 2344HE @ 1.7 GHz
  • 32GB: $2375 (8x 4GB) = $74/GB
  • 64GB: $8375 (8x 8GB) = $131/GB

Similar to PowerEdge 1950

Dual-socket 1U server: HP ProLiant DL165 G5p
  • Dual-socket, quad-core Opteron 2376 @ 2.3 GHz
  • 64GB: $4001 (16x 4GB) = $62.50/GB
  • 128GB: $15681 (16x 8GB) = $122.51/GB

300W peak? 4.69W/GB @ 64GB

Quad-socket 4U point of comparison: Dell PowerEdge R905

  • Dell PowerEdge R905
    • Quad-socket, quad-core Opteron 8346HE @ 1.8 GHz
    • 4GB: $4728
    • 128GB: $8090 (32x 4GB) = $63.20/GB
    • 256GB: $26,359 (32x 8GB) = $102.96/GB

Power consumption of similar system (HP DL580G5): 271W-387W http://www.spec.org/power_ssj2008/results/res2007q4/power_ssj2008-20071207-00024.html

3.023W/GB (peak power, 128GB)

Scale down: Dell PowerEdge R300

  • Dell PowerEdge R300
    • Single-socket, Core 2 Duo E6305 @ 1.86 GHz
    • 1GB: $699
    • 24GB: $1385 (6x 4GB DDR2)

24GB / node. Lower density, but $57.71/GB.

Note, however, this node has more GB/GHz than the PE 1950.

24 / (2x1.86) = 6.45GB/GHz
32 / (4x1.86) = 4.30GB/GHz
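
The same ratio can be computed for any configuration. This is a minimal sketch that treats aggregate clock rate (sockets x cores x GHz) as a stand-in for compute capacity, which is a crude assumption; `gb_per_ghz` is a hypothetical helper.

```python
def gb_per_ghz(mem_gb, sockets, cores_per_socket, clock_ghz):
    """Memory per unit of aggregate clock rate (a rough proxy for CPU capacity)."""
    return mem_gb / (sockets * cores_per_socket * clock_ghz)

# PowerEdge R300: single-socket, dual-core @ 1.86 GHz, 24GB
r300 = gb_per_ghz(24, sockets=1, cores_per_socket=2, clock_ghz=1.86)
# PowerEdge 1950 III: dual-socket, dual-core @ 1.86 GHz, 32GB
pe1950 = gb_per_ghz(32, sockets=2, cores_per_socket=2, clock_ghz=1.86)

print(f"R300:    {r300:.2f}GB/GHz")
print(f"PE 1950: {pe1950:.2f}GB/GHz")
```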

Question: Will server load scale with memory capacity in practice? Asked another way, does increased memory capacity tend to result in the deployment of fewer servers?

Power consumption: 75W-117W? http://www.spec.org/power_ssj2008/results/res2008q1/power_ssj2008-20080311-00042.txt

4.875W/GB (peak power, 24GB)

Scale way down... Quiet PC

  • Shuttle SP35P2V2 ($240 kit)
    • Single-socket, single-core Celeron 430 @ 1.8 GHz ($40)
    • 8GB: $348 (4x 2GB @ $17/ea.)

$43.5/GB!

8/1.8 = 4.44GB/GHz

Not traditionally rack-mountable. More Ethernet ports/GB!

Power consumption: 50W-83W?? http://www.bit-tech.net/hardware/cases/2008/01/03/shuttle_sn68ptg5/9

Question: What impact does low GB/node have on networking infrastructure?

10.375W/GB (peak power, 8GB)! Worst yet.
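
To compare the commodity examples side by side, the sketch below recomputes $/GB and peak W/GB from the figures quoted above and scales each node type to a 1PB cluster. It assumes 1PB = 1,000,000GB, uses peak power only, and inherits the uncertainty of the source numbers (the R905 power is from a similar HP system, and the Shuttle power is a guess from a different model). The `Node` class is a hypothetical illustration.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    mem_gb: int
    price_usd: float
    peak_watts: float

    @property
    def usd_per_gb(self):
        return self.price_usd / self.mem_gb

    @property
    def watts_per_gb(self):
        return self.peak_watts / self.mem_gb

NODES = [
    Node("PowerEdge 1950 III", 32, 2480, 244),
    Node("PowerEdge R905", 128, 8090, 387),    # power of a similar HP DL580 G5
    Node("PowerEdge R300", 24, 1385, 117),
    Node("Shuttle SP35P2V2", 8, 348, 83),      # power is a rough guess
]

PB_IN_GB = 1_000_000  # assumption: decimal units, no replication

for n in sorted(NODES, key=lambda n: n.usd_per_gb):
    count = -(-PB_IN_GB // n.mem_gb)           # ceiling division
    megawatts = count * n.peak_watts / 1e6
    print(f"{n.name}: ${n.usd_per_gb:.2f}/GB, {n.watts_per_gb:.3f}W/GB, "
          f"{count} nodes and {megawatts:.2f}MW peak for 1PB")
```

On these numbers the Shuttle box is the cheapest per GB but the most power-hungry per GB, while the 4U R905 has the lowest W/GB.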

Exotic: PowerPC 460EX SoC

  • Specs
    • 1 GHz
    • On-chip DDR2 controller, supports four 4GB banks for 16GB total (really?)
    • On-chip SATA II
    • Two on-chip Gigabit Ethernet MACs

Question: Can you find these in the wild? How much do they cost?

Less Exotic: Marvell Sheeva (PXA168) SoC

  • Specs
    • 1.2 GHz ARM
    • On-chip DDR2 controller - 4GB theoretical limit, unclear what the SoC actually supports
    • On-chip NAND, NOR flash controllers, 10/100 Ethernet MAC, PCIe bus, USB 2.0
    • purportedly superscalar - dual issue, out-of-order (OOO)
  • SheevaPlug - ARM SoC in a wall wart
    • http://www.globalscaletechnologies.com/p-22-sheevaplug-dev-kit.aspx
    • PXA168 + gigabit phy, 512MB ram, 512MB flash
    • 5W power draw
    • includes AC adapter
    • dev kit: $99 USD in quantity of 1 (volume pricing undoubtedly much lower)
      • If a similar SoC could be had with 4GB for $150, we'd have:
        • $37.50/GB
        • 0.3GHz/GB of cpu (specious, I know)
        • 250Mbit/GB of network
        • 1.5-2W/GB guess based on 5W number above
      • Compared with "Follow-up, 32-slot OEM board" below...
        • $39.73/GB
        • 0.225GHz/GB of cpu (for what it's worth)
        • 32Mbit/GB of network (assuming 4 gigE phys)
        • 1.7W/GB (taking into account only 4 x 55W processors; significantly higher in reality)
      • Summary:
        • similar $/GB, potentially similar CPU capacity/GB
        • better network throughput/GB with SoC
        • likely better W/GB with SoC
        • memory bandwidth/GB may be interesting
          • are 4x4 Opterons 32x better than one SoC (needed to break even)? probably
          • can a SoC saturate gigE?

Reliability Considerations

Is ECC necessary, sufficient, or neither? If we incorporate some other end-to-end data-integrity mechanism (e.g. store a CRC with the data), is ECC redundant?
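
To make the CRC option concrete, here is a minimal sketch of storing a CRC-32 alongside each value and verifying it on read, using Python's standard `zlib.crc32`. The record layout (4-byte big-endian checksum followed by the value) and the names `seal` and `unseal` are assumptions for illustration, not a RAMCloud format.

```python
import struct
import zlib

def seal(value: bytes) -> bytes:
    """Prefix the value with its CRC-32 (4 bytes, big-endian)."""
    return struct.pack(">I", zlib.crc32(value)) + value

def unseal(record: bytes) -> bytes:
    """Return the stored value, or raise ValueError if the CRC does not match."""
    if len(record) < 4:
        raise ValueError("record too short")
    (stored_crc,) = struct.unpack(">I", record[:4])
    value = record[4:]
    if zlib.crc32(value) != stored_crc:
        raise ValueError("CRC mismatch: data corrupted")
    return value

record = seal(b"some object payload")
assert unseal(record) == b"some object payload"

# Simulate a single bit flip in memory
corrupted = bytearray(record)
corrupted[-1] ^= 0x01
try:
    unseal(bytes(corrupted))
except ValueError as err:
    print("detected:", err)
```

Note the difference in coverage: a CRC like this only detects corruption in the stored values, and only when they are read, whereas ECC can also correct single-bit errors and covers everything in DRAM, including code and pointers.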

Other

Memory prices

Sampled somewhat randomly from buy.com, newegg.com, ewiz.com, crucial.com.

Capacity  Type               Cost    Cost/GB
2GB       DDR2 UDIMM         $17     $8.5/GB
2GB       DDR2 RDIMM         $33     $16.5/GB
2GB       DDR2 FBDIMM        $38     $19/GB
2GB       DDR3 UDIMM         $29     $14.5/GB
2GB       DDR3 RDIMM         $62     $31/GB
-         -                  -       -
4GB       DDR2 RDIMM         $50     $12.5/GB
4GB       DDR2 FBDIMM        $72     $18/GB
4GB       DDR3 UDIMM         $490
4GB       DDR3 RDIMM         $110    $27.5/GB
-         -                  -       -
8GB       DDR2 RDIMM (Meta)  $593    $74/GB
8GB       DDR2 FBDIMM        $690    $86.25/GB
8GB       DDR3 RDIMM         $1102   $137.75/GB

Like most things in life, the cheapest option is to hug the commodity curve (unregistered DDR2 DIMMs).
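
The Cost/GB figures can be recomputed and ranked with a short script. This sketch uses only the sampled prices listed above; the variable names are hypothetical.

```python
# (capacity in GB, module type, price in USD), as sampled above
PRICES = [
    (2, "DDR2 UDIMM", 17),
    (2, "DDR2 RDIMM", 33),
    (2, "DDR2 FBDIMM", 38),
    (2, "DDR3 UDIMM", 29),
    (2, "DDR3 RDIMM", 62),
    (4, "DDR2 RDIMM", 50),
    (4, "DDR2 FBDIMM", 72),
    (4, "DDR3 UDIMM", 490),   # the source lists no Cost/GB for this row
    (4, "DDR3 RDIMM", 110),
    (8, "DDR2 RDIMM (Meta)", 593),
    (8, "DDR2 FBDIMM", 690),
    (8, "DDR3 RDIMM", 1102),
]

def usd_per_gb(row):
    capacity_gb, _kind, price = row
    return price / capacity_gb

ranked = sorted(PRICES, key=usd_per_gb)
cheapest = ranked[0]
# Cheapest registered module (server boards often require RDIMMs)
cheapest_rdimm = min((r for r in PRICES if "RDIMM" in r[1]), key=usd_per_gb)

for row in ranked:
    print(f"{row[0]}GB {row[1]}: ${usd_per_gb(row):.2f}/GB")
```

Among registered modules, the 4GB DDR2 RDIMM is the cheapest per GB in this sample, which matters for boards that do not accept unregistered DIMMs.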

Other links

Follow-up, 32-slot OEM board

I finally found an OEM board with 32 DIMM slots. It's a four-socket Opteron 8300 platform and uses DDR2 registered DIMMs.

The board runs $949 (http://www.8anet.com/merchant.ihtml?pid=6863&step=4).

Since it requires Opteron 8300 chips, the cheapest processor I could find for it is the 1.8 GHz Opteron 8346HE, which runs $389 (http://www.esaitech.com/commerce/catalog/product.jsp?product_id=51069).

4GB DDR2 R-DIMMs run about $65 (http://www.ewiz.com/detail.php?name=D2667R4G4H).

One problem is that this motherboard is inconveniently large (13"x17"), so I'm having trouble finding cases that will fit it. I'll assume we can find one for $500.

So 949 + 4 * 389 + 32 * 65 + 500 = $5085

$39.73/GB

Note that the memory itself runs $16.25/GB, so we're spending $23.48/GB in overhead.
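
The arithmetic above as a small bill-of-materials sketch, so that individual prices can be swapped out. The quantities and prices are the ones quoted in this section, and the $500 case is the assumption stated above.

```python
# item: (quantity, unit price in USD)
BOM = {
    "32-slot Opteron 8300 board": (1, 949),
    "Opteron 8346HE, 1.8 GHz": (4, 389),
    "4GB DDR2 RDIMM": (32, 65),
    "case (assumed price)": (1, 500),
}

total_usd = sum(qty * price for qty, price in BOM.values())
dimm_qty, dimm_price = BOM["4GB DDR2 RDIMM"]
mem_gb = dimm_qty * 4

total_per_gb = total_usd / mem_gb
memory_per_gb = dimm_qty * dimm_price / mem_gb
overhead_per_gb = total_per_gb - memory_per_gb

print(f"total: ${total_usd} for {mem_gb}GB")
print(f"${total_per_gb:.2f}/GB total, ${memory_per_gb:.2f}/GB memory, "
      f"${overhead_per_gb:.2f}/GB overhead")
```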