Purpose

We are evaluating four platforms for our proposed 40-node RAMCloud cluster. All are similarly priced (approx. $2000-2500/node for 24-32GB of RAM, a CPU, and a disk). Three machines are from SuperMicro. The fourth is a Dell box configured nearly identically to the Xeon E5620-based SuperMicro.

Machines

All machines are server-class hardware (i.e. support ECC) in 1U units. In the Xeon cases, we have the option of buying twin servers (two independent boards in one case) from SuperMicro. We're looking at the following configurations:

  • Xeon E5620 (4 core / 8 thread Westmere at 2.4GHz. Dual socket server. Max. 192GB RAM)
  • Xeon X3470 (4 core / 8 thread Nehalem at 2.93GHz. Single socket server. Max. 32GB RAM)
  • Opteron 6134 (8 core / 8 thread at 2.3GHz. Single socket server. Max. 128GB RAM)

Misc. Notes:

  • The E5620 and Opteron 6134 have on-die memory controllers, but they do not have on-die PCIe. Both support 1GB superpages. The AMD chips do not have the CRC32 instruction yet.
  • The X3470 has both on-die memory and PCIe controllers. However, it is not a Westmere chip, so it doesn't support 1GB superpages. It should have the CRC32 instruction.
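Once a candidate machine is booted into Linux, the two CPU features noted above can be confirmed from the flags line of /proc/cpuinfo. The sketch below assumes the usual x86 Linux flag names: `sse4_2` (the CRC32 instruction is part of SSE4.2) and `pdpe1gb` (1GB pages). The function name and the sample text are made up for illustration.

```python
def cpu_features(cpuinfo_text):
    """Report CRC32 and 1GB-page support from the text of /proc/cpuinfo."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        # x86 Linux prints one line per core like "flags\t\t: fpu vme ... sse4_2 ..."
        if line.startswith("flags") and ":" in line:
            flags.update(line.split(":", 1)[1].split())
    return {
        "crc32_instruction": "sse4_2" in flags,  # CRC32 ships with SSE4.2
        "1gb_superpages": "pdpe1gb" in flags,    # 1GB page support
    }


# Hypothetical, heavily shortened cpuinfo excerpt used only as a demo input.
SAMPLE = "processor\t: 0\nflags\t\t: fpu sse4_1 sse4_2 popcnt\n"

print(cpu_features(SAMPLE))
```

On a real machine, pass the contents of /proc/cpuinfo instead of `SAMPLE`.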

CPUs

| | Xeon E5620 | Opteron 6134 | Xeon X3470 |
|---|---|---|---|
| Clock | 2.4GHz | 2.3GHz | 2.93GHz |
| Max Turbo | 2.66GHz | N/A | 3.6GHz |
| # Cores | 4 | 8 | 4 |
| # Threads | 8 | 8 | 8 |
| L1 Cache | 64KB / core | 128KB / core | 64KB / core |
| L2 Cache | 256KB / core | 512KB / core | 256KB / core |
| L3 Cache | 12MB shared | 12MB shared | 8MB shared |
| On-die Memory Controller | Yes | Yes | Yes |
| On-die PCIe Controller | No | No | Yes |
| Max. CPU Sockets / Motherboard | 2 | ? (>= 4) | 1 |
| Max. Memory Channels | 3 | 4 | 2 |
| Max. Memory Clock | 1066MHz | 1333MHz | 1333MHz (NB: drops to 800MHz when >= 24GB RAM installed) |
| Max. Memory Supported | 288GB | ? (>=128GB) | 32GB |
| 1GB Superpages | Yes | Yes | No |
| CRC32 Instruction | Yes | No | Yes |
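The memory channel counts and memory clocks listed for each CPU give a rough upper bound on per-socket memory bandwidth. The sketch below assumes the listed "MHz" figures are DDR3 data rates in mega-transfers per second and that each channel is 64 bits (8 bytes) wide. Both are assumptions, not measurements from this evaluation, and the function name is illustrative.

```python
BYTES_PER_TRANSFER = 8  # assumed: one 64-bit DDR3 channel moves 8 bytes per transfer

# (memory channels, data rate in MT/s) taken from the CPU table
CPUS = {
    "Xeon E5620": (3, 1066),
    "Opteron 6134": (4, 1333),
    "Xeon X3470": (2, 1333),
    "Xeon X3470 (>= 24GB installed)": (2, 800),
}


def peak_bandwidth_gb_s(channels, mega_transfers_per_s):
    """Theoretical peak bandwidth per socket in GB/s (1 GB = 1000 MB)."""
    return channels * mega_transfers_per_s * BYTES_PER_TRANSFER / 1000.0


for name, (channels, rate) in CPUS.items():
    print(f"{name}: {peak_bandwidth_gb_s(channels, rate):.1f} GB/s peak")
```

These are ceilings only. Single-threaded measurements will normally come in well below them.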

 

 

 

 

Systems

| | Dell R410 | SuperMicro 6016T-NTF (Xeon E5620) | SuperMicro 1012G-MTF (Opteron 6134) | SuperMicro 5016I-M6F (Xeon X3470) |
|---|---|---|---|---|
| Motherboard | ? | X8DTU-F | H8SGL-F | X8SI6-F |
| Chipset | Intel 5520 (Tylersburg) + ICH10R | Intel 5520 (Tylersburg) + ICH10R | AMD SR5650 + SP5100 | Intel 3420 (Ibex Peak) + 3420 PCH |
| # CPU Sockets | 2 | 2 | 1 | 1 |
| # DIMM Slots | 8 (4 / socket) | 12 (6 / socket) | 8 | 6 |
| Max. Memory | 128GB | 192GB | 128GB | 32GB (board supports 48, CPU limits to 32) |
| # SATA Ports | 6 (?) | 6 | 6 | 6 |
| # Hard Drives | 4 x 3.5" | 4 x 3.5" | 4 x 3.5" | 4 x 3.5" |
| # PCIe slots (electrical width) | 1 (x16) | 2 (x8) | 1 (x16) | 1 (x8) |

Notes:

 

  1. Available in twin servers (two machines in one box). This halves the number of drive bays available to each machine, but reduces cost by ~$200/node.
  2. Alternate configurations are available that have 8x 2.5" drive bays, rather than 4x 3.5". These use an LSI Logic SAS controller. SAS drives are _expensive_.

 

  1. Motherboard has 1 x16 PCIe slot (straight to CPU) and 1 x4 slot through the PCH. There are different motherboard models from SuperMicro that support the additional slots via risers.
  2. See the 6016T-NTF notes re: twin servers.

Evaluation Criteria

RAMCloud is concerned mostly with scale and latency. Since we have a limited budget and limited space, we cannot scale to a huge number of machines or load them with expensive high-density DIMMs (the sweet spot looks like 4GB or 8GB DIMMs). As such, most of our evaluation focuses on latency: InfiniBand network latency, RAMCloud micro-benchmarks, and simple end-to-end RAMCloud benchmarks. We will also consider miscellaneous features such as the number of PCIe slots, DIMM slots, and CPU cores.
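To see what the 4GB or 8GB DIMM sweet spot means for capacity, the sketch below multiplies each system's DIMM slot count by the DIMM size, caps the result at the listed maximum memory, and scales to the proposed 40 nodes. It assumes every slot is filled with identical DIMMs (which on the dual-socket boards means populating both sockets). The names are illustrative.

```python
NODES = 40  # proposed cluster size

# (DIMM slots, max. memory in GB) taken from the Systems table
SYSTEMS = {
    "Dell R410": (8, 128),
    "SuperMicro 6016T-NTF": (12, 192),
    "SuperMicro 1012G-MTF": (8, 128),
    "SuperMicro 5016I-M6F": (6, 32),  # CPU limits this board to 32GB
}


def node_capacity_gb(slots, max_gb, dimm_gb):
    """Memory per node with every slot filled, capped at the platform limit."""
    return min(slots * dimm_gb, max_gb)


for name, (slots, max_gb) in SYSTEMS.items():
    for dimm_gb in (4, 8):
        per_node = node_capacity_gb(slots, max_gb, dimm_gb)
        print(f"{name}, {dimm_gb}GB DIMMs: {per_node}GB/node, "
              f"{per_node * NODES}GB across {NODES} nodes")
```

Note that the 5016I-M6F gains little from 8GB DIMMs under these assumptions, because six of them would exceed the CPU's 32GB limit.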

RAMCloud Numbers:

| Benchmark | Xeon E5620 | Opteron 6134 | Xeon X3470 |
|---|---|---|---|
| Rabinpoly (min, max) | 206MB/s, 381MB/s | 57MB/s, 438MB/s | |
| VMAC (min, max) | 60MB/s, 2630MB/s | 45MB/s, 2294MB/s | |
| HashTableBench (-h 128) | 84ns lookup, 98ns replace | 147ns lookup, 133ns replace | |
| 1000-byte Bench (InfiniBand, localhost, -m 128 server) | 7.6usec +/- 0.1usec RTT (272.76ns on server) | 7.1usec +/- 0.3usec RTT (338.81ns on server) | |
| 100-byte Bench (InfiniBand, localhost, -m 128 server) | 6.6usec RTT (283.43ns on server) | 6.2usec RTT (348ns on server) | |
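One way to read the end-to-end benchmark rows is to ask how much of each round trip is spent in the server itself. The sketch below divides the reported server time by the reported RTT. The helper name is illustrative, and the calculation assumes the server time is fully contained in the RTT.

```python
# (RTT in usec, server time in ns) taken from the benchmark rows above
RESULTS = {
    ("Xeon E5620", "1000-byte"): (7.6, 272.76),
    ("Opteron 6134", "1000-byte"): (7.1, 338.81),
    ("Xeon E5620", "100-byte"): (6.6, 283.43),
    ("Opteron 6134", "100-byte"): (6.2, 348.0),
}


def server_share(rtt_usec, server_ns):
    """Fraction of the round-trip time that is spent on the server."""
    return server_ns / (rtt_usec * 1000.0)


for (cpu, bench), (rtt_usec, server_ns) in RESULTS.items():
    share = server_share(rtt_usec, server_ns)
    print(f"{cpu}, {bench}: {share * 100:.1f}% of RTT on the server")
```

Under that assumption the server accounts for only a few percent of each round trip, so the RTT differences between platforms come mostly from the network path rather than from request handling.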

 

InfiniBand Numbers (all are one-way, i.e. not RTT values, in microseconds; RTT figures in the Notes column are the exception):

| Benchmark | Xeon E5620 | Opteron 6134 | Xeon X3470 | Notes |
|---|---|---|---|---|
| ib_write_lat -s 128 localhost | 1.40 | 1.16 | | |
| ib_write_lat -s 1024 localhost | 3.33 | 2.92 | | |
| ib_send_lat -s 128 localhost | 1.59 | 1.42 | | Sends are inlined with the WQE up to 400 bytes |
| ib_send_lat -s 1024 localhost | 3.44 | 2.99 | | |
| ib_send_lat -s 128 -I 0 | 2.38 | 2.04 | | -I 0 disables inlining data in the WQE |
| ib_send_lat -s 128 between E5620 and AMD 6134 | 1.58 | 1.58 | N/A | With 2 AMD machines, est. is ~3.02us RTT |
| ib_send_lat -s 128 between two E5620 machines | 1.65 | N/A | N/A | ~3.3us RTT |
| ib_send_lat -s 128 -I 0 between E5620 and AMD 6134 | 2.30 | 2.30 | N/A | With 2 AMD machines, est. is ~4.24us RTT |
| ib_send_lat -s 128 -I 0 between two E5620 machines | 2.48 | N/A | N/A | ~4.96us RTT |
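The notes do not say how the two-AMD-machine estimates were derived. One simple model that reproduces them is to assume each endpoint contributes a fixed, additive share of the one-way latency. Half of the E5620-to-E5620 figure is then the E5620's share, and the remainder of the mixed-pair figure is the Opteron's share. The sketch below works through that assumed model. The function name is illustrative.

```python
def estimate_pair_one_way(mixed_one_way, known_pair_one_way):
    """Estimate one-way latency between two machines of the untested pairing.

    Assumes latency(A, B) = cost(A) + cost(B) with a fixed cost per endpoint.
    """
    known_endpoint = known_pair_one_way / 2           # e.g. one E5620's share
    other_endpoint = mixed_one_way - known_endpoint   # e.g. one Opteron's share
    return 2 * other_endpoint


# One-way latencies in microseconds from the ib_send_lat -s 128 rows
inlined = estimate_pair_one_way(mixed_one_way=1.58, known_pair_one_way=1.65)
not_inlined = estimate_pair_one_way(mixed_one_way=2.30, known_pair_one_way=2.48)

print(f"AMD to AMD, inlined: {inlined:.2f}us one-way, {2 * inlined:.2f}us RTT")
print(f"AMD to AMD, -I 0:    {not_inlined:.2f}us one-way, {2 * not_inlined:.2f}us RTT")
```

With the measured values this gives roughly 3.02us and 4.24us RTT, matching the estimates in the Notes column.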

LMbench Numbers

| Benchmark | Xeon E5620 | Opteron 6134 | Xeon X3470 |
|---|---|---|---|
| mem_bw (read / write / copy) | 7800 / 3267 / 2095 MB/s | 8300 / 7200 / 3870 MB/s | |
| lat_mem_rd -N 1 -P 1 256M 512 (L1 / L2 / L3 / RAM) | 1.5 / 3.8 / 18 / 63 ns | 1.3 / 6 / 19 / 109 ns | |

 

 

 

 

 
