ATOM Cluster

This page is obsolete. Please visit the newer page from HERE.

Three chassis of NEC's ATOM servers arrived in April 2014. This page covers their setup and an initial performance evaluation.

Table of Contents)

  1. RAMCloud in a Box (Prototype 1) at Stanford
    1. SEDCL Forum 2014 Presentation: "RAMCloud in a Box": pdf
    2. Server Overview:
      1. Three chassis of NEC's Micro Modular Server (44 ATOM servers in 2U chassis) : 132 ATOM servers : announced May 2014
    3. Management Server:
  2. Presentations:
    1. Poster session at SEDCL retreat June 2014 - RAMCloud performance with userland driver on ATOM server
    2. Poster session at SEDCL forum January 2014 - Overview of ATOM server
  3. Current Status
  4. Setup
    1. Terminology
    2. Server Hardware Setup
    3. rcmaster (host) Setup
    4. Install and boot CentOS on ATOM Servers
    5. Build RAMCloud and run it on ATOM servers
  5. Performance evaluation:
    1. Ping with kernel/tcp
    2. RAMCloud performance:
      1. Clusterperf performance with kernel/tcp (tuned)
      2. Clusterperf with userland (kernel-bypass) driver (tuning is still in progress)
  6. References:

1. Server Overview)

 

  1. The server announcement
    1. NEC America: http://www.nec-enterprise.com/News/Latest-press-releases/NEC-raises-the-bar-for-high-density-IT-solution-platforms-for-the-public-and-private-cloud-698
    2. Article: http://www.otcmarkets.com/news/otc-market-headline?id=16150276
    3. ASCII.jp (in Japanese): http://ascii.jp/elem/000/000/895/895712/
  2. (Photo) Three chassis installed in Stanford Server Room: (two chassis installed on top of existing rcmaster rack)

2. Presentations)

  1. SEDCL Retreat Poster Session on June 5, 2014:
    1. pdf: satoshi_poster.pdf
    2. Draft (multiple pages, easier to view), rev. 1.01, June 4, 2014
      1. pdf: 20140605-RAMCloudOnMicroServer_r1_01s.pdf
      2. ppt: 20140605-RAMCloudOnMicroServer_r1_01s.ppt
      3. Mac keynote source: 20140605-RAMCloudOnMicroServer_r1_01s.key.zip (zip compressed)
      4. pdf with appendix: 20140605-RAMCloudOnMicroServer_r1_01.pdf
  2. Poster session at the SEDCL forum in January 2014 - Overview of ATOM server ... included in the introduction of the June 5, 2014 presentation.

3. Current Status)

  • At Stanford)

    • 88 ATOM servers are running CentOS 6.5.
    • To do: set up NIS, NFS mounts, and a DNS server to enable ssh login.
    • To do: compile RAMCloud and provide Python scripts for testing it.
  • At NEC Japan)

    • Performance improvement)
      1. Without replica/backup) - single-thread mode.
        1. 100B read (30B key): 12.4 us (min. 11.6 us) (7.3 MB/s); was 13.8 us (6.9 MB/s) at the SEDCL retreat presentation.
        2. 100B write (30B key): 15.3 us (6.2 MB/s); was 18.2 us (5.2 MB/s) at the SEDCL retreat presentation.
        3. Note) 8.7 us of this is spent in the Intel igb driver. We are contacting Intel for investigation and improvement.
      2. With replica/backup) - Multithreading must be enabled so that the collocated backup service can answer backup requests while the master is responding to write requests.
        1. 100B read (30B key): 13.2 us (min. 12.6 us) (7.2 MB/s)  vs. 5.1 us (18.7 MB/s) for RAMCloud with 32Gbps Infiniband (see: clusterperf August 12, 2013)
        2. 100B write (30B key): 38.0 us (2.5 MB/s)  vs. 15.7 us (6.1 MB/s) for RAMCloud with 32Gbps Infiniband
        3. Performance is the same on the 1.7GHz ATOM Avoton (C2730) and the 2.4GHz ATOM (C2750): the time goes to L2 cache/main-memory access, the network, etc., which do not run at core clock.
    • Further evaluation
      • Preparing a performance evaluation on the latest Xeon machine with a 10G NIC, using the same kernel-bypass driver.
    • An additional host for remote maintenance and RAMCloud upgrades.

4. Setup)

  Terminology)

  1. Management components:
    1. One instance in each server blade:
      1. MMC: each server's BMC (Baseboard Management Controller), which controls power, boot, etc.
    2. For Chassis control:
      1. CMM: a BMC which controls the chassis functions. Two instances per chassis: one master, one slave.
      2. ONS: a control port for each of the two switch boards in a chassis. Two instances for the two network cards. The ONS can be accessed through either of the two microUSB ports on the front panel.

  Server HW Setup)

  1. Setup Document: 20140806-ATOMServerSetup_r1_00.pdf
  2. IP address assignment:
    1. Host IP assignment list – with information about chassis, slot, and MAC address
    2. Raw data) MAC addresses and slot information.
  3. Configuration scripts on rcmaster: 
    1. Note)
      1. We strongly recommend using hostnames instead of IP addresses, since the IP address assignment may change at any time. IPMItool accepts either a hostname or an IP address.
      2. The IPMItool password is read from '~/.atom/ipmi_password.txt'. Access permission on this file should be restricted to a limited set of administrators.
    2. AllCheck.sh <CMM-Master>  // Acquires the 'MAC address and slot information' referenced in item 2 above.
      1. try.sh // Tries ping and AllCheck.sh over a range of IP addresses.
    3. NoVLAN.sh <ether_CMM> // First step of the VLAN reconfiguration: merges VLAN 4092 into VLAN 1 for all the MMCs on the LAN managed by the CMM. It must be run before reconfiguring the ONS; otherwise we lose the ability to talk to the MMCs on that LAN.
    4. ipmiaw <range_of_servers> <ipmi_blobs> : IPMI ATOM wrapper: you can use 1,2,3-10 to specify a range of ATOM server MMCs. For a dry run, check the generated commands by passing '-d' as the first parameter. 'Range' is extended from the original ipmirw (a sketch of the expansion appears after this list):
      1. Range elements:
        1. M : single element
        2. M-N : M to N
        3. M+D : M to M+D-1 (D elements)
      2. A range element can be either:
        1. a hostname (atom002) or a number
        2. an IP address (192.168.5.3); if the upper 24 bits are omitted, as in '.2', the default subnet for ATOM IPMI is used.
      3. Range elements are concatenated with ',' (note: no spaces in the list!)
        1. E.g.) 1,5-7,10+3,20,25-27,50+2,60          Try)  ipmiaw -d 1,5-7,10+3,20 foo bar
        2. E.g.) .1,5-7,10+3  or  192.168.5.1+44      Try)  ipmiaw -d .1,5-7,10+3,20 foo bar
    5. The tools below call the ipmiaw wrapper:
      1. pxeboot.sh <range_of_servers>     // PXE boot at next boot. Once a PXE boot has been performed, the boot image is automatically saved on the server's local SSD, and subsequent boots use the local copy.
      2. atom_up.sh <range_of_servers>     // Power up a range of servers.
      3. atom_down.sh <range_of_servers>   // Power down a range of servers.
      4. atom_boot.sh <range_of_servers>   // Boot the OS on a range of servers.
                  // Special key sequences: `~.` to quit, `~^z` to suspend. Under an ssh login, '~' needs to be escaped with another '~', so type '~~.' to quit.
  4. Configuration commands, in TeraTerm command-script (TTL) format, sent over a serial terminal connected to the ONS through a microUSB port:
    1. noVLAN.ttl  // Second step of the VLAN reconfiguration, applied to the ONS. Includes the initONS.ttl sequence.
    2. initONS.ttl // Initializes the ONS: a serial-terminal command sequence that reconfigures the LAN switch managed by the ONS.
    3. lan40G.ttl  // Reconfigures the external switch ports from the default 10G x 4 mode to 40G x 1 mode, for the LAN controlled by the ONS.
    4. lan10G.ttl  // Vice versa.
  5. For power up/down and activation (booting the OS), see the latter half of the section 'Install and boot CentOS on ATOM servers)'.
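The ipmiaw range expansion can be illustrated with a short shell sketch. This is not the shipped ipmiaw code, only a minimal reconstruction of the numeric range elements under the 'atomNNNm' naming used in the DHCP section below; hostname and IP-address elements are omitted:

    #!/bin/bash
    # Sketch of ipmiaw-style range expansion (numeric elements only).
    expand_range() {
      local out=() elem i
      IFS=',' read -ra elems <<< "$1"      # split on ',' -- no spaces allowed in the list
      for elem in "${elems[@]}"; do
        case "$elem" in
          *-*) for ((i=${elem%-*}; i<=${elem#*-}; i++)); do out+=("$i"); done ;;           # M-N
          *+*) for ((i=${elem%+*}; i<${elem%+*}+${elem#*+}; i++)); do out+=("$i"); done ;; # M+D
          *)   out+=("$elem") ;;                                                           # M
        esac
      done
      printf 'atom%03dm\n' "${out[@]}"     # MMC hostnames, per the naming tables below
    }

    expand_range 1,5-7,10+3    # -> atom001m atom005m atom006m atom007m atom010m atom011m atom012m

The wrapper tools above take the same ranges, e.g. 'pxeboot.sh 1,5-7,10+3' followed by 'atom_up.sh 1,5-7,10+3'.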


  rcmaster Setup)

       1. DHCP setup

Hostname assignment proposal)
    Precaution) Because of the IP address holes at xx.xx.xx.0 and xx.xx.xx.255, it is hard to map the numbers in hostnames directly onto numbers in IP addresses. We should hide IP addresses and use hostnames instead, which keeps the naming flexible when the system configuration changes.

1. Kernel driver ports, normally used for host communication, connected through NIC1 (eth0: 'InternalMAC1' in the ATOM Server MAC Address table):

                      chassis1 (root)   chassis2 (leaf1)   chassis3 (leaf2)
    eth0 (NIC1) port  atom001 to 044    atom045 to 088     atom089 to 132

2. MMC (BIOS port for server management: 'MAC Address' in the ATOM Server MAC Address table):

                      chassis1 (root)    chassis2 (leaf1)   chassis3 (leaf2)
    MMC port          atom001m to 044m   atom045m to 088m   atom089m to 132m

3. Userland driver ports (eth1: 'InternalMAC2' in the ATOM Server MAC Address table)

No IP address is assigned, to avoid loops; this is fine because these ports are used only at L2 (by MAC address).

4. Control ports for each chassis. We refer to a chassis by the ATOM server's development code name, 'mercury'.

                       chassis1 (root)   chassis2 (leaf1)   chassis3 (leaf2)
    CMM (Master) port  mercury1cmm       mercury2cmm        mercury3cmm
    CMM (Slave) port   mercury1cms       mercury2cms        mercury3cms
    ONS (Master) port  mercury1onm       mercury2onm        mercury3onm
    ONS (Slave) port   mercury1ons       mercury2ons        mercury3ons

5. If we later modify the userland driver to use L3, we can name the additional ports with suffixes starting from 'a'. Reserving 'a' through 'e' should be enough; the suffixes never reach 'm', which is assigned to the MMC.

                                chassis1 (root)    chassis2 (leaf1)   chassis3 (leaf2)
    eth1 (NIC2) port            atom001a to 044a   atom045a to 088a   atom089a to 132a
    eth2 (NIC3), if extended    atom001b to 044b   atom045b to 088b   atom089b to 132b
    ...                         ...                ...                ...

     1.1 Using Multiple Subnets)

The current ramcloud cluster has 80 servers. We are going to assign hosts/DHCP to the subnets as follows (covering both the OS port and the IPMI port of the RAMCloud servers, including rcmaster, rcnfs, rctest, and rcmonster):

             subnet        name             IP addresses  usage
    current  192.168.0/24  rc**             176           both OS port and IPMI port on each server
    current  192.168.1/24  infiniband       88            IP address for Infiniband
    current  192.168.2/24  inf eth          88            10G Ethernet on Infiniband card
    new      192.168.3/24  atom***          132           OS port (eth0) of ATOM server
    new      192.168.4/24  atom***a         132           eth1 of ATOM server (for future use)
    new      192.168.5/24  atom management  132+12        MMC of servers, CMMs and ONS of chassis
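For illustration, the corresponding host entries in rcmaster's DHCP configuration might look like the fragment below. This is a sketch for ISC dhcpd; the MAC address and the fixed address are placeholders, and the real values come from the AllCheck.sh output above:

    subnet 192.168.3.0 netmask 255.255.255.0 {
      # OS ports (eth0) of the ATOM servers; no dynamic range, static hosts only
    }
    host atom001 {
      hardware ethernet 02:00:00:00:00:01;   # placeholder; use 'InternalMAC1' from the table
      fixed-address 192.168.3.11;            # placeholder; assignments skip .0 and .255
    }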

     2. Setup tftp server for PXE boot)
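This subsection is an outline only; the sketch below shows the usual steps on a CentOS 6 host, assuming ISC dhcpd and the stock tftp-server/syslinux packages (package names and file paths are assumptions; the base directory matches the next section):

    $ yum install -y tftp-server syslinux
    $ vi /etc/xinetd.d/tftp           <- set 'disable = no' and 'server_args = -s /tftpboot'
    $ cp /usr/share/syslinux/pxelinux.0 /tftpboot/
    $ mkdir -p /tftpboot/pxelinux.cfg /tftpboot/images/centos/x86_64/6.5
    $ vi /etc/dhcp/dhcpd.conf         <- add: next-server <rcmaster-ip>; filename "pxelinux.0";
    $ service xinetd restart && service dhcpd restart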

  Install and boot CentOS on ATOM servers)

 

        ramcloud cluster        ATOM cluster
  OS    RHEL 6.0 (2.6.32-71)    CentOS 6.5 (2.6.32-431)
  gcc   4.4.7                   4.4.7

 

  1. Creating or downloading CentOS image
  2. Partitioning SSD
    1. Block device '/dev/sda2' is used as RAMCloud backup space. More than 100 GB of space is expected.
  3. Locating boot image for PXE 
    1. Base directory is  /tftpboot/images/centos/x86_64/6.5
    2. Modification needed
      1. Disable SELinux:

        # vi /etc/selinux/config      <- set the following line in the file:
         SELINUX=disabled
        # setenforce 0                <- also disable SELinux immediately, without a reboot
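        # getenforce                  <- optional check (a suggestion, not in the original procedure): should now report Permissive, and Disabled after the next reboot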
      2. Update the igb driver to 5.1.2 from the in-kernel igb-5.0.5-k:

        cd ..../source/igb/igb/src                    <- build directory of the igb 5.1.2 source
        cp igb.ko /lib/modules/2.6.32-431.el6.x86_64/kernel/drivers/net/igb/igb.ko
        rmmod igb                                     <- unload the old driver
        modprobe igb RSS=8 InterruptThrottleRate=1    <- reload with 8 RSS queues
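        ethtool -i eth0 | grep ^version               <- optional check (suggestion): should report 5.1.2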
  4. Insert the two 200V power cables into each chassis. (There is no power switch on the ATOM server chassis.)
  5. Enable PXE boot on the respective ATOM server with IPMItool ...
    If this step is skipped and a PXE boot has been performed before, the OS boots from the server's local copy.
         $ pxeboot.sh <atomXXXm>  // See 'Server HW Setup)' --> 'Configuration scripts' for the command reference.
  6. Power up and boot the respective server with IPMItool: >> See: 'Short Cut for whole chassis nodes)' below.
         $ ipmitool -I lanplus -U <admin_user> -P <password> -H atomXXXm power on
  7. The OS console of a server can be reached with IPMI or ssh.
         $ ipmitool -I lanplus -U <admin_user> -P <password> -H atomXXXm sol activate
    Note) Special ipmitool key sequences: `~.` to exit, `~^z` to suspend. If you are logged in over ssh, '~' needs to be escaped with another '~', so type '~~.' to exit.

System Shutdown) - Skip any step whose corresponding power-up step was not executed.

  1. Shutdown OS
          $ ssh root@atomXXX shutdown -h now   // note: the OS hostname (atomXXX), not the MMC (atomXXXm)
  2. Power down each server
      $ ipmitool -I lanplus -U <admin_user> -P <password> -H atomXXXm power off    
  3. Remove the power cables 
     Sample scripts are provided in '<ConfigurationScriptDirectory>/PowerCtrlExample/{up,down,nec*}.sh' (nec*.sh for activation).

Short Cut for whole chassis nodes)

  1. Start all servers in the chassis)
    $ ipmiaw mercury?cmm power on    // Boots all servers by sending 'power on' to the chassis CMM.
  2. Shut down and power off all servers in the chassis)
    $ ipmiaw mercury?cmm power soft  // 'power soft' waits for the OS to shut down: sending it to the CMM initiates an OS shutdown on every server in the chassis, waits for the shutdowns to finish, and then powers the servers off. Do not use 'power off' while the OS is running; it also initiates an OS shutdown but forces power-down after four seconds regardless of OS status (use it only to force a power-down).

  Build RAMCloud and run it)

  1. Allocate a dedicated workspace for the ATOM cluster on rcmaster
    1. It will be merged into the existing ramcloud work tree once the source tree is merged into the existing ramcloud source repository.
  2. Compile RAMCloud for the ATOM server
  3. Compile the DPDK module
    1. Link or insmod
  4. Run clusterperf.py (a sketch of the whole cycle follows this list)
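A minimal sketch of this cycle on rcmaster (the workspace and DPDK module paths are assumptions; 'make DEBUG=no' and the clusterperf.py options are the ones used in the performance sections below):

    $ cd ~/atom/ramcloud                  <- dedicated ATOM workspace (assumed path)
    $ make -j8 DEBUG=no                   <- optimized build, as used for the measurements
    $ insmod dpdk/build/kmod/igb_uio.ko   <- load the DPDK kernel module (assumed path)
    $ scripts/clusterperf.py --verbose --transport=tcp \
          --clients=1 --servers=1 --numBackups=0 --replicas=0 --disjunct basic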


5. Performance evaluation)

  • Results from May 27, 2014; summarized in the SEDCL retreat presentation of June 2014.
  • Tuning is still in progress: the 100B read has since improved to 11.5 us (as of Aug. 5, 2014).
  1. Peak Performance Calculation Sheet: ATOMServerPeakPerformances.xls
    1. clusterperf.py 100B read with kernel tcp – 67.8 us
  2. With the userland (kernel-bypass) driver – tentative, through a 1-hop LAN switch (FM5224 chip)
    1. ping: 7 us
    2. clusterperf.py (listed in the next section): average and best/worst over a 100 ms trial period (7,000 samples for the 100B read); the note after this list shows how the bandwidth figures are derived.
      1. with cut-through switch mode
      2. with store-and-forward switch mode - almost the same as cut-through mode
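The bandwidth lines in the listings below follow directly from the average latencies, with MB read as 2^20 bytes. For basic.read100, for example:

    100 B / 13.8 us = 7.25 x 10^6 B/s;  7.25 x 10^6 / 2^20 = 6.9 MB/s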

Clusterperf.py with cut through switch mode.

basic.read100   13.8 us, Best 13.3 us, Worst 32.2 us
    6.9 MB/s bandwidth reading 100B object with 30B key
basic.read1K    17.9 us, Best 17.3 us, Worst 29.0 us
   53.4 MB/s bandwidth reading 1KB object with 30B key
basic.read10K   48.6 us, Best 47.8 us, Worst 55.9 us
  196.4 MB/s bandwidth reading 10KB object with 30B key
basic.read100K  369.0 us, Best 367.3 us, Worst 376.1 us
  258.4 MB/s bandwidth reading 100KB object with 30B key
basic.read1M    3.8 ms, Best 3.8 ms, Worst 3.8 ms
  251.4 MB/s bandwidth reading 1MB object with 30B key

basic.write100  18.1 us, Best 17.4 us, Worst 35.2 us
    5.3 MB/s bandwidth writing 100B object with 30B key
basic.write1K   22.7 us, Best 21.8 us, Worst 120.8 us
   42.0 MB/s bandwidth writing 1KB object with 30B key
basic.write10K  60.1 us, Best 58.2 us, Worst 100.3 us
  158.6 MB/s bandwidth writing 10KB object with 30B key
basic.write100K 428.3 us, Best 418.9 us, Worst 470.9 us
  222.7 MB/s bandwidth writing 100KB object with 30B key
basic.write1M   4.6 ms, Best 4.5 ms, Worst 4.7 ms
  206.8 MB/s bandwidth writing 1MB object with 30B key

Clusterperf.py with store and forward switch mode.

basic.read100   13.8 us, Best 13.3 us, Worst 32.7 us
    6.9 MB/s bandwidth reading 100B object with 30B key
basic.read1K    20.7 us, Best 20.0 us, Worst 37.7 us
   46.1 MB/s bandwidth reading 1KB object with 30B key
basic.read10K   52.8 us, Best 52.1 us, Worst 68.6 us
  180.8 MB/s bandwidth reading 10KB object with 30B key
basic.read100K  373.2 us, Best 371.3 us, Worst 379.0 us
  255.5 MB/s bandwidth reading 100KB object with 30B key
basic.read1M    3.9 ms, Best 3.8 ms, Worst 3.9 ms
  247.2 MB/s bandwidth reading 1MB object with 30B key

basic.write100  18.2 us, Best 17.4 us, Worst 43.6 us
    5.2 MB/s bandwidth writing 100B object with 30B key
basic.write1K   25.6 us, Best 24.7 us, Worst 64.1 us
   37.2 MB/s bandwidth writing 1KB object with 30B key
basic.write10K  64.2 us, Best 62.5 us, Worst 95.5 us
  148.6 MB/s bandwidth writing 10KB object with 30B key
basic.write100K 431.4 us, Best 423.2 us, Worst 463.0 us
  221.0 MB/s bandwidth writing 100KB object with 30B key
basic.write1M   4.7 ms, Best 4.6 ms, Worst 4.8 ms
  204.6 MB/s bandwidth writing 1MB object with 30B key

 Comparison: clusterperf.py with 32Gbps Infiniband, Aug 12, 2013

basic.read100          5.1 us     read single 100B object with 30B key
basic.readBw100       18.7 MB/s   bandwidth reading 100B object with 30B key
basic.read1K           6.9 us     read single 1KB object with 30B key
basic.readBw1K       137.6 MB/s   bandwidth reading 1KB object with 30B key
basic.read10K         10.4 us     read single 10KB object with 30B key
basic.readBw10K      914.1 MB/s   bandwidth reading 10KB object with 30B key
basic.read100K        47.2 us     read single 100KB object with 30B key
basic.readBw100K       2.0 GB/s   bandwidth reading 100KB object with 30B key
basic.read1M         420.8 us     read single 1MB object with 30B key
basic.readBw1M         2.2 GB/s   bandwidth reading 1MB object with 30B key

Performance comparison with clusterperf.py)

RAMCloud was ported to the ATOM server in mid-April 2014.
Environment and differences)

Both clusters use the same Linux kernel series and the same gcc version.

        ramcloud cluster        ATOM cluster
  OS    RHEL 6.0 (2.6.32-71)    CentOS 6.5 (2.6.32-431)
  gcc   4.4.7                   4.4.7
Initial performance evaluation on tcp)

Updated May 22, 2014; the user-mode driver for the ATOM servers is still under development.
Compiled with 'make DEBUG=no'.

Reference: clusterperf on TCP (about --disjunct option and process/port assignment of clusterperf+tcp)

                    transport  replica  size (B)  read (us)  write (us)
  ramcloud cluster  tcp        no       100       25.1       25.1
                                        100K      103        145
                    tcp        1        100       24.0       102.1
                    InfRc      no       100       5.2        6.2
  ATOM cluster      tcp        no       100       70         -
                                        100K      -          -

  clusterperf.py options used:
    ramcloud cluster, tcp, no replica:
      --verbose --transport=tcp --clients=1 --servers=1 --numBackups=0 --replicas=0 --disjunct basic
    ramcloud cluster, tcp, 1 replica:
      --verbose --transport=tcp --clients=1 --servers=2 --numBackups=1 --replicas=1 --disjunct basic
    ramcloud cluster, InfRc, no replica:
      --verbose --clients=1 --servers=1 --numBackups=0 --replicas=0 --disjunct basic
    ATOM cluster, tcp, no replica:
      --verbose --transport=tcp --server=1 --client=1 --numBackups=0 --replicas=0 --disjunct basic

Ping Comparison) with kernel TCP driver

                    ping (us)    condition
  ramcloud cluster  220          ping to rc01 on rcmaster
  ATOM cluster      120 to 150
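RTTs like these can be reproduced with a plain ping from the measuring host (the exact flags here are an illustration, not the original command):

    $ ping -c 100 -q atom001    <- run on rcmaster; read the avg field of the 'rtt min/avg/max/mdev' line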

 

6. References)

  1. Existing switch, 1G x 48 ports: HP ProCurve 2510G-48: http://h17007.www1.hp.com/us/en/networking/products/switches/HP_2510_Switch_Series/index.aspx#.U-OXKYBdUqk
  2. New switch, 1G x 24 ports: HP 2920-24G: http://www8.hp.com/h20195/v2/GetDocument.aspx?docname=c04111401
  3. Information about the ATOM NIC (Intel Avoton C2xxx) and the TOR board (FM5224 chip) in the chassis: SF13_CLDS006_101.pdf
    (Downloaded from the Intel IDF2013 presentation: https://intel.activeevents.com/sf13/connect/fileDownload/session/A02B7458AF93EB0153BB728308E30F99/SF13_CLDS006_101.pdf)