RAMCloud in a Box - ATOM based Micro Modular Server 'mmatom'

Contents)

Basically, the procedure is similar to that for rccluster, except that:

Use the local git repository located on 'mmatom'
1. The source code and scripts are pretested for the ATOM-based 'RAMCloud in a Box'.
The cluster has 132 ATOM servers (total: 1,056 cores, 4.1 TB DRAM, 16.5 TB SSD) connected to the dedicated management server 'mmatom.scs.stanford.edu', which is directly connected to the Internet. See System configuration below for details.
Broken or unstable servers are disconnected with the management tool, and the IP addresses and hostnames of the remaining servers are always kept contiguous. Each IP address always corresponds to the same hostname. The physical slot and chassis number of a server can be looked up in a management file on the management server.
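As a quick sanity check on the totals above, the per-server resources can be derived by simple division. This is only a sketch: it assumes all 132 servers are identical and that 1 TB is counted as 1024 GB, neither of which is stated on this page.

```python
# Cluster-wide totals quoted above.
SERVERS = 132
TOTAL_CORES = 1056
TOTAL_DRAM_TB = 4.1
TOTAL_SSD_TB = 16.5

# Assumption: identical servers, and 1 TB = 1024 GB.
cores_per_server = TOTAL_CORES // SERVERS
dram_gb_per_server = TOTAL_DRAM_TB * 1024 / SERVERS
ssd_gb_per_server = TOTAL_SSD_TB * 1024 / SERVERS

print("cores per server: %d" % cores_per_server)
print("DRAM per server:  %.1f GB" % dram_gb_per_server)
print("SSD per server:   %.1f GB" % ssd_gb_per_server)
```

Under these assumptions each server has 8 cores, about 32 GB of DRAM, and 128 GB of SSD.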

User setup)

Log in by ssh to the management server 'mmatom.scs.stanford.edu' with public key authentication.
Install your 'key pair for the cluster-login' in ~/.ssh . For existing RAMCloud users, the keys have already been copied from your home directory in rcnfs.
- Add the cluster-login public key to ~/.ssh/authorized_keys .
  Note)
  - Your home directory is shared with all the ATOM servers over NFS, so you can log in to all ATOM servers with public key authentication.
  - Do not copy the private key you use for mmatom login. Please create a different key pair for ssh from mmatom to the ATOM cluster.
Initialize known_hosts:
1. You can use /usr/local/scripts/knownhost_init.sh
  1. Usage) /usr/local/scripts/knownhost_init.sh <ServerNumberFrom> <ServerNumberTo>
    If a host is already initialized, the result of 'hostname' on the remote machine is displayed; otherwise you are asked whether to add the host to the known_hosts database, and you should type 'yes'.
  2. Example)
    $ knownhost_init.sh 1 20
    atom001
    atom002
    :
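The contents of knownhost_init.sh are not shown on this page. The sketch below only illustrates the kind of loop the usage above suggests: one 'ssh <host> hostname' per server number. The three-digit 'atomNNN' naming is inferred from the example output, and the function names are hypothetical.

```python
def atom_hostnames(first, last):
    """Return hostnames atomNNN for server numbers first..last inclusive."""
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    return ["atom%03d" % n for n in range(first, last + 1)]


def hostname_check_commands(first, last):
    """Build one 'ssh <host> hostname' command per server (nothing is executed)."""
    return [["ssh", host, "hostname"] for host in atom_hostnames(first, last)]


if __name__ == "__main__":
    # Print the commands for servers 1 to 20, as in the example above.
    for cmd in hostname_check_commands(1, 20):
        print(" ".join(cmd))
```

The commands are only printed, not run, so the sketch can be tried anywhere; on 'mmatom' each command would trigger the known_hosts prompt for a host that has not been initialized yet.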

Compile RAMCloud on the host 'mmatom' )

Run clusterperf.py from 'mmatom') - Assumes you have already compiled the RAMCloud source.

Settings are defined in localconfig.py and config.py, and are imported into clusterperf.py through cluster.py.
Basic settings for running RAMCloud applications on the ATOM servers are provided in config.py . So far we have tested the following; more commands will be tested:
- clusterperf.py (default parameters equivalent to rccluster: replica=3, server=4, backup=1)
- TBD: clusterperf.py (running all of the standard performance tests)
- TBD: recovery.py
Reserve (lease) ATOM servers with /usr/local/bin/mmres
1. mmres is ported from rcres, with a resource management feature for DPDK added.
2. Check usage with: $ mmres --help
Edit config.py to list the servers you reserved.
Run clusterperf.py
$ scripts/clusterperf.py
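For the 'Edit config.py' step above, a host list for the reserved servers might be built as follows. This is a sketch under assumptions: it assumes config.py on 'mmatom' keeps the (hostname, IP address, id) tuple layout used for the host list in the RAMCloud scripts, it looks IP addresses up by name instead of guessing an addressing scheme, and build_hosts is a hypothetical helper. Check the config.py provided on 'mmatom' for the actual format.

```python
import socket


def build_hosts(server_numbers, resolve=socket.gethostbyname):
    """Return (hostname, ip, id) tuples for the given ATOM server numbers.

    'resolve' maps a hostname to an IP address; by default it uses the
    system resolver, which only knows atomNNN names on the cluster itself.
    """
    hosts = []
    for n in sorted(set(server_numbers)):
        name = "atom%03d" % n  # naming inferred from the example output
        hosts.append((name, resolve(name), n))
    return hosts


# Hypothetical usage on mmatom, after reserving servers 10 to 14 with mmres:
# hosts = build_hosts(range(10, 15))
```

Passing the resolver as a parameter keeps the helper testable off the cluster, where the atomNNN names do not resolve.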

Note) We are working on moving the noisy DPDK log output from standard output to a log file.

Measured on Wed 25 Feb 2015 12:31:42 PM PST :

basic.read100         13.456 us     read single 100B object (30B key) median
basic.read100.min     12.423 us     read single 100B object (30B key) minimum
basic.read100.9       13.837 us     read single 100B object (30B key) 90%
basic.read100.99      17.698 us     read single 100B object (30B key) 99%
basic.read100.999     24.356 us     read single 100B object (30B key) 99.9%
basic.readBw100        4.7 MB/s   bandwidth reading 100B object (30B key)
basic.read1K          20.425 us     read single 1KB object (30B key) median
basic.read1K.min      19.593 us     read single 1KB object (30B key) minimum
basic.read1K.9        20.786 us     read single 1KB object (30B key) 90%
basic.read1K.99       24.306 us     read single 1KB object (30B key) 99%
basic.read1K.999      36.569 us     read single 1KB object (30B key) 99.9%
basic.readBw1K        33.4 MB/s   bandwidth reading 1KB object (30B key)
basic.read10K         52.461 us     read single 10KB object (30B key) median
basic.read10K.min     51.519 us     read single 10KB object (30B key) minimum
basic.read10K.9       53.294 us     read single 10KB object (30B key) 90%
basic.read10K.99      55.881 us     read single 10KB object (30B key) 99%
basic.read10K.999     86.142 us     read single 10KB object (30B key) 99.9%
basic.readBw10K      125.6 MB/s   bandwidth reading 10KB object (30B key)
basic.read100K       358.567 us     read single 100KB object (30B key) median
basic.read100K.min   356.611 us     read single 100KB object (30B key) minimum
basic.read100K.9     359.489 us     read single 100KB object (30B key) 90%
basic.read100K.99    381.799 us     read single 100KB object (30B key) 99%
basic.read100K.999    38.298 ms     read single 100KB object (30B key) 99.9%
basic.readBw100K     187.6 MB/s   bandwidth reading 100KB object (30B key)
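Note that the 99.9% figure for 100KB reads is reported in ms while the other latencies are in us, so the unit has to be taken into account when comparing rows. A minimal sketch of a parser for result lines of the form shown above follows; the function names are hypothetical and the parser is not part of clusterperf.py.

```python
# Conversion factors from the printed unit to microseconds.
UNIT_TO_US = {"ns": 1e-3, "us": 1.0, "ms": 1e3, "s": 1e6}


def parse_perf_line(line):
    """Split one result line into (name, value, unit, description)."""
    name, value, unit, description = line.split(None, 3)
    return name, float(value), unit, description


def latencies_us(lines):
    """Map test name -> latency in microseconds, skipping bandwidth rows."""
    result = {}
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines between rows
        name, value, unit, _ = parse_perf_line(line)
        if unit in UNIT_TO_US:
            result[name] = value * UNIT_TO_US[unit]
    return result


# Three rows copied from the measurement above.
sample = [
    "basic.read100K       358.567 us     read single 100KB object (30B key) median",
    "basic.read100K.999    38.298 ms     read single 100KB object (30B key) 99.9%",
    "basic.readBw100K     187.6 MB/s   bandwidth reading 100KB object (30B key)",
]
lat = latencies_us(sample)
ratio = lat["basic.read100K.999"] / lat["basic.read100K"]
print("100KB read, 99.9%% / median = %.1f" % ratio)
```

With the units normalized, the 99.9% latency for 100KB reads is roughly two orders of magnitude above the median, which is easy to miss when reading the raw columns.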

For historical reasons and with future experiments in mind, the VLAN configuration differs between chassis.
The management server is directly connected to the Internet; the cluster is isolated from other Stanford servers.
1. The management server works as firewall, login server, NIS server, DHCP server, and PXE server for reconfiguring the ATOM servers.

Reference: