RAMCloud in a Box - ATOM-based Micro Modular Server 'mmatom'
We have published the initial RAMCloud sources on GitHub and plan to merge them into PlatformLab/RAMCloud/master. We are still experimenting, so please forgive any glitches.
Contents)
Running RAMCloud
RAMCloud for the ATOM server is cloned from GitHub. (Updated on Oct. 23, 2015)
Development and enhancement
clusterperf.py
TBD) recovery.py
Adding user with dedicated script
Hardware maintenance
NEC tools for the ATOM server are cloned from GitHub
Security solution
Setup guide
Config files
Stanford's ATOM Cluster - configuration, etc
System photograph
Cluster Outline)
The procedure is basically similar to that of rccluster, except that:
The ATOM servers were moved from Stanford to NEC America in April 2016.
The cluster has 132 ATOM servers (total: 1,056 cores, 4.1 TB DRAM, 16.5 TB SSD) connected to a management server 'mmatom.necam.com'.
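As a rough sanity check, the totals can be divided evenly across the 132 servers. This assumes that every server is configured identically and that "TB" here means binary terabytes (TiB). Neither assumption is stated on this page.

```latex
\[
\frac{1056\ \text{cores}}{132} = 8\ \text{cores/server}, \qquad
\frac{4.1\ \text{TiB}}{132} \approx 32\ \text{GiB/server}, \qquad
\frac{16.5\ \text{TiB}}{132} = \frac{16896\ \text{GiB}}{132} = 128\ \text{GiB/server}
\]
\[
132 \times 32\ \text{GiB} = 4224\ \text{GiB} \approx 4.1\ \text{TiB}
\]
```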
See System configuration below for details. An ATOM server home page is under construction at http://mmatom.necam.com/
An unstable server can be disconnected with the management tool, and the IP addresses/hostnames are then reassigned so that they stay contiguous. An IP address and its hostname always stay associated. Details are in the System Administrator's Guide: Hardware maintenance.
User's Guide)
User setup)
Log in with ssh to the management server 'mmatom.scs.stanford.edu' using public key authentication and the special SSH port.
Install your key pair for cluster login in ~/.ssh. For existing RAMCloud users, it has already been copied from your home directory.
Add the cluster-login public key to ~/.ssh/authorized_keys.
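The last two steps above can be scripted. The following is a minimal sketch and is not a tool installed on mmatom. The function name install_pubkey and the example key path ~/.ssh/id_rsa.pub are assumptions. The function sets the 0700/0600 permissions that sshd expects, keeps each key on its own line, and skips the append if the key is already present.

```shell
#!/bin/sh
# Hypothetical helper (not an installed cluster tool): append a cluster-login
# public key to authorized_keys with the permissions sshd expects.
# Generate the key pair first, without a passphrase, e.g.:
#   ssh-keygen -t rsa -b 2048
install_pubkey() {
    pubkey_file="$1"                 # e.g. ~/.ssh/id_rsa.pub (assumed path)
    ssh_dir="${2:-$HOME/.ssh}"       # defaults to your NFS-shared ~/.ssh
    auth="$ssh_dir/authorized_keys"
    key="$(cat "$pubkey_file")" || return 1

    mkdir -p "$ssh_dir" || return 1
    chmod 700 "$ssh_dir"             # no group/other access to ~/.ssh
    touch "$auth"
    chmod 600 "$auth"                # no group/other access to authorized_keys

    # Already installed? Then there is nothing to do.
    if grep -qxF -e "$key" "$auth"; then
        return 0
    fi
    # Make sure the existing last line ends with a newline so the new key
    # starts on its own line (each key must be a single line).
    if [ -s "$auth" ] && [ -n "$(tail -c 1 "$auth")" ]; then
        echo >> "$auth"
    fi
    printf '%s\n' "$key" >> "$auth"
}
```

For example, run `install_pubkey ~/.ssh/id_rsa.pub` on mmatom. Because your home directory is shared over NFS, the key then applies to every ATOM server.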
Note)
- Your home directory is shared with all the ATOM servers over NFS, so you can log in to every ATOM server with public key authentication.
- Do not copy the private key you use to log in to mmatom. Create a different key pair for ssh from mmatom to the ATOM cluster.
- A key pair can be generated with a command such as:
$ ssh-keygen -t rsa -b 2048
- To avoid ssh errors like 'Permission denied (publickey)':
  - Do not set a passphrase on the key pair for ATOM cluster login. Just press Return when 'ssh-keygen' asks for a passphrase.
  - Remove group/other access permissions from ~/.ssh and ~/.ssh/authorized_keys: 0700 for the ~/.ssh directory, 0600 for ~/.ssh/authorized_keys.
  - Each public key in ~/.ssh/authorized_keys must be a single line, without line breaks.
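You can check the permission and single-line rules above yourself before asking an administrator for help. The function below is hypothetical (check_ssh_perms is not an installed tool) and relies on GNU stat. Its single-line test is a heuristic: it only accepts lines starting with common key types, so keys with an options prefix are reported too.

```shell
#!/bin/sh
# Hypothetical checker: ~/.ssh must be 0700, ~/.ssh/authorized_keys must be
# 0600, and every non-empty line should be one complete public key.
# Uses GNU stat (-c %a), as on Linux hosts.
check_ssh_perms() {
    ssh_dir="${1:-$HOME/.ssh}"
    auth="$ssh_dir/authorized_keys"
    status=0

    if [ "$(stat -c %a "$ssh_dir")" != "700" ]; then
        echo "fix: chmod 700 $ssh_dir"; status=1
    fi
    if [ "$(stat -c %a "$auth")" != "600" ]; then
        echo "fix: chmod 600 $auth"; status=1
    fi
    # A key wrapped across lines leaves continuation lines that do not start
    # with a key type; comments (#) and empty lines are allowed.
    if grep -vE '^(ssh-|ecdsa-|sk-|#|$)' "$auth" > /dev/null; then
        echo "fix: a key in $auth seems to be split across lines"; status=1
    fi
    return $status
}
```

Run `check_ssh_perms` with no argument to check your own ~/.ssh. Each problem it finds is printed with a suggested fix.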
Initialize known_hosts:
You can use /usr/local/scripts/knownhost_init.sh
Usage) /usr/local/scripts/knownhost_init.sh <ServerNumberFrom> <ServerNumberTo>
If the host is already initialized, the result of 'hostname' on the remote machine is displayed. Otherwise you are asked whether to add the host to the known_hosts database, and you should type 'yes'.
Example)
$ knownhost_init.sh 1 20
atom001
atom002
:
Note) You may see the following error:
'Permission denied (publickey). '
If you have followed 2.a and 2.b, the cause may be that your user information has not been created on atomXXX. We do not use NIS or LDAP so far, so you need to ask your administrator to set up your user account on atomXXX. See 'System administrator's guide' below.
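The behavior described for knownhost_init.sh can be approximated as follows. This is a hypothetical sketch, not the installed script. It assumes hostnames of the form atom001, atom002, ..., inferred from the example output. It uses OpenSSH's StrictHostKeyChecking=ask option, which produces the 'yes' prompt for unknown hosts.

```shell
#!/bin/sh
# Hypothetical re-implementation of the idea behind knownhost_init.sh; the
# real script in /usr/local/scripts may differ. Assumes hostnames atom001,
# atom002, ... (three-digit zero padding, as in the example output).
atom_hostname() {
    printf 'atom%03d\n' "$1"
}

knownhost_init() {
    from="$1"; to="$2"
    n="$from"
    while [ "$n" -le "$to" ]; do
        host="$(atom_hostname "$n")"
        # Unknown host: ssh asks whether to add it (type 'yes').
        # Known host: this just prints the remote hostname.
        ssh -o StrictHostKeyChecking=ask "$host" hostname
        n=$((n + 1))
    done
}

# Only run when given two arguments, e.g.: sh knownhost_sketch.sh 1 20
if [ "$#" -eq 2 ]; then
    knownhost_init "$1" "$2"
fi
```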
Compile RAMCloud on the host 'mmatom')
Run benchmark or application samples)
Preparation:
You need to have RAMCloud compiled.
Reserve (lease) ATOM servers with /usr/local/bin/mmres
mmres is ported from rcres. It manages RAMCloud resources such as backup files and DPDK resource files. Backup replicas are preserved until the mmres lease expires, so you can reuse a backup from a different program while the lease is active.
Check usage with: $ mmres --help
Example)
$ mmres ls -l
$ mmres ls -l atom10-35   or   $ mmres ls -l 10-35   // print a range; you can use '..' instead of '-'.
$ mmres lease 14:00 atom10-35 -m 'Comment here!!'
Note) Added the 'dx2k' cluster. dx2k stands for DX-2000, a new Xeon D based micro modular server.
Try typing: $ mmres ls -l dx2k
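The helper below makes the range syntax concrete. It is hypothetical and not part of mmres. It expands a spec such as atom10-35, 10-35 or 10..35 into hostnames using seq from GNU coreutils. The three-digit zero padding is an assumption based on the atom001 style in the known_hosts example, and mmres itself may print names differently.

```shell
#!/bin/sh
# Hypothetical helper (not part of mmres): expand a range spec as written
# for 'mmres ls -l' (atom10-35, 10-35, 10..35) into hostnames.
# Assumes atomNNN hostnames with three-digit zero padding.
expand_atom_range() {
    spec="$1"
    case "$spec" in
        *..*) from="${spec%%..*}"; to="${spec#*..}" ;;   # '..' separator
        *-*)  from="${spec%%-*}";  to="${spec#*-}"  ;;   # '-' separator
        *)    from="$spec";        to="$spec"       ;;   # single server
    esac
    from="${from#atom}"; to="${to#atom}"                 # optional prefix
    seq -f 'atom%03g' "$from" "$to"
}
```

For example, `expand_atom_range atom10-35` lists the 26 hosts atom010 through atom035.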
Note)
Use the server's local time. Check it with the 'date' command before requesting a lease. 'mmres' does not yet take the user's time zone into account (a Python library limitation).
There is no need to edit scripts/config.py: it obtains the reserved servers from mmres.
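Because mmres reads lease times in the server's local time, it can help to compute the end time on mmatom itself. The helper below is a sketch. lease_end is a hypothetical name and requires GNU date. This page does not say how mmres treats an end time that falls after midnight.

```shell
#!/bin/sh
# Hypothetical helper: print a time N hours from now in the server's local
# time, formatted as HH:MM like the '14:00' in 'mmres lease 14:00 ...'.
# Run it on mmatom so the local time zone matches what mmres expects.
# Requires GNU date (-d).
lease_end() {
    hours="${1:-2}"      # default lease length: 2 hours (arbitrary choice)
    date -d "+${hours} hours" +%H:%M
}
```

Usage: `mmres lease "$(lease_end 3)" atom10-35 -m 'Comment here!!'`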
Now you can launch benchmark or application tasks on the ATOM cluster from mmatom (the management server) with Python scripts.
The scripts are customized for the ATOM cluster so that RAMCloud tests run with the default options.
Limitations) to be fixed:
- Many DPDK debug messages starting with "EAL" or "PMD" are printed to stdout. We are working on a fix.
Quick hack: run commands through the /usr/local/bin/mmfilter wrapper:
Usage: mmfilter command arguments...
$ mmfilter scripts/clusterperf.py basic
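The filtering idea can be sketched as follows. This is a hypothetical stand-in, not the source of /usr/local/bin/mmfilter. It drops stdout lines that begin with EAL or PMD and passes stderr through untouched. In this simple version the wrapped command's exit status is lost, because the pipeline returns awk's status.

```shell
#!/bin/sh
# Hypothetical stand-in for mmfilter (the installed wrapper may differ):
# run the given command and drop stdout lines starting with "EAL" or "PMD".
# stderr is not filtered; the exit status is awk's, not the command's.
mmfilter_sketch() {
    "$@" | awk '!/^(EAL|PMD)/'
}
```

Usage: `mmfilter_sketch scripts/clusterperf.py basic`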
Note) A shell script 'Run.sh' in the top-level directory compiles RAMCloud and runs the benchmarks. You only need to run 'Run.sh' after 'mmres'. See 'Run.sh' for details.
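The overall flow can be outlined as follows, purely as an illustration. Read the real Run.sh to see what it actually does. The function names, the JOBS variable, and the plain `make -j` build step are assumptions. Setting DRY_RUN=1 prints the commands instead of executing them.

```shell
#!/bin/sh
# Hypothetical outline of a compile-and-benchmark script in the spirit of
# Run.sh. Run it from the top-level RAMCloud directory after leasing servers
# with mmres. DRY_RUN=1 prints each command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

build_and_bench() {
    run make -j"${JOBS:-8}" || return 1     # compile RAMCloud on mmatom (assumed)
    run mmfilter scripts/clusterperf.py --transport=basic+dpdk "$@"
}
```

Usage: `build_and_bench basic`, or `DRY_RUN=1 build_and_bench basic` to preview the commands.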
Run clusterperf.py) Its default parameters are equivalent to the …, which are replica=3, server=4, backup=1.
$ mmfilter scripts/clusterperf.py --transport=basic+dpdk [Tests]
[Tests]: names of tests. See clusterperf.py for the available tests.
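When running several tests, a small loop can skip indexBasic and indexMultiple, which are reported to time out on this cluster, and keep going if one test fails. This is a hypothetical sketch that calls clusterperf.py once per test; a single invocation with several test names also fits the usage line above. DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical loop: run each named clusterperf.py test through mmfilter,
# skipping the two tests that currently time out. One invocation per test,
# so a failing test does not stop the rest.
run_tests() {
    for t in "$@"; do
        case "$t" in
            indexBasic|indexMultiple)
                echo "skip $t (known timeout)" >&2
                continue ;;
        esac
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "mmfilter scripts/clusterperf.py --transport=basic+dpdk $t"
        else
            mmfilter scripts/clusterperf.py --transport=basic+dpdk "$t" ||
                echo "test $t failed" >&2
        fi
    done
}
```

Usage: `run_tests basic indexBasic`, which runs only basic.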
Limitation)
- Tests 'indexBasic' and 'indexMultiple' are too slow and end with a timeout. All other tests run OK.
Run recovery.py)
$ mmfilter scripts/recovery.py -v --timeout=1000 --transport=basic+dpdk
Run clientSample) A simple client program.
$ cd clientSample
$ make
$ make run
or $ mmfilter make run
Note) It now completes successfully after the original 1M iterations.