
We have published initial RAMCloud sources to GitHub. We are planning to merge them into PlatformLab/RAMCloud/master. We are still experimenting, so please forgive any glitches.


Contents)

Cluster Outline
User's Guide
1. User setup
2. Compile RAMCloud
  1. RAMCloud for the ATOM server is cloned from GitHub. (Updated on Oct. 23, 2015)
3. Run benchmark
4. Compile and run applications
5. Development and enhancement
Performance result
- clusterperf.py
- TBD) recovery.py
System administrator's guide
1. Adding users with a dedicated script
2. Hardware maintenance
  1. NEC tools for the ATOM server are cloned from GitHub
3. Security solution
4. For management server setup
  1. Setup guide
  2. Config files
5. Stanford's ATOM Cluster - configuration, etc
  1. System photograph
  2. Server module specification
  3. Network connection

Cluster Outline)

The procedure is basically similar to that for rccluster, except as described below.

NOTE) I have created a test repository in /home/satoshi/ramcloud-dpdk.git and tested it. You can clone the latest RAMCloud for ATOM for testing; please try it. The instructions are written below.
The ATOM servers were moved from Stanford to NEC America in April 2016.
The cluster has 132 ATOM servers (total: 1,056 cores, 4.1 TB DRAM, 16.5 TB SSD) connected to a dedicated management server, 'mmatom.necam.com', which is directly connected to the Internet.
See the system configuration below for details.
(Unstable) An ATOM server home page is under construction at http://mmatom.necam.com/
Unstable servers can be disconnected with the management tool, and contiguous IP addresses/hostnames are given. An IP address and a hostname are always associated with each other. You will find the details in System administrator's guide: Hardware maintenance.

return to Table of Contents.

User's Guide)

User setup)


Log in via ssh to the management server 'mmatom.scs.stanford.edu', using public key authentication and the special SSH port.
Install your key pair for cluster login in ~/.ssh. For existing RAMCloud users, it has already been copied from your home directory.
- Add the cluster-login public key to ~/.ssh/authorized_keys.
  Note)
  - Your home directory is shared with all the ATOM servers via NFS, so you can log in to all ATOM servers with public key authentication.
  - Do not copy your private key for mmatom login. Please create a different key pair to ssh to the ATOM cluster from mmatom.
  - A key pair can be generated with a command such as:
```
$ ssh-keygen -t rsa -b 2048
```
- To avoid ssh errors like 'Permission denied (publickey)':
  1. Do not add a passphrase to the key pair for ATOM cluster login. Just press 'return' when 'ssh-keygen' asks for a passphrase.
  2. Remove group/other access permissions from ~/.ssh and ~/.ssh/authorized_keys: 0700 for the ~/.ssh directory, 0600 for ~/.ssh/authorized_keys.
  3. Each public key in ~/.ssh/authorized_keys must be a single line without line breaks.
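The three conditions above can be checked before asking an administrator for help. The following is a minimal Python sketch, not one of the cluster's tools: `check_ssh_setup` is a hypothetical helper, and the key-line test is a heuristic that assumes every valid line contains a key-type token such as `ssh-rsa`.

```python
import stat
from pathlib import Path

# Key-type tokens that start a public key (e.g. ssh-rsa, ecdsa-sha2-nistp256).
KEY_TYPE_PREFIXES = ("ssh-", "ecdsa-", "sk-")


def check_ssh_setup(ssh_dir):
    """Return a list of likely causes of 'Permission denied (publickey)'."""
    ssh_dir = Path(ssh_dir)
    keys = ssh_dir / "authorized_keys"
    if not ssh_dir.is_dir():
        return [f"{ssh_dir}: directory not found"]
    problems = []

    # ~/.ssh should be 0700: no group/other permission bits set.
    if stat.S_IMODE(ssh_dir.stat().st_mode) & 0o077:
        problems.append(f"{ssh_dir}: remove group/other access (chmod 0700)")

    if not keys.exists():
        problems.append(f"{keys}: file not found")
        return problems

    # ~/.ssh/authorized_keys should be 0600: no group/other permission bits set.
    if stat.S_IMODE(keys.stat().st_mode) & 0o077:
        problems.append(f"{keys}: remove group/other access (chmod 0600)")

    # Heuristic: each non-comment line must contain a key-type token. A key
    # wrapped across lines leaves a continuation line of bare base64, which
    # has no '-' characters, so it cannot contain such a token.
    for n, line in enumerate(keys.read_text().splitlines(), start=1):
        text = line.strip()
        if not text or text.startswith("#"):
            continue
        if not any(tok.startswith(KEY_TYPE_PREFIXES) for tok in text.split()):
            problems.append(f"{keys}:{n}: possibly a wrapped (multi-line) key")
    return problems
```

Run it as `check_ssh_setup(Path.home() / ".ssh")`; an empty list means none of these three problems was detected.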
Initialize known_hosts:
1. You can use /usr/local/scripts/knownhost_init.sh
  1. Usage) /usr/local/scripts/knownhost_init.sh <ServerNumberFrom> <ServerNumberTo>
    If a host is already initialized, the result of 'hostname' on the remote machine is displayed; otherwise, you are prompted whether to add the host to the known_hosts database, where you should type 'yes'.
  2. Example)
    $ knownhost_init.sh 1 20
    atom001
    atom002
    :
2. Note) You may see the following error:
  - 'Permission denied (publickey).'
  - If you have followed the steps above for avoiding ssh errors, it may be because the remote user information has not been created on atomXXX. We do not use NIS or LDAP so far. You need to ask your administrator to set up the remote user on atomXXX. See 'System administrator's guide' below.
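As far as the usage text describes it, knownhost_init.sh loops over a range of server numbers and runs `hostname` on each server over ssh. A hypothetical Python equivalent is sketched below; it does not reproduce the script's actual contents, and the `atom%03d` naming is taken from the host list in scripts/config.py.

```python
import subprocess


def atom_hostnames(first, last, prefix="atom"):
    """Expand server numbers first..last (inclusive) into names like atom001."""
    return [f"{prefix}{n:03d}" for n in range(first, last + 1)]


def init_known_hosts(first, last, run=subprocess.call):
    """Run 'hostname' on each server over ssh.

    For a host already in ~/.ssh/known_hosts this just prints its name; for a
    new host, ssh asks whether to add its key, and you answer 'yes'.
    """
    for host in atom_hostnames(first, last):
        run(["ssh", host, "hostname"])
```

`init_known_hosts(1, 20)` corresponds to the example `knownhost_init.sh 1 20`; the `run` parameter only exists so the loop can be exercised without ssh.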

return to Table of Contents.


Compile RAMCloud on the host 'mmatom')

Git clone from the GitHub fork, which is pre-tested for the ATOM cluster. The directory structure is the same as the original Stanford RAMCloud. (We are using the DPDK source released in standard RAMCloud.)

- $ git clone /var/git/ramcloud-dpdk.git
- $ git clone /home/satoshi/ramcloud-dpdk.git // my locally debugged repository
- $ cd ramcloud-dpdk
- $ git submodule update --init
- Note) DO NOT add the '--recursive' option to git, because git then creates the following entry in 'logcabin/.git/config', which refers to an obsolete repository and results in a timeout:
  > [submodule "gtest"]
  > url = git://fiz.stanford.edu/git/gtest.git
- Note)
  - This is a branch of (precisely, just a modified clone of) commit f9da67df93726ba5c20a9c703e7abb18931306d3, Date: Thu Aug 14 17:29:15 2014 -0700, 'Add RPM SPEC files in new rpm directory'
  - You can check it with '$ git log'

Compile RAMCloud by typing make without any arguments.
```
$ make
```
- Note)
  1. The following make options are now defaults and are specified in GNUmakefile:
    1. DEBUG=no
    2. ARCH=atom
  2. CC, CXX, and AR are specified directly in order to use '/usr/bin/gcc (4.4.7)' instead of '/usr/local/bin/gcc (4.9.1)' for compiling RAMCloud.

Run clusterperf.py from 'mmatom' on the atomXXX servers in the cluster)

- $ git clone https://github.com/SMatsushi/RAMCloud.git
- $ cd RAMCloud
- $ git submodule update --init --recursive
- $ git checkout new-dpdk
Compile RAMCloud with make:
```
$ make -j8 DEBUG=no
```
- ~~$ make -j8 ARCH=atom DEBUG=no DPDK=yes~~
  1. You no longer need to specify the ARCH and DPDK flags, since ./private/MakefragPrivateTop sets the ATOM-server-specific make options.
  2. You need to have java and javac installed. We have tested java/javac version 1.7.0_91.
  3. The tools below are located in ./scripts/ManagementTools for now. Please add that directory to your PATH or copy the tools into your scripts/bin directory.
  - mmres/mmres.py, knownhost_init.sh, ipmiaw2, mmfilter
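To confirm the tools are reachable after setting your PATH, a short Python check like the following can help. It is only a sketch: `missing_tools` is a hypothetical helper, and the tool names are the ones listed above (mmres may also be invoked as mmres.py).

```python
import shutil

# Management tools listed above.
TOOLS = ["mmres", "knownhost_init.sh", "ipmiaw2", "mmfilter"]


def missing_tools(tools=TOOLS, path=None):
    """Return the tools not found as executables on PATH (or on `path`)."""
    return [tool for tool in tools if shutil.which(tool, path=path) is None]
```

`missing_tools()` returns an empty list once every tool is found on your PATH.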


Run benchmark or application samples)

Preparation:
1. You need to have RAMCloud compiled.
Run clusterperf.py: its default parameters are replica=3, server=4, backup=1.
1. Reserve (lease) ATOM servers with /usr/local/bin/mmres
  1. mmres has been ported from rcres. It manages resources for RAMCloud, e.g., backup files and DPDK resource files. The backup replica is preserved until the mmres lease expires, so you can reuse the backup in a different program while the lease continues.
  2. Check usage with: $ mmres --help
    Example)
    ```
    $ mmres ls -l
    $ mmres ls -l atom10-35    // print a range; you can use '..' instead of '-'
    $ mmres ls -l 10-35        // the same range without the 'atom' prefix
    $ mmres lease 14:00 atom10-35 -m 'Comment here!!'
    ```
  3. Note) Added the 'dx2k' cluster. dx2k stands for DX-2000, which is a new Xeon D-based micro modular server.
    Try typing mmres ls -l dx2k
  4. Note) Use the local time on the server. Please check the local time with the 'date' command before trying to lease. 'mmres' does not take the user's time zone into account so far (a python library limitation).
2. No need to edit scripts/config.py: the reserved servers are acquired from mmres in scripts/config.py. (Previously, you had to edit the following lines in scripts/config.py for the servers you reserved. The default was range(1,11), which means atom001 to atom010.)
  > for i in range(1, 11):
  > hosts.append(('atom%03d' % i,
3. Now you can spawn benchmark or application tasks to the ATOM cluster from mmatom (the management server) with python scripts.
  - The scripts are customized for the ATOM Cluster so that we can run RAMCloud tests with the default options.
4. Limitations) to be fixed...
  - A lot of DPDK debug messages starting with "EAL" or "PMD" are shown in stdout. Now fixing...
    Quick hack: run with the /usr/local/bin/mmfilter wrapper:
    ```
    Usage: mmfilter command arguments...
       $ mmfilter scripts/clusterperf.py
    ```
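mmres accepts host ranges in several spellings (atom10-35, 10-35, or 10..35). The sketch below is our reading of those examples, not mmres's implementation: `expand_range` is hypothetical, the three-digit zero padding follows the atomNNN/dxNNN host names used on this page, and treating a bare numeric range as atom hosts is an assumption. Cluster names such as dx2k are not ranges and are rejected.

```python
import re

# Optional lowercase prefix, first number, optional '-N' or '..N' upper bound.
_RANGE = re.compile(r"(?P<prefix>[a-z]*?)(?P<lo>\d+)(?:(?:-|\.\.)(?P<hi>\d+))?")


def expand_range(spec, default_prefix="atom"):
    """Expand 'atom10-35', '10-35', '10..35' or 'atom12' into host names."""
    m = _RANGE.fullmatch(spec)
    if m is None:
        raise ValueError(f"not a host range: {spec!r}")
    prefix = m.group("prefix") or default_prefix
    lo = int(m.group("lo"))
    hi = int(m.group("hi") or lo)
    if hi < lo:
        raise ValueError(f"empty range: {spec!r}")
    return [f"{prefix}{n:03d}" for n in range(lo, hi + 1)]
```

With these rules, `expand_range("atom10-35")`, `expand_range("10-35")` and `expand_range("10..35")` all yield the 26 names atom010 through atom035.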
Run clientSample: a simple client sample program.
1. - $ cd clientSample
  - $ make
  - $ make run
  - or $ mmfilter make run
  Note) Limitation: the original code repeats overwriting 1M times; however, it hangs after 104,000 iterations. It could be a log cleaning problem. Currently, the test succeeds with 100,000 iterations.
Logfile: As the messages show, the execution log is created in ./logs/latest, which is a symbolic link to logs/"%date%time".
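The log layout described above (a timestamped directory under ./logs plus a `latest` symbolic link) can be reproduced as follows. This is a sketch, not the code in scripts/cluster.py; `new_log_dir` is hypothetical and the `%Y%m%d%H%M%S` timestamp format is an assumption, since the page only says "%date%time".

```python
import os
import time


def new_log_dir(log_root="logs", stamp=None):
    """Create <log_root>/<timestamp> and point <log_root>/latest at it."""
    # Assumed timestamp format for the "%date%time" directory name.
    stamp = stamp or time.strftime("%Y%m%d%H%M%S")
    subdir = os.path.join(log_root, stamp)
    os.makedirs(subdir)
    latest = os.path.join(log_root, "latest")
    if os.path.islink(latest):
        os.remove(latest)
    # Relative link target, so the whole logs tree can be moved or copied.
    os.symlink(stamp, latest)
    return subdir
```

Because `latest` is replaced on every run, it always refers to the most recent execution, while older timestamped directories are kept until you clean them up.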
Benchmarks/samples to be done:
1. Working: clusterperf.py to run all the default tests. Currently the test fails at "readDistRandom" after "multiWrite_oneMaster".
2. TBD: recovery.py
NOTE)
1. scripts/config.py is customized for the ATOM Cluster so that we can run RAMCloud tests with the default options.
2. Limitations)
  1. DPDK debug messages are printed to stdout at first. We are working to suppress the messages to make the results more readable.
  2. mmres does not refer to your TZ variable. Please use the local time where the cluster is located. We would like to modify mmres to honor 'export TZ=xxx', like the 'date' command.
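Until mmres honors TZ, a lease time has to be given in the cluster's local time. A small conversion sketch using fixed UTC offsets is below; `to_cluster_time` is hypothetical, the cluster's UTC offset is an input you should confirm with `date` on mmatom, and daylight saving time is deliberately not modeled.

```python
from datetime import datetime, timedelta, timezone


def to_cluster_time(hhmm, user_utc_offset, cluster_utc_offset):
    """Convert 'HH:MM' on your clock to 'HH:MM' on the cluster's clock.

    Offsets are hours from UTC (e.g. -5 for US Eastern Standard Time).
    Fixed offsets are used, so pass the offsets in effect on the lease day.
    """
    hour, minute = map(int, hhmm.split(":"))
    user_tz = timezone(timedelta(hours=user_utc_offset))
    cluster_tz = timezone(timedelta(hours=cluster_utc_offset))
    # Any date works with fixed offsets; only the clock time is returned,
    # so check yourself whether the conversion crossed midnight.
    t = datetime(2000, 1, 1, hour, minute, tzinfo=user_tz)
    return t.astimezone(cluster_tz).strftime("%H:%M")
```

For example, if the cluster were on UTC-8 and you are on UTC-5, a time of 17:00 on your clock corresponds to 14:00 on the cluster, which is the value to pass to `mmres lease`.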


System administrator's guide)

Adding User)

So far, we do not use NIS/LDAP for account management. Please use a script to set up new users.

NOTE) To run the following command, you must be a user with administrative privileges on both the management server and all the atom servers. Please ask us to give you the privilege for the cluster.

User setup procedure)

Still debugging the tool; please wait before using it.

/usr/local/bin/mmuser <User1> [<User2> .... ]
The account on mmatom is copied to all the atomXXX servers.

Hardware maintenance)

Note: A shell script 'Run.sh' in the top-level directory takes care of compiling RAMCloud and running benchmarks. You only need to run 'Run.sh' after 'mmres'. Please look into 'Run.sh' for details.

Run clusterperf.py) Its default parameters are replica=3, server=4, backup=1.

$ mmfilter scripts/clusterperf.py --transport=basic+dpdk [Tests]

[Tests]: Names of tests. Please look into clusterperf.py for the available tests.

Limitation)

- There is a problem running the tests 'indexBasic' and 'indexMultiple': they are too slow and end with a timeout. All other tests run OK.

Run recovery.py)
- $ mmfilter scripts/recovery.py -v --timeout=1000 --transport=basic+dpdk
Run clientSample) The simple client code.
- $ cd clientSample
- $ make
- $ make run
  - or $ mmfilter make run
1. Note) It now completes successfully after the original 1M iterations.

return to Table of Contents.

Analysis or Debugging)

Subcommands:
1. scripts/clusterperf.py uses the run function defined in scripts/cluster.py.
2. cluster.py refers to common.py for the sandbox. Through common.py, the default settings in config.py are referred to.
Useful options in most python scripts:
1. -v for verbose output, --dry (note the two dashes) for a dry run (just printing the commands and creating the log directory)
2. There are four standard transport settings for the ATOM server, which can be specified with the -T or --transport option. See scripts/config.py for the options.
Result and logfile:
1. Output printed by the server or client is forwarded to standard out (screen).
2. Log messages printed by RAMCLOUD_LOG(loglevel, format, args..) are stored in the logs directory.
  1. There are four log levels (ERROR, WARNING, NOTICE, DEBUG). DEBUG messages are only stored when the log level is DEBUG; messages of a higher level are also printed when a lower log level is selected.
  2. The log level is specified with the -l or --logLevel option of any python script.
3. Log files are stored in a directory named with numbers meaning "%date%time" under the ./logs directory where the command is executed, normally the top level, where the obj.* and scripts directories exist.
  1. Useful commands located in /usr/local/bin:
    1. logCleanup.sh: Cleans up log directories. Without any arguments, it looks for log directories under the current directory and asks whether to delete them. See its help with the -h option.
    2. logSummary.pl: Prints a summary of the logs in the log directories. You can run it in a log directory or specify the directory as an argument. Check its options with the -h option. cluster.py or other scripts must be run with the NOTICE or DEBUG log level for logSummary.pl to analyze the coordinator log for server IDs.
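The level rule in item 2.1 amounts to a severity threshold. A minimal sketch of that rule, assuming the severity order DEBUG < NOTICE < WARNING < ERROR (the usual reading of the four level names, not taken from RAMCloud's source); `is_logged` and `kept` are hypothetical helpers:

```python
# The four log levels named above, from least to most severe.
LEVELS = ["DEBUG", "NOTICE", "WARNING", "ERROR"]


def is_logged(message_level, configured_level):
    """True if a message at message_level is kept when running at configured_level."""
    return LEVELS.index(message_level) >= LEVELS.index(configured_level)


def kept(messages, configured_level):
    """Filter (level, text) pairs by the threshold rule."""
    return [text for level, text in messages if is_logged(level, configured_level)]
```

At NOTICE, for instance, NOTICE, WARNING and ERROR messages are kept and DEBUG messages are dropped; at DEBUG everything is kept.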

return to Table of Contents.

Performance result)

Clusterperf.py

Note)

basic.read100 takes 13 us on CentOS 7.1 and DPDK 2.0 with the latest release on GitHub. We are now debugging it.
See /usr/include/rte_version.h for the DPDK version. See also: http://dpdk.org/doc/api/rte__version_8h.html
DPDK log messages starting with 'EAL' or 'PMD' appear in stdout. Please use the mmfilter wrapper to invoke a test, like 'mmfilter clusterperf.py -v --transport=...', to remove them. We are going to resolve this by patching the DPDK source code.
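Until DPDK is patched, mmfilter removes the 'EAL'/'PMD' lines. A rough Python sketch of such a wrapper follows; it is not mmfilter's source, `run_filtered` and `drop_dpdk_noise` are hypothetical, it only filters the child's stdout (stderr passes through unchanged), and matching on a line prefix is an assumption about what mmfilter does.

```python
import subprocess
import sys

# Line prefixes of DPDK's log output, as described above.
NOISE_PREFIXES = ("EAL", "PMD")


def drop_dpdk_noise(lines):
    """Yield only the lines that do not start with a DPDK log prefix."""
    for line in lines:
        if not line.startswith(NOISE_PREFIXES):
            yield line


def run_filtered(argv):
    """Run argv, echo its stdout minus DPDK noise, and return its exit code."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, text=True)
    for line in drop_dpdk_noise(proc.stdout):
        sys.stdout.write(line)
    return proc.wait()
```

Called as `run_filtered(["scripts/clusterperf.py", "-v"])`, it prints the benchmark output without the DPDK lines and returns the script's exit code.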

Measured on Jan. 15, 2016 :

basic.read100 13.6 us read random 100B object (30B key) median
basic.read100.min 12.9 us read random 100B object (30B key) minimum
basic.read100.9 13.9 us read random 100B object (30B key) 90%
basic.read100.99 18.7 us read random 100B object (30B key) 99%
basic.read100.999 35.5 us read random 100B object (30B key) 99.9%
basic.readBw100 6.7 MB/s bandwidth reading 100B objects (30B key)
basic.read1K 20.9 us read random 1KB object (30B key) median
basic.read1K.min 20.0 us read random 1KB object (30B key) minimum
basic.read1K.9 21.5 us read random 1KB object (30B key) 90%
basic.read1K.99 28.3 us read random 1KB object (30B key) 99%
basic.read1K.999 47.9 us read random 1KB object (30B key) 99.9%
basic.readBw1K 44.9 MB/s bandwidth reading 1KB objects (30B key)
basic.read10K 53.1 us read random 10KB object (30B key) median
basic.read10K.min 51.9 us read random 10KB object (30B key) minimum
basic.read10K.9 55.0 us read random 10KB object (30B key) 90%
basic.read10K.99 64.1 us read random 10KB object (30B key) 99%
basic.read10K.999 83.6 us read random 10KB object (30B key) 99.9%
basic.readBw10K 177.2 MB/s bandwidth reading 10KB objects (30B key)
basic.read100K 360.7 us read random 100KB object (30B key) median
basic.read100K.min 358.5 us read random 100KB object (30B key) minimum
basic.read100K.9 365.2 us read random 100KB object (30B key) 90%
basic.read100K.99 424.1 us read random 100KB object (30B key) 99%
basic.read100K.999 452.2 us read random 100KB object (30B key) 99.9%
basic.readBw100K 262.0 MB/s bandwidth reading 100KB objects (30B key)
basic.read1M 3.5 ms read random 1MB object (30B key) median
basic.read1M.min 3.4 ms read random 1MB object (30B key) minimum
basic.read1M.9 3.5 ms read random 1MB object (30B key) 90%
basic.read1M.99 3.5 ms read random 1MB object (30B key) 99%
basic.read1M.999 3.6 ms read random 1MB object (30B key) 99.9%
basic.readBw1M 273.3 MB/s bandwidth reading 1MB objects (30B key)
basic.write100 43.4 us write random 100B object (30B key) median
basic.write100.min 41.5 us write random 100B object (30B key) minimum
basic.write100.9 47.5 us write random 100B object (30B key) 90%
basic.write100.99 117.6 us write random 100B object (30B key) 99%
basic.write100.999 203.0 us write random 100B object (30B key) 99.9%
basic.writeBw100 1.9 MB/s bandwidth writing 100B objects (30B key)
basic.write1K 66.8 us write random 1KB object (30B key) median
basic.write1K.min 64.4 us write random 1KB object (30B key) minimum
basic.write1K.9 71.5 us write random 1KB object (30B key) 90%
basic.write1K.99 181.7 us write random 1KB object (30B key) 99%
basic.write1K.999 309.7 us write random 1KB object (30B key) 99.9%
basic.writeBw1K 12.2 MB/s bandwidth writing 1KB objects (30B key)
basic.write10K 210.3 us write random 10KB object (30B key) median
basic.write10K.min 205.5 us write random 10KB object (30B key) minimum
basic.write10K.9 217.5 us write random 10KB object (30B key) 90%
basic.write10K.99 514.1 us write random 10KB object (30B key) 99%
basic.write10K.999 837.4 us write random 10KB object (30B key) 99.9%
basic.writeBw10K 42.2 MB/s bandwidth writing 10KB objects (30B key)
basic.write100K 1.5 ms write random 100KB object (30B key) median
basic.write100K.min 1.5 ms write random 100KB object (30B key) minimum
basic.write100K.9 1.6 ms write random 100KB object (30B key) 90%
basic.write100K.99 2.3 ms write random 100KB object (30B key) 99%
basic.write100K.999 11.6 ms write random 100KB object (30B key) 99.9%
basic.writeBw100K 59.2 MB/s bandwidth writing 100KB objects (30B key)
basic.write1M 15.3 ms write random 1MB object (30B key) median
basic.write1M.min 15.2 ms write random 1MB object (30B key) minimum
basic.write1M.9 15.9 ms write random 1MB object (30B key) 90%
basic.write1M.99 27.9 ms write random 1MB object (30B key) 99%
basic.writeBw1M 60.2 MB/s bandwidth writing 1MB objects (30B key)
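Each clusterperf latency above is reported as a median plus minimum, 90%, 99%, and 99.9% values (the `.min`, `.9`, `.99`, and `.999` suffixes). The sketch below shows one way to derive such a summary from raw samples; `summarize` is hypothetical, it uses the nearest-rank percentile definition, which is our choice, and clusterperf's own computation may differ.

```python
import math
import statistics
from fractions import Fraction


def percentile(sorted_samples, p):
    """Nearest-rank percentile; p is given as a string such as '99.9'."""
    # Fraction keeps 99.9% exact, so the rank is not off by one from rounding.
    rank = math.ceil(Fraction(p) * len(sorted_samples) / 100)
    return sorted_samples[max(rank, 1) - 1]


def summarize(name, samples):
    """Return clusterperf-style keys: median, .min, .9, .99, .999."""
    s = sorted(samples)
    return {
        name: statistics.median(s),
        name + ".min": s[0],
        name + ".9": percentile(s, "90"),
        name + ".99": percentile(s, "99"),
        name + ".999": percentile(s, "99.9"),
    }
```

Note that a 99.9% value needs at least 1,000 samples to differ from the maximum, which is one reason tail latencies are noisier than medians.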

Measured on Wed 25 Feb 2015 12:31:42 PM PST :

basic.read100 13.456 us read single 100B object (30B key) median
basic.read100.min 12.423 us read single 100B object (30B key) minimum
basic.read100.9 13.837 us read single 100B object (30B key) 90%
basic.read100.99 17.698 us read single 100B object (30B key) 99%
basic.read100.999 24.356 us read single 100B object (30B key) 99.9%
basic.readBw100 4.7 MB/s bandwidth reading 100B object (30B key)
basic.read1K 20.425 us read single 1KB object (30B key) median
basic.read1K.min 19.593 us read single 1KB object (30B key) minimum
basic.read1K.9 20.786 us read single 1KB object (30B key) 90%
basic.read1K.99 24.306 us read single 1KB object (30B key) 99%
basic.read1K.999 36.569 us read single 1KB object (30B key) 99.9%
basic.readBw1K 33.4 MB/s bandwidth reading 1KB object (30B key)
basic.read10K 52.461 us read single 10KB object (30B key) median
basic.readBw10K 125.6 MB/s bandwidth reading 10KB object (30B key)
basic.read100K 358.567 us read single 100KB object (30B key) median
basic.readBw100K 187.6 MB/s bandwidth reading 100KB object (30B key)
basic.read1M 3.449 ms read single 1MB object (30B key) median
basic.readBw1M 212.1 MB/s bandwidth reading 1MB object (30B key)
basic.write100 43.307 us write single 100B object (30B key) median
basic.write100.min 41.031 us write single 100B object (30B key) minimum
basic.write100.9 47.528 us write single 100B object (30B key) 90%
basic.write100.99 86.543 us write single 100B object (30B key) 99%
basic.write100.999 38.363 ms write single 100B object (30B key) 99.9%
basic.writeBw100 542.6 KB/s bandwidth writing 100B object (30B key)
basic.write1K 63.481 us write single 1KB object (30B key) median
basic.write1K.min 60.453 us write single 1KB object (30B key) minimum
basic.write1K.9 66.720 us write single 1KB object (30B key) 90%
basic.write1K.99 126.391 us write single 1KB object (30B key) 99%
basic.write1K.999 41.500 ms write single 1KB object (30B key) 99.9%
basic.writeBw1K 4.9 MB/s bandwidth writing 1KB object (30B key)
basic.write10K 199.648 us write single 10KB object (30B key) median
basic.writeBw10K 12.0 MB/s bandwidth writing 10KB object (30B key)
basic.write100K 1.508 ms write single 100KB object (30B key) median
basic.writeBw100K 17.8 MB/s bandwidth writing 100KB object (30B key)
basic.write1M 41.949 ms write single 1MB object (30B key) median
basic.writeBw1M 21.8 MB/s bandwidth writing 1MB object (30B key)

# RAMCloud multiRead performance for 100 B objects with 30 byte keys
# located on a single master.
# Generated by 'clusterperf.py multiRead_oneMaster'
#
# Num Objs Num Masters Objs/Master Latency (us) Latency/Obj (us)
#----------------------------------------------------------------------------
1 1 1 23.0 22.99
2 1 2 28.3 14.15
3 1 3 33.2 11.07
9 1 9 49.5 5.50
50 1 50 168.7 3.37
60 1 60 235.5 3.93
70 1 70 209.1 2.99
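In the multiRead table, Latency/Obj is the total Latency divided by Num Objs. The check below recomputes that column from the printed values; small last-digit differences (for example 23.0 vs 22.99) are expected because the printed Latency is rounded to 0.1 us. `per_object_latency` is a hypothetical helper name.

```python
# (Num Objs, Latency (us), Latency/Obj (us)) copied from the table above.
ROWS = [
    (1, 23.0, 22.99), (2, 28.3, 14.15), (3, 33.2, 11.07), (9, 49.5, 5.50),
    (50, 168.7, 3.37), (60, 235.5, 3.93), (70, 209.1, 2.99),
]


def per_object_latency(num_objs, latency_us):
    """Latency/Obj: the latency of one multiRead spread over its objects."""
    return latency_us / num_objs


for n, total, reported in ROWS:
    derived = per_object_latency(n, total)
    # Latency is printed to 0.1 us, so allow a last-digit rounding difference.
    assert abs(derived - reported) < 0.011, (n, derived, reported)
```

Batching is what drives Latency/Obj down: one 50-object multiRead costs about 3.4 us per object versus about 23 us for a single-object read.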

Recovery.py

return to Table of Contents.

System administrator's guide)

Adding User)

So far, we do not use NIS/LDAP for account management. Please use a script to set up new users.

User setup command for the ATOM cluster)

Limitation: the mmuser command uses some helper routines located in its directory and places the data to be transferred to the servers on an NFS share.
- You need to copy /usr/local/mmutils/mmres to your home directory, which is NFS-shared with the ATOM cluster, and run mmuser.sh in that directory.
- You need 'sudo su' permission on the cluster servers, or become root with 'sudo su' on the management server.
NOTE)
- To run the following commands, you must already be a privileged user on the management server and all the atom servers.
  Please ask us to create or delete privileged users for the cluster.
- <User*> below are account names on mmatom. <User*> must be non-privileged users.
Add user(s) )
- $ cd /usr/local/mmutils/mmuser
- $ ./mmuser.sh <User1> [ <User2> .... ] // create users on both the management host and the cluster nodes.
- $ ./mmuser.sh -c <User1> [ <User2> .... ] // create users on the cluster nodes.
- $ ./mmuser.sh -m <User1> [ <User2> .... ] // create users only on the management host.
Delete user(s) )
- $ ./mmuser.sh -d <User1> [ <User2> ....]
Deliver password-related files to cluster nodes) Since the Ansible scripts do not work, use another script.
- Log in or su to the user 'admin'.
- $ cd ~admin
- $ remoteYum.sh <prefix> <from> <to> passwd
  <prefix> : either 'atom' or 'dx'
  <from> : starting host number
  <to> : ending host number
  e.g.: remoteYum.sh dx 1 10 passwd // deliver passwd, shadow, group, sudoers to dx001 .. dx010

Hardware maintenance)

GitHub repository for NEC Tools for the ATOM Server
1. The product name of the server is NEC Micro Server DX1000. Please refer to the repository for the product.
2. Procedures:
  - $ git clone https://github.com/SMatsushi/NECTools.git
  - You will find scripts in NECTools/DX1000/scripts
    - ipmiaw2: a wrapper for ipmitool
      - Save the IPMI username in ~/.atom/ipmi_user.txt and the IPMI password in ~/.atom/ipmi_password.txt
      - ipmiaw2 adds the ipmitool options '-I lanplus -U <username> -P <password>', so you do not need to type them.
      - You can specify the server and management port names in range format. Just enter 'ipmiaw2' to see the usage.
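ipmiaw2 saves typing by adding the lanplus interface and the stored credentials to each ipmitool call. A hypothetical Python sketch of that idea follows; it is not ipmiaw2's code, `ipmitool_argv` is an illustrative name, and the `host` argument stands for whatever BMC/CMM address or name you would normally pass to ipmitool with `-H`. The file names follow the description above, assuming the password file is ~/.atom/ipmi_password.txt.

```python
from pathlib import Path


def ipmitool_argv(host, command, conf_dir=None):
    """Build an ipmitool command line using the credentials saved in ~/.atom."""
    conf = Path(conf_dir) if conf_dir is not None else Path.home() / ".atom"
    user = (conf / "ipmi_user.txt").read_text().strip()
    password = (conf / "ipmi_password.txt").read_text().strip()
    return ["ipmitool", "-I", "lanplus", "-H", host,
            "-U", user, "-P", password, *command]
```

For example, `ipmitool_argv(host, ["chassis", "power", "status"])` builds a power-status query. Passing `-P` exposes the password in the process list; ipmitool also accepts `-f <file>` to read the password from a file, which may be preferable.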
Getting a module's slot/chassis information and MAC address for a hostname or IP address.
1. You will find the information in /etc/hosts on mmatom, in the form:

  IP address, hostname, MAC address, installed slot, and connected host port.
2. For atomXXX(Y), XXX always corresponds to the final octet of the IP address:
  e.g., atom100 == 192.168.3.100, atom110a == 192.168.4.110
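The naming rule above can be written as a lookup. Only two examples are given, so the mapping from suffix to subnet (no suffix to 192.168.3, 'a' to 192.168.4) is an assumption drawn from them; `atom_ip` is a hypothetical helper, and /etc/hosts on mmatom remains the authoritative source.

```python
import re

# Assumed from the two examples: no suffix -> 192.168.3.x, 'a' -> 192.168.4.x.
SUBNET_BY_SUFFIX = {"": "192.168.3", "a": "192.168.4"}


def atom_ip(hostname):
    """Map atomXXX or atomXXXa to its IP address; XXX is the final octet."""
    m = re.fullmatch(r"atom(\d{3})([a-z]?)", hostname)
    if m is None or m.group(2) not in SUBNET_BY_SUFFIX:
        raise ValueError(f"unrecognized host name: {hostname}")
    return f"{SUBNET_BY_SUFFIX[m.group(2)]}.{int(m.group(1))}"
```

`atom_ip("atom100")` gives 192.168.3.100 and `atom_ip("atom110a")` gives 192.168.4.110, matching the examples above.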

Security solution)

SSH into management server
Job spawning to cluster servers from management server
Cluster management
1. Server console
2. IPMI to CMM, BMC
3. USB connection to ONS from the front panel

return to Table of Contents.

For management server setup)

Setup guide
- 20150910-mgmt-server-Setup-r1.pdf
Config files referred to in the setup guide
- config.tar.gz

Stanford's ATOM Cluster - configuration, etc)

1. system photograph)

2. ATOM server module specification)

Three chassis are installed (the rack in the picture above contains 16 chassis).
The installed ATOM modules use the C2730 (1.7 GHz, 8 cores/8 threads, 12 W).

3. Network connection)

Updated on Nov. 17, 2015

For historical reasons and in view of future experiments, the VLAN configuration differs between chassis. We would like to change every node to the default VLAN configuration, as in Chassis 1.
The management server is directly connected to the Internet, and the cluster is isolated from other Stanford servers.
1. The management server works as the login server, firewall, NIS server, DHCP server, and PXE server for reconfiguring ATOM servers.

return to Table of Contents.

Reference:

ATOM Cluster - Old Page

return to Table of Contents.
