The script scripts/recovery.py can be used to run recoveries for testing. Diego wrote the script and knows it best, but here are some basic instructions:
- You should probably set up ssh master mode for each of the cluster nodes. Here is a shell script that you can run on rcmaster to do it (note the bash shebang: the rc{01..36} brace expansion is a bash feature and will not work under a plain POSIX sh):

    #!/bin/bash
    #
    # This script sets up ssh master mode for all of the machines
    # in the RAMCloud cluster.

    if [ "$(hostname)" = "rcmaster.scs.stanford.edu" ]; then
        for host in rc{01..36}; do
            if [ -z "$(pgrep -u $USER -fx "ssh -fMN $host true")" ]; then
                ssh -fMN $host true 2>/dev/null &
            fi
        done
    fi
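The -M flag above starts a master connection, but later ssh invocations will only reuse it if connection sharing is configured in your ssh client. A minimal ~/.ssh/config sketch — the ControlPath value here is an assumption, adjust it to your environment:

```
Host rc*
    ControlMaster auto
    ControlPath ~/.ssh/ctl-%r@%h-%p
```

With this in place, repeated ssh connections to the rcXX hosts multiplex over the existing master socket instead of performing a fresh handshake each time.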
- From a RAMCloud directory in which you have compiled the system, invoke scripts/recovery.py. To be safe, run this on rcmaster: it is unclear whether it will work on other machines.
- This will run a simple recovery with one partition and one backup. The simplest way to run more complex recoveries is to modify recovery.py to change the arguments to the recover method. For example, if you replace the last line of the script (which is currently pprint.pprint(recover())) with the following, it will run with a total of 2 partitions and 12 backups:

    args = {}
    args['numBackups'] = 12
    args['numPartitions'] = 2
    args['objectSize'] = 1024
    args['disk'] = 1
    args['numObjects'] = 626012 * 600 // 640
    args['oldMasterArgs'] = '-m %d' % (800 * args['numPartitions'])
    args['newMasterArgs'] = '-m 16000'
    args['replicas'] = 3
    pprint.pprint(recover(**args))
- The log files for all of the servers involved in the recovery are placed in the directory recovery/latest. If you run more recoveries, recovery/latest always refers to the most recent recovery, but log files from old recoveries are kept in other subdirectories of recovery.
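The layout can be pictured as follows. This sketch fabricates the directory structure purely for illustration; it assumes (the text above does not say) that recovery/latest is a symlink, and the run names are made up:

```shell
# Illustrative only: fabricate the assumed layout to show how
# recovery/latest tracks the newest run.
mkdir -p recovery/run-001 recovery/run-002
ln -sfn run-002 recovery/latest   # repoint "latest" at the newest run
readlink recovery/latest          # shows the current target
```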
- After running a recovery, you can run scripts/metrics.py, which will examine the logs in recovery/latest and produce summary information describing the recovery.