The script scripts/recovery.py can be used to run recoveries for testing. Diego wrote the script and knows it best, but here are some simple instructions:

  • You should probably set up ssh master mode for each of the cluster nodes. Here is a shell script that you can run on rcmaster to do it:

    #!/bin/bash
    #
    # This script sets up ssh master mode for all of the machines
    # in the RAMCloud cluster. It needs bash (not plain sh) for the
    # rc{01..80} brace expansion.
    
    if [ "$(hostname)" = "rcmaster.scs.stanford.edu" ]; then
        for host in rc{01..80}; do
            # Start a master connection only if one is not already running.
            if [ -z "$(pgrep -u "$USER" -fx "ssh -fMN $host true")" ]; then
                ssh -fMN "$host" true 2>/dev/null &
            fi
        done
    fi
    
  • From a RAMCloud directory in which you have compiled the system, invoke scripts/recovery.py. To be safe, run this on rcmaster: it is unclear whether it will work on other machines. 
  • This will run a recovery with a default configuration (currently as many masters and backups as the cluster can support). To try recoveries with different configurations, change the arguments passed to the recover method, which are specified at the very end of scripts/recovery.py.

...

  • After running a recovery, you can run scripts/recoverymetrics.py, which examines the logs in logs/latest and produces summary information describing the recovery.
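
The run-then-summarize sequence described above can be wrapped in a small shell function. This is only a sketch: the function name run_recovery and the METRICS_SCRIPT variable are hypothetical and not part of RAMCloud, the metrics script is assumed to live at scripts/recoverymetrics.py, and both scripts are assumed to be executable and to return a nonzero exit status on failure.

```shell
#!/bin/bash
# Hypothetical helper (not part of RAMCloud): run one recovery with the
# default configuration, then summarize it from logs/latest.
# Assumes the current directory is a compiled RAMCloud tree.

run_recovery() {
    # Assumed location of the metrics script; override via METRICS_SCRIPT.
    local metrics_script="${METRICS_SCRIPT:-scripts/recoverymetrics.py}"
    local script

    # If either script is missing we are probably in the wrong directory.
    for script in scripts/recovery.py "$metrics_script"; do
        if [ ! -x "$script" ]; then
            echo "missing $script; run from a compiled RAMCloud directory" >&2
            return 1
        fi
    done

    # Skip the summary if the recovery itself fails, since logs/latest
    # would not describe a complete run.
    scripts/recovery.py || return 1

    "$metrics_script"
}
```

After sourcing the file on rcmaster, calling run_recovery from the RAMCloud directory runs both steps in order. Keeping the metrics step conditional on the recovery's exit status is a design choice here, not something the scripts themselves require.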