Reimaging a Cluster Machine

This page explains how to reimage a cluster machine, which wipes the machine clean and reinstalls all software.

  • Invoke the following commands on rcmaster, replacing rcXX with the actual server name, such as rc23:
    rczap
    ipmitool -I lanplus -H rcXXipmi -U ADMIN chassis bootdev pxe
    ipmitool -I lanplus -H rcXXipmi -U ADMIN power reset

    The first ipmitool command records information that will force a PXE (network) boot the next time the machine boots, and the second ipmitool command actually reboots the machine.

  • Note: to use these commands you will need the IPMI password. The easiest approach is to get the password from someone who knows it (such as John Ousterhout), store it in a file named ipmiPassword in your home directory, and then add the option -f ~/ipmiPassword to each of the ipmitool commands above.
  • The reinstallation process is controlled by the following script on rcmaster:

    /var/www/html/post_install.sh

    Among other things, this file will install all of the various software packages that we need for RAMCloud.

  • If the machine reimages successfully, it will come back to life with a file /root/POST_INSTALL; the presence of this file indicates that the installation script completed successfully. If this file is not present, there was a problem that you will have to track down. We currently have no particular advice on how to track down such problems, but a good starting place is /root/POST_INSTALL_DEBUG and /root/POST_INSTALL_ERROR_LOG, which contain the stdout and stderr output of post_install.sh, respectively.
  • Our cluster machines are not intended to store any information permanently: it should always be safe to reimage a cluster machine any time that no one is using it.
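The steps above can be wrapped in a small Bash helper. The following is a minimal sketch, not an existing RAMCloud tool: the function names reimage_host and check_post_install, the DRY_RUN and IPMI_PASSWORD_FILE variables, and the optional directory argument are all hypothetical. It assumes that rczap takes no arguments (as shown above), that the IPMI interface for rcXX is named rcXXipmi, and that the IPMI password is kept in ~/ipmiPassword.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reimaging a cluster machine; NOT an existing
# RAMCloud tool. Source this file, then call the functions.

# reimage_host HOST
# Run on rcmaster. Issues the three reimage commands for HOST (e.g. rc23).
# Assumes the IPMI interface is "${HOST}ipmi" and the IPMI password is in
# $IPMI_PASSWORD_FILE (default: ~/ipmiPassword, an assumed location).
# With DRY_RUN=1, the commands are printed instead of executed.
reimage_host() {
    local host="$1"
    if [[ -z "$host" ]]; then
        echo "usage: reimage_host rcXX   (for example: reimage_host rc23)" >&2
        return 2
    fi
    local pwfile="${IPMI_PASSWORD_FILE:-$HOME/ipmiPassword}"
    local -a ipmi=(ipmitool -I lanplus -H "${host}ipmi" -U ADMIN -f "$pwfile")
    local -a run=()
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
        run=(echo)   # prefix every command with echo in dry-run mode
    fi

    "${run[@]}" rczap || return
    # Force a PXE boot the next time the machine boots...
    "${run[@]}" "${ipmi[@]}" chassis bootdev pxe || return
    # ...then reboot it, which starts the reinstallation.
    "${run[@]}" "${ipmi[@]}" power reset
}

# check_post_install [DIR]
# Run as root on the reimaged machine. Succeeds if DIR/POST_INSTALL exists
# (DIR defaults to /root; the argument exists only to ease testing).
# Otherwise prints the tail of each post_install.sh log to stderr and fails.
check_post_install() {
    local dir="${1:-/root}"
    if [[ -e "$dir/POST_INSTALL" ]]; then
        echo "post-install completed"
        return 0
    fi
    echo "post-install did NOT complete; recent log output follows" >&2
    local log
    for log in POST_INSTALL_DEBUG POST_INSTALL_ERROR_LOG; do
        if [[ -r "$dir/$log" ]]; then
            echo "== last lines of $dir/$log ==" >&2
            tail -n 20 "$dir/$log" >&2
        fi
    done
    return 1
}
```

On rcmaster, DRY_RUN=1 reimage_host rc23 prints the commands without running them, and reimage_host rc23 runs them. Once the machine is back up, copy the file to it and run check_post_install as root; a nonzero exit status means post_install.sh did not finish. Printing rather than executing in dry-run mode is a deliberate choice, since a mistyped host name would otherwise reboot the wrong machine.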