Controlling Machines Remotely via IPMI

IPMI

Each server has a little ARM machine glued on the side with a dedicated ethernet port. Each machine's IPMI card has an IP address that resolves from the hostname "rcXXipmi". Using standard protocols, as well as manufacturer extensions, we can do all sorts of useful things to the server remotely. Some of these include:

  • Powering on/off, resetting, warm reboots, software shutdowns, etc.
  • Accessing a serial-over-LAN version of the console (you can use this to configure BIOS parameters, as well as to get a Linux console)
  • Setting PXE boot for the next boot cycle (allows easy network re-installs)
  • Listing sensor status (temperatures, voltages, etc.)
  • Listing the error log (ECC errors, even SMART disk errors)
  • Getting a remote KVM console (keyboard, video, mouse)

There are three main utilities you'll want to use:

ipmitool

ipmitool is a Linux application that speaks the IPMI protocol to local and remote servers. Here are some example commands to get you started (read the extensive man page for more info):

  • Get a serial-over-lan console on rcXX: ipmitool -I lanplus -H rcXXipmi -U ADMIN -a sol activate
  • Get the power status: ipmitool -I lanplus -H rcXXipmi -U ADMIN chassis status
  • Reboot a machine: ipmitool -I lanplus -H rcXXipmi -U ADMIN power reset
  • Force PXE boot on the next boot only: ipmitool -I lanplus -H rcXXipmi -U ADMIN chassis bootdev pxe
    (This will cause the machine to reinstall all its software on the next boot)
  • Reboot the IPMI card: ipmitool -I lanplus -H rcXXipmi -U ADMIN mc reset cold
  • Get sensor output: ipmitool -I lanplus -H rcXXipmi -U ADMIN sdr list
  • Get the error log: ipmitool -I lanplus -H rcXXipmi -U ADMIN sel elist
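
If you end up scripting these, the commands above all share the same prefix, so a small helper can build them. The following is only a sketch in Python: the function names (ipmi_argv, reinstall_argvs, run) are made up for illustration, and it assumes the "rcXXipmi" naming and the ADMIN user described on this page.

```python
import subprocess


def ipmi_argv(node, *command, user="ADMIN"):
    """Build the ipmitool argument list for a cluster node such as "rc07".

    The IPMI card is assumed to resolve as "<node>ipmi", as described above.
    """
    return ["ipmitool", "-I", "lanplus", "-H", f"{node}ipmi", "-U", user, *command]


def reinstall_argvs(node):
    """Commands for a network reinstall: PXE on the next boot, then a reset."""
    return [
        ipmi_argv(node, "chassis", "bootdev", "pxe"),
        ipmi_argv(node, "power", "reset"),
    ]


def run(argv):
    """Run one ipmitool command; ipmitool itself prompts for the password."""
    return subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for argv in reinstall_argvs("rc07"):
        print(" ".join(argv))
```

Calling run() on each list would actually execute the commands, so try the dry run first.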

NB: Our SuperMicro machines appear to log SMART failures as OEM #0xff, e.g. ipmitool outputs something like:

   1 | 08/13/2011 | 07:08:13 | OEM #0xff | OEM Specific | Asserted
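
To pick these entries out of a long log, something like the following Python sketch could work. It assumes every line of "sel elist" output is "|"-separated like the sample above, and that OEM #0xff really does mean a SMART failure on our machines (which is only what we have observed). The function names are hypothetical.

```python
def parse_sel_line(line):
    """Split one "sel elist" line into named fields, or return None if it
    does not look like a log record."""
    fields = [field.strip() for field in line.split("|")]
    if len(fields) < 4:
        return None
    return {
        "id": fields[0],
        "date": fields[1],
        "time": fields[2],
        "sensor": fields[3],
        "detail": fields[4:],  # e.g. ["OEM Specific", "Asserted"]
    }


def possible_smart_failures(lines):
    """Return the records whose sensor field is "OEM #0xff"."""
    records = (parse_sel_line(line) for line in lines)
    return [r for r in records if r is not None and r["sensor"] == "OEM #0xff"]
```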

ipmitool will ask for a password for user ADMIN in all cases. You can avoid this by putting it in a file with mode 0600 and using the -f flag, or passing it on the command line (-P; not super-secure, but our environment is pretty trusted), or putting it in your environment (the -E flag reads the IPMI_PASSWORD variable; see the man page).
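
As a rough illustration of the password-file option, the Python sketch below creates a file readable only by its owner and builds a command that points ipmitool at it with -f. The helper names and the file path in the usage comment are invented for the example. The password is written without a trailing newline to stay on the safe side.

```python
import os


def write_password_file(path, password):
    """Write the password to `path` with mode 0600 (owner read/write only)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(password)
    # The mode passed to os.open only applies to newly created files,
    # so tighten the permissions explicitly in case the file already existed.
    os.chmod(path, 0o600)


def argv_with_password_file(node, password_file, *command):
    """Build an ipmitool command that reads the password via -f."""
    return [
        "ipmitool", "-I", "lanplus", "-H", f"{node}ipmi",
        "-U", "ADMIN", "-f", password_file, *command,
    ]


# Example (hypothetical path):
#   write_password_file(os.path.expanduser("~/.ipmipass"), "the-password")
#   argv_with_password_file("rc07", os.path.expanduser("~/.ipmipass"), "sdr", "list")
```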

ipmitw

ipmitool is a bit of a pain to use: it takes lots of arguments, you can't specify everything in environment variables or ~/.ipmitool config files, and it only operates on one machine at a time. That's why I threw together ipmitw (ipmi tool wrap) to simplify things. You can download it here.

All it does is run ipmitool, providing the annoying user/password/protocol arguments automatically, and executing it once for each host listed. Hosts can be listed as ranges, as a comma-separated list, or as a mix of both (one contiguous string, no spaces). For example:

ipmitw rc01-rc80 power reset                       (reset power on all 80 machines)
ipmitw 1-80 power reset                            (same as above, but without rc prefix)
ipmitw rc05,rc10,rc15-20,rc57,82 chassis status    (can list hosts, as well as ranges)
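
For the curious, the host-list syntax can be expanded along the following lines. This is a Python sketch of the idea, not ipmitw's actual code: the function name expand_hosts, the two-digit zero padding, and the error handling are assumptions based only on the examples above.

```python
import re

# One item is a number or a range, each end optionally prefixed with "rc".
_ITEM = re.compile(r"^(?:rc)?(\d+)(?:-(?:rc)?(\d+))?$")


def expand_hosts(spec, prefix="rc", width=2):
    """Expand a spec such as "rc05,rc10,rc15-20,82" into a list of host names."""
    hosts = []
    for item in spec.split(","):
        match = _ITEM.match(item)
        if match is None:
            raise ValueError(f"bad host spec item: {item!r}")
        first = int(match.group(1))
        last = int(match.group(2)) if match.group(2) else first
        if last < first:
            raise ValueError(f"backwards range: {item!r}")
        hosts.extend(f"{prefix}{n:0{width}d}" for n in range(first, last + 1))
    return hosts


if __name__ == "__main__":
    # Show which IPMI host name each command would be sent to.
    for host in expand_hosts("rc05,rc10,rc15-20,rc57,82"):
        print(f"{host}ipmi")
```

Each expanded name would then get "ipmi" appended and be passed to ipmitool's -H option, one invocation per host.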

IPMI View

SuperMicro distributes a multi-platform Java GUI app called "IPMI View" that incorporates the above functions, and more. It's a bit slow and not amenable to scripting, but it does support the KVM console, which can be very useful. Note that your machine will have to be on the internal cluster VLAN to access the IPMI controllers.

IPMI View can be obtained at

ftp://ftp.supermicro.com/utility/IPMIView/

What's the IPMI password?

Ask Steve Rumble. Or, ask John (it's in ~ouster/ipmiPassword).