Cluster Custodian
If you notice a machine that is down and doesn't respond to rcreboot, ping the IRC user listed below for the current week.
- 7/22/12: stutsman
- 7/29/12: ankitak
- 8/5/12: ongardie
- 8/12/12: daeschli
- 8/19/12: mendel
- 8/26/12: ouster
- 9/2/12: syang0
- 9/9/12: satoshi
The current custodian is responsible for restarting, debugging, and reimaging machines and generally keeping the cluster working.
Crashes
This page logs instances of dead machines; we are using it to track down the mysterious machine crashes that began in August 2011.
- June 15: rc37 (failed to reboot), rc38 (failed to reboot), rc79
...