...
This page logs instances of dead machines; we are using it to track down the mysterious machine crashes that occurred starting in August 2011.
- July 30: rc20, rc79
- July 23: rc20 (no response to IPMI), rc61, rc63, rc64, rc68 (these are likely due to operator error, but needed manual maintenance to fix)
- July 19: rc03 (reimage), rc04 (reimage), rc20 (no response to IPMI, see below), rc26 (reimage), rc37, rc38 (no response to IPMI, see below), rc39, rc78 (reimage), rc80
- rc20: The 1G cable was loose; it doesn't seat well in the NIC. If the problem continues, we may need to bend the clip or try other cables.
- rc38: Port 8 on the 1G switch seems to be bad. rc38 is now connected to port 48T, the 1/10G uplink port, because no other ports were free. We hoped the switch diagnostics would tell us more, but they didn't provide much information. In the 4 days of uptime, port 8 never detected a connected cable, even when we tried cables from neighboring, known-working endpoints. The port appears to be out of commission.
- June 15: rc37 (failed to reboot), rc38 (failed to reboot), rc79
...