...
- August 7: rc20 (no response to IPMI, see below)
- rc20: Ports 8 and 9 on the 1G switch seem to be bad, and the jack on rc20 itself is loose. There were no free ports left on that switch, so I plugged in the spare 50-port switch and routed rc20's and rc30's IPMI connections through it. After that I was able to IPMI into rc20, which was up, but its network(s) seemed to be down. Nothing too interesting. Zapped it, since it had been offline for a while. Unfortunately, the install script hangs because it sets the Ethernet MTU to 9000; I guess the 50-port switch can't handle jumbo frames. We should be using the standard 1500-byte MTU even after we get our switch problems worked out. I dropped the MTU back to 1500 and changed the NFS settings to use default-sized reads and writes. NFS still didn't work over UDP, so I switched it to TCP. Things should work for now, but rc20 and rc30 are gimped: they only have a 100 Mbit/s connection to rcnfs. We'll need to troubleshoot or RMA the HP switch on the first rack (RAM-445). //// Update August 9: I rebooted the HP switch, and port 9 works again. rc20 and rc30 are back on it. Works for now. Return the switch if this keeps happening (RAM-445). -Diego
- August 2: rc79
- July 31: rc79
- July 30: rc20 (no response to IPMI, see below), rc79
- rc20: Port 9 on the 1G switch seems to be bad.
- July 23: rc20 (no response to IPMI), rc61, rc63, rc64, rc68 (these were likely due to operator error, but still needed manual maintenance to fix)
- July 19: rc03 (reimage), rc04 (reimage), rc20 (no response to IPMI, see below), rc26 (reimage), rc37, rc38 (no response to IPMI, see below), rc39, rc78 (reimage), rc80
- rc20: The 1G cable is loose and doesn't seat well in the NIC. If the problem continues, we may need to bend the clip or try other cables.
- rc38: Port 8 on the 1G switch seems to be bad. Hopefully the switch diagnostics can tell us more. Connected rc38 to port 48T, the 1/10G uplink port, since no other ports were free. The diagnostics didn't provide much info. In 4 days of uptime, port 8 never detected a connected cable, even when we plugged in cables from neighboring, known-working endpoints. It seems to be out of commission.
- June 15: rc37 (failed to reboot), rc38 (failed to reboot), rc79
...