...

  • September 4: rebooted but fsck failed, so cleaned up the file system (zapped). - Satoshi, Ryan
    • Crash: rc20
    • Require password (possible NFS problem): rc45, rc47, rc49-52
    • Responds to pings but does not allow ssh: rc46
    • Does not respond to pings: rc48
  • August 7: rc20 (no response to ipmi, see below)
    • rc20: Ports 8 and 9 on the 1G switch seem to be bad, and the jack on rc20 itself is loose. There were no more ports on that switch, so I plugged in the extra 50-port switch and routed rc20 and rc30's IPMI through this 50-port switch. Then I was able to IPMI to rc20, which was up but its network(s) seemed down. Nothing too interesting. Zapped it, since it's been offline for a while. Unfortunately, the install script hangs because it sets the ethernet MTU to 9000. I guess that 50-port switch can't handle jumbo frames. We should be using a normal MTU, even if we get our switch problems worked out. I dropped the MTU back to 1500 and changed the NFS settings to use default-sized reads and writes. NFS still didn't work over UDP, so I changed it to use TCP. Things should work for now, but rc20 and rc30 are gimped – they have a 100Mbit connection to rcnfs now. We'll need to troubleshoot/RMA the HP switch on the first rack (RAM-445). //// Update August 9: I rebooted the HP switch, and port 9 works again. rc20 and rc30 are back on it. Works for now. Return the switch if this keeps happening (RAM-445).  -Diego
  • August 2: rc79
  • July 31: rc79
  • July 30: rc20 (no response to ipmi, see below), rc79
    • rc20: Port 9 on 1G switch seems to be bad.
  • July 23: rc20 (no response to ipmi), rc61, rc63, rc64, rc68 (these are likely due to operator error, but needed manual maintenance to fix)
  • July 19: rc03 (reimage), rc04 (reimage), rc20 (no response to ipmi, see below), rc26 (reimage), rc37, rc38 (no response to ipmi, see below), rc39, rc78 (reimage), rc80
    • rc20: 1G cable loose; doesn't seat well in NIC. If problem continues may need to bend clip or try other cables.
    • rc38: Port 8 on 1G switch seems to be bad. Hopefully diagnostics on switch can tell us more. Connected rc38 to port 48T, the 1/10G uplink port, since no other ports were free. Diagnostics didn't provide much info. In the 4 days of uptime, port 8 never successfully detected a connected cable, even after trying cables from neighboring, known-working endpoints. It seems to be out of commission.
  • June 15: rc37 (failed to reboot), rc38 (failed to reboot), rc79
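
The MTU and NFS fallback described in the August 7 entry (MTU back to 1500, NFS over TCP with default-sized reads and writes) could be sketched roughly as below. This is only an illustration, not the actual install script: the interface name `eth0`, the export path `/export`, and the mount point `/mnt/rcnfs` are hypothetical placeholders, and the function only builds the commands rather than running them.

```python
import shlex


def fallback_commands(iface, nfs_server, export, mountpoint, mtu=1500):
    """Build (but do not run) the commands for the MTU/NFS fallback.

    All argument values are supplied by the caller; none of them come
    from the real install script.
    """
    # Drop the interface back to a standard MTU instead of jumbo frames (9000).
    set_mtu = ["ip", "link", "set", "dev", iface, "mtu", str(mtu)]
    # Mount over TCP. Leaving out rsize/wsize lets NFS use its default sizes.
    mount_nfs = [
        "mount", "-t", "nfs", "-o", "proto=tcp",
        f"{nfs_server}:{export}", mountpoint,
    ]
    return [set_mtu, mount_nfs]


if __name__ == "__main__":
    # Hypothetical values: eth0, /export and /mnt/rcnfs are placeholders.
    for cmd in fallback_commands("eth0", "rcnfs", "/export", "/mnt/rcnfs"):
        print(shlex.join(cmd))
```

Printing the commands instead of executing them makes it easy to review what would change on a node before applying it by hand over IPMI.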

...