...
- November 11: rc07, rc10, rc16
- November 10: rc10, rc36
- November 9: rc03, rc04, rc07, rc10, rc11, rc33
- November 8: rc03, rc20 (Ryan: Over the past few days I've noticed that rc20 must be rebooted twice before it comes back up.)
- November 7: rc02, rc03, rc08, rc16, rc18, rc20
- November 4: rc10, rc16, rc18
- November 3: rc18
- November 2: rc11, rc12, rc20
- October 20: rc04, rc20, rc35
- October 19: rc04, rc20
- October 18: rc01, rc05, rc11, rc16, rc24
- October 17: rc13, rc16, rc20
- October 13: rc16
- October 12: rc21, rc38
- October 11: rc03, rc08, rc10, rc16, rc19
- October 10: rc02, rc05, rc06, rc08, rc10, rc12, rc16, rc33, rc36
  - rc02 and rc17 were up, but reported a read-only / (root) file system
- October 7: rc06, rc10, rc11, rc13, rc16, rc19
- October 6: rc04
- October 5: rc01
- October 4: rc08
- October 3: rc05, rc10, rc12, rc14, rc18, rc33
- September 30: rc07, rc10, rc11, rc13, rc19, rc20, rc36
  - JO restarted all of them, and all came up except rc19 & rc20.
  - rc19 & rc20 back up (PSU decided to work again today?!)
- September 28: rc04, rc19, rc20, rc21, rc25, rc26
  - rc04 was powered off
  - rc19/rc20 appear to have a bad power supply
  - rc21 was at the Linux login prompt with the cursor blinking, but didn't respond to the keyboard
  - rc25/rc26 were not plugged in (oops)
Notes
Nov 11th:
Histogram of failures as of Nov 11th:
rc01: **
rc02: **
rc03: ****
rc04: *****
rc05: ***
rc06: **
rc07: ***
rc08: ****
rc09:
rc10: *********
rc11: *****
rc12: ***
rc13: ***
rc14: *
rc15:
rc16: *********
rc17:
rc18: ****
rc19: ****
rc20: ********
rc21: **
rc22:
rc23:
rc24: *
rc25: *
rc26: *
rc27:
rc28:
rc29:
rc30:
rc31:
rc32:
rc33: ***
rc34:
rc35: *
rc36: ***
rc37:
rc38: *
rc39:
rc40:
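A histogram like the one above can be regenerated from the dated entries at the top of this page. The following is only a minimal sketch: it assumes each entry has the form `- <Month> <day>: rcNN, rcNN, ...`, skips the sub-notes, and counts a node at most once per day (so the parenthetical remark about rc20 on November 8 is not double-counted). The function names `tally_failures` and `render_histogram` are made up for illustration.

```python
import re
from collections import Counter

# A dated entry looks like "- November 11: rc07, rc10, rc16".
# Sub-notes such as "- rc04 was powered off" do not match and are skipped.
ENTRY_RE = re.compile(r"^\s*-\s+[A-Z][a-z]+\s+\d{1,2}:\s*(.*)$")
# Node names are "rc" followed by exactly two digits.
NODE_RE = re.compile(r"\brc\d{2}\b")


def tally_failures(lines):
    """Count, per node, the number of days on which it appears in the log."""
    counts = Counter()
    for line in lines:
        match = ENTRY_RE.match(line)
        if not match:
            continue
        # Use a set so a node mentioned twice on one day counts once.
        counts.update(set(NODE_RE.findall(match.group(1))))
    return counts


def render_histogram(counts, first=1, last=40):
    """Render one 'rcNN: ***' line per node, including nodes with no failures."""
    rows = []
    for n in range(first, last + 1):
        name = f"rc{n:02d}"
        rows.append(f"{name}: {'*' * counts[name]}".rstrip())
    return "\n".join(rows)
```

Feeding the entry lines of this page to `tally_failures` and printing `render_histogram(counts)` should give the same layout as the block above, provided the log is complete.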
"System Temp" as reported by ipmi (rc01 first, rc40 last):
System Temp | 29 degrees C | ok
System Temp | 28 degrees C | ok
System Temp | 27 degrees C | ok
System Temp | 28 degrees C | ok
System Temp | 27 degrees C | ok
System Temp | 28 degrees C | ok
System Temp | 28 degrees C | ok
System Temp | 27 degrees C | ok
System Temp | 28 degrees C | ok
System Temp | 28 degrees C | ok
System Temp | 27 degrees C | ok
System Temp | 28 degrees C | ok
System Temp | 27 degrees C | ok
System Temp | 27 degrees C | ok
System Temp | 26 degrees C | ok
System Temp | 27 degrees C | ok
System Temp | 26 degrees C | ok
System Temp | 26 degrees C | ok
System Temp | 25 degrees C | ok
System Temp | 25 degrees C | ok
System Temp | 25 degrees C | ok
System Temp | 26 degrees C | ok
System Temp | 22 degrees C | ok
System Temp | 25 degrees C | ok
System Temp | 22 degrees C | ok
System Temp | 24 degrees C | ok
System Temp | 23 degrees C | ok
System Temp | 24 degrees C | ok
System Temp | 23 degrees C | ok
System Temp | 23 degrees C | ok
System Temp | 22 degrees C | ok
System Temp | 23 degrees C | ok
System Temp | 21 degrees C | ok
System Temp | 23 degrees C | ok
System Temp | 20 degrees C | ok
System Temp | 23 degrees C | ok
System Temp | 20 degrees C | ok
System Temp | 21 degrees C | ok
System Temp | 20 degrees C | ok
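To compare these readings with the failure counts, the rows have to be turned into numbers and matched to node names. The sketch below assumes the `name | value | status` row format shown above and that rows are in node order. The helpers `parse_system_temp` and `label_readings` are hypothetical names, not part of any IPMI tool.

```python
import re

# The value field of a row looks like "29 degrees C".
TEMP_RE = re.compile(r"^(-?\d+(?:\.\d+)?) degrees C$")


def parse_system_temp(line):
    """Return the temperature in degrees C from a 'System Temp' row, else None."""
    fields = [field.strip() for field in line.split("|")]
    if len(fields) < 3 or fields[0] != "System Temp":
        return None
    match = TEMP_RE.match(fields[1])
    return float(match.group(1)) if match else None


def label_readings(lines, first=1, last=40):
    """Pair readings with rcNN names in order; refuse to guess if any are missing."""
    temps = [t for t in map(parse_system_temp, lines) if t is not None]
    expected = last - first + 1
    if len(temps) != expected:
        # A silent gap would shift every later label by one node.
        raise ValueError(f"expected {expected} readings, got {len(temps)}")
    return {f"rc{n:02d}": t for n, t in zip(range(first, last + 1), temps)}
```

The strict length check is deliberate: the listing above has 39 rows for 40 nodes, so the labels cannot be assigned reliably without knowing which node did not report.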
Nov 7th: Removing the OFED stack on rc01-rc20 to see whether it is causing the problem. I don't know why the lower 20 nodes are overrepresented compared to the upper 20.
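The imbalance between the lower and upper 20 can be put into numbers. The sketch below uses per-node counts transcribed by hand from the Nov 11th histogram (one list entry per node, rc01 through rc40). The name `split_totals` and the boundary at rc20 are illustrative choices.

```python
# Failure counts per node, rc01..rc40, transcribed from the Nov 11th histogram.
FAILURES = [
    2, 2, 4, 5, 3, 2, 3, 4, 0, 9,   # rc01-rc10
    5, 3, 3, 1, 0, 9, 0, 4, 4, 8,   # rc11-rc20
    2, 0, 0, 1, 1, 1, 0, 0, 0, 0,   # rc21-rc30
    0, 0, 3, 0, 1, 3, 0, 1, 0, 0,   # rc31-rc40
]


def split_totals(failures, boundary=20):
    """Return (failures on the first `boundary` nodes, failures on the remaining nodes)."""
    return sum(failures[:boundary]), sum(failures[boundary:])


lower, upper = split_totals(FAILURES)
print(f"rc01-rc20: {lower} failures, rc21-rc40: {upper} failures")
```

With these counts the split is 71 failures on rc01-rc20 against 13 on rc21-rc40. That only quantifies the pattern; it does not explain it.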