Cluster Issues

Central documentation of issues we see in the cluster: those that we want to solve right away, as well as those that we don't. This will hopefully help us see patterns, if any, and make it easy for the cluster custodian to keep track of issues that currently need attention.


| Observed on | Component affected | Description | Reporter | Resolution | Resolver |
| 2014/10/?? | Infiniband switch that connects to rc01 | The status light has a color between yellow and red. Might mean a firmware issue, based on the manual. | Ankita | | |
| May 19, 2015 | rcmaster | "BMC" failed | Henry | | |
| May 19, 2015 | rc cluster machines | 11 power supplies have failed. | Jonathan | 1 power supply was resurrected, 10 new units were purchased and installed. | Collin |
| May 21, 2015 | Port 20 on right rack ethernet | Network port assumed to have failed; rc58, which was connected to it, was not able to connect to the network through it. | Collin | | |
| May 21, 2015 | rc20 | Unable to connect to network. No ping; no IPMI. | Collin | Some combination of wire jiggling and rebooting seems to have solved the issue. | Collin |
| May 21, 2015 | rc50 | Network connection issues | Collin | rczap | Collin |
| Jun 24, 2015 | Port 15 on right rack ethernet | Network port assumed to have failed; rc58, which was connected to it, was not able to connect to the network through it. | Ankita | | |
| Dec 11, 2015 | rc52 | Showing only 12 GB of memory | John | Opened the box and reconditioned all sticks. | Seo Jin |
| Jan 6, 2016 | rc46, rc70 | No IB devices found (nothing reported from ibv_devinfo). | Seo Jin | Replugging wire and manual power off and on. (IPMI reboot didn't resolve the issue.) | Seo Jin |
| Feb 20, 2016 | Ports 8, 9, 13, 15, 16, 20 on right rack ethernet | These ports don't appear to be functioning. | Jonathan | Avoid using them. | Jonathan |
| Dec 23, 2016 | rc47 | Networking issue, bad network switch port/cable. Wasn't able to connect to the host. | Behnam | Resolved | Behnam |
| Jan 10, 2016 | rc57 | Host unreachable ssh error. Bad network switch port. | Behnam | Resolved | Behnam |
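
Since one stated purpose of this log is to help spot patterns, here is a minimal sketch of how repeated failures could be tallied from the "Component affected" column. It is only an illustration: the function names (`count_port_reports`, `repeated_ports`) and the `threshold` parameter are hypothetical, and the input is limited to the three right rack ethernet entries copied from the table above.

```python
import re
from collections import Counter

# "Component affected" values copied from the right rack ethernet rows of the log.
COMPONENTS = [
    "Port 20 on right rack ethernet",
    "Port 15 on right rack ethernet",
    "Ports 8,9,13,15,16,20 on right rack ethernet",
]


def count_port_reports(components):
    """Count how many log entries mention each switch port number."""
    counts = Counter()
    for text in components:
        # Only entries that start with "Port" or "Ports" are port reports.
        match = re.match(r"Ports?\s+([\d,\s]+)", text)
        if match is None:
            continue  # e.g. a host name such as "rc20"
        for number in re.findall(r"\d+", match.group(1)):
            counts[int(number)] += 1
    return counts


def repeated_ports(components, threshold=2):
    """Return the sorted port numbers reported at least `threshold` times."""
    counts = count_port_reports(components)
    return sorted(port for port, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    print(repeated_ports(COMPONENTS))
```

With the three entries above, ports 15 and 20 are each reported twice, which matches the later advice in the log to avoid those ports.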