  • Strange bandwidth issues: ib_send_bw tests on the second rack (rc41-rc80, with ConnectX-3 NICs and 56Gbps SwitchX switches) only show ~1650MB/s between nodes.
    1. Bandwidths are apparently fine (~3200MB/s) between rc01-rc40, which are on the old switches (though they route through the new switches).
    2. Strangely, bandwidths between nodes in the first rack and nodes in the second rack are also fine (~3200MB/s).
      1. So it appears to be a combination of new NICs talking to new NICs!?
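
The three observations above can be written down as a small lookup, which makes it easier to see that only second-rack-to-second-rack pairs are slow. This is a minimal sketch in Python: the helper names (`rack_of`, `observed_bandwidth`, `is_degraded`) and the 3000 MB/s threshold are hypothetical, and the figures are just the approximate ib_send_bw numbers quoted above, not new measurements.

```python
import re

# Approximate ib_send_bw results quoted in the notes above, in MB/s,
# keyed by the (sorted) pair of racks the two nodes live in.
OBSERVED_MBPS = {
    (1, 1): 3200,  # rc01-rc40 <-> rc01-rc40
    (1, 2): 3200,  # first rack <-> second rack
    (2, 2): 1650,  # rc41-rc80 <-> rc41-rc80
}


def rack_of(node: str) -> int:
    """Return 1 for rc01-rc40 (first rack) and 2 for rc41-rc80 (second rack)."""
    match = re.fullmatch(r"rc(\d{2})", node)
    if match is None:
        raise ValueError(f"unexpected node name: {node!r}")
    number = int(match.group(1))
    if not 1 <= number <= 80:
        raise ValueError(f"node number out of range: {node!r}")
    return 1 if number <= 40 else 2


def observed_bandwidth(node_a: str, node_b: str) -> int:
    """Look up the approximate bandwidth reported for this pair of racks."""
    key = tuple(sorted((rack_of(node_a), rack_of(node_b))))
    return OBSERVED_MBPS[key]


def is_degraded(node_a: str, node_b: str, threshold_mbps: int = 3000) -> bool:
    """True if the reported bandwidth is below the (hypothetical) threshold."""
    return observed_bandwidth(node_a, node_b) < threshold_mbps
```

With these definitions, `is_degraded("rc41", "rc80")` is true while `is_degraded("rc01", "rc41")` and `is_degraded("rc01", "rc40")` are false, matching the pattern described in the list.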

...

  • Update SSD firmware on all drives to version 0309 (http://www.crucial.com/support/firmware.aspx). Otherwise they'll start crashing after being up for >= 5184 hours.
    1. Perhaps easiest to hexedit the bootable updater's script to flash without any interaction and PXE boot the update on all machines? Done. Modified the autoexec.bat inside boot2880.img (from the above iso) to run the following commands at boot:
      1. The sleeps appeared necessary. x8sit's shutdown.com didn't work for some reason, so the machines should be poked via IPMI to reset them after they've had enough time to finish.

        Code Block
        sleep 3
        echo yes | dosmcli.exe --bus ALL -f fwa.img -u 0 --segmented 10
        sleep 3
        echo yes | dosmcli.exe --bus ALL -f fwa.img -u 1 --segmented 10
  • Update the BIOS to version 1.2 on all 80 machines (hoping this helps with the ConnectX-3 HCAs showing up as x4 or x2 devices rather than x8).
    • Done. Now all cards show up as PCIe 1.0, rather than 2.0!
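
To check what link each card actually negotiated, the `LnkSta:` line from `lspci -vv` can be parsed and turned into a theoretical ceiling. The following is a minimal sketch in Python, assuming the usual `LnkSta: Speed ..., Width x...` layout printed by lspci; the function names are hypothetical and the sample lines in the usage note are illustrative, not captured from these machines. The per-lane rates and encodings (2.5 and 5 GT/s with 8b/10b, 8 GT/s with 128b/130b) are the standard PCIe figures.

```python
import re

# Per-lane signalling rate (GT/s) and line-encoding efficiency per PCIe generation.
PCIE_SPEEDS = {
    "2.5GT/s": ("PCIe 1.0", 2.5, 8 / 10),
    "5GT/s": ("PCIe 2.0", 5.0, 8 / 10),
    "8GT/s": ("PCIe 3.0", 8.0, 128 / 130),
}

# Matches e.g. "LnkSta: Speed 5GT/s, Width x8" and the newer
# "LnkSta: Speed 5GT/s (ok), Width x8 (ok)" form.
LNKSTA_RE = re.compile(
    r"LnkSta:\s*Speed\s+([\d.]+GT/s)(?:\s*\([^)]*\))?,\s*Width\s+x(\d+)"
)


def parse_lnksta(lspci_output: str) -> tuple[str, int]:
    """Return (speed, width) from the first LnkSta line in `lspci -vv` output."""
    match = LNKSTA_RE.search(lspci_output)
    if match is None:
        raise ValueError("no LnkSta line found")
    return match.group(1), int(match.group(2))


def link_bandwidth_mbps(speed: str, width: int) -> float:
    """Theoretical one-direction bandwidth in MB/s, before protocol overhead."""
    _generation, gt_per_s, efficiency = PCIE_SPEEDS[speed]
    # GT/s * efficiency = usable Gbit/s per lane; * 1000 / 8 converts to MB/s.
    return gt_per_s * efficiency * width * 1000 / 8
```

For example, `parse_lnksta("LnkSta: Speed 2.5GT/s, Width x8, TrErr-")` yields `("2.5GT/s", 8)`, and `link_bandwidth_mbps("2.5GT/s", 8)` gives a ceiling of about 2000 MB/s, versus about 4000 MB/s for `link_bandwidth_mbps("5GT/s", 8)`. Real throughput is lower than these ceilings because of packet overhead; whether a downgraded link accounts for the ~1650MB/s results is not established by these notes.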