Updating Mellanox NIC Firmware

In their default configuration, the Mellanox ConnectX2 NICs enforce a lower limit on timeouts (specifically, the IBV_QP_TIMEOUT option): you cannot set a timeout value less than about 500ms. Combined with the default setting of 7 retries, this means that after a timeout (e.g., a crashed server) the NIC holds the transmit buffer for about 4 seconds before returning it with an error. This can cause RAMCloud to run out of transmit buffers. To fix the problem, we modified the firmware in our NICs. Here is how we did it:
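
As a rough check of the 4-second figure, the sketch below computes how long a buffer is held. It relies on the InfiniBand convention that the IBV_QP_TIMEOUT value is an exponent, giving a timeout of 4.096 us * 2^value. The exponent 17 (about 537ms) is an assumption chosen to match the "about 500ms" limit described above, and the function names are made up for illustration.

```shell
#!/bin/sh
# Timeout in nanoseconds for a given IBV_QP_TIMEOUT exponent:
# 4.096 us * 2^exponent = 4096 ns * 2^exponent.
qp_timeout_ns() {
    echo $(( 4096 * (1 << $1) ))
}

# Approximate time (in ms) that a transmit buffer is held, assuming one
# full timeout elapses per retry. $1 = exponent, $2 = retry count.
buffer_hold_ms() {
    echo $(( $(qp_timeout_ns "$1") * $2 / 1000000 ))
}

# Assumed exponent 17 with the default 7 retries.
buffer_hold_ms 17 7    # prints 3758, i.e. just under 4 seconds
```

With the firmware limit removed, a much smaller exponent can be used, which shrinks this hold time accordingly. The procedure for modifying the firmware follows: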

  • Obtain from Mellanox the appropriate version of the firmware (.mlx file) to use as a starting point. For our ConnectX2 NICs (as of December 2012) this file is in ~ouster/samba/mellanox/fw-ConnectX2-rel.mlx.
  • This file needs to be combined with an appropriate .ini file.  First, fetch the existing .ini file from the NIC:
        flint -d /dev/mst/mt26428_pci_cr0 dc > MT_0DD0120009.ini

    Check /dev/mst to verify the device file name to pass to -d.  In this case the .ini file is named after the board_id printed by ibv_devinfo (we have two different board ids, which have different .ini files, though they use the same .mlx file).

  • Edit the .ini file to add a new qp_minimal_timeout_val parameter with a value of zero. It goes in the HCA section, like this:

    [HCA]
    hca_header_device_id = 0x673c
    hca_header_subsystem_id = 0x0018
    dpdp_en = true
    eth_xfi_en = true
    mdio_en_port1 = 0
    qp_minimal_timeout_val = 0
  • Generate a new image from the .mlx file and the .ini file:
        mlxburn -fw fw-ConnectX2-rel.mlx -conf MT_0DD0120009.ini -wrimage MT_0DD0120009.bin
  • Upload the image into the NIC:
        flint -d /dev/mst/mt26428_pci_cr0 -i MT_0DD0120009.bin -y b
  • Beware: if different NICs have different board ids, they will need different .ini files, and they may need different .mlx files (ask Mellanox for help).
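
The manual .ini edit in the steps above could be scripted along the following lines. This is only a sketch: add_min_timeout is a hypothetical helper (not part of the Mellanox tools), and it assumes the .ini file has a plain "[HCA]" section header at the start of a line. It adds the new parameter as the last entry of the HCA section, as in the example above. The flint and mlxburn commands are the same ones listed in the steps and are left commented out because the last one reflashes the NIC.

```shell
#!/bin/sh
# Hypothetical helper: print the .ini file named by $1 with
# qp_minimal_timeout_val = 0 added at the end of the [HCA] section.
add_min_timeout() {
    awk -v param='qp_minimal_timeout_val = 0' '
        /^\[/      { if (in_hca) { print param; in_hca = 0 } }  # leaving [HCA]
        /^\[HCA\]/ { in_hca = 1 }                               # entering [HCA]
                   { print }
        END        { if (in_hca) print param }                  # [HCA] was last
    ' "$1"
}

# Example use with the file and device names from the steps above:
# flint -d /dev/mst/mt26428_pci_cr0 dc > MT_0DD0120009.ini
# add_min_timeout MT_0DD0120009.ini > new.ini && mv new.ini MT_0DD0120009.ini
# mlxburn -fw fw-ConnectX2-rel.mlx -conf MT_0DD0120009.ini -wrimage MT_0DD0120009.bin
# flint -d /dev/mst/mt26428_pci_cr0 -i MT_0DD0120009.bin -y b
```

The helper does not check whether the parameter is already present, so it should be run only once per .ini file, and the result is worth inspecting by eye before generating and uploading an image.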