InfiniBand Tools and Debugging

Checking Link Speeds

Run iblinkinfo as root. This will show link speeds of all ports in the network (both on switches and HCAs).

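E.g., in the output below, rc14 or its switch has some issue causing the link to run at 2.5 Gbps per lane (SDR) instead of the 10.0 Gbps (QDR) of the other links; iblinkinfo flags it with "Could be 10.0 Gbps":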

[root@rc14 ~]# iblinkinfo |grep \"rc
          50   15[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      51    1[  ] "rc41 HCA-1" ( )
 ...
          41   35[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      45    1[  ] "rcnfs HCA-1" ( )
          41   36[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>       1    1[  ] "rcmaster HCA-1" ( )
          48   15[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      26    1[  ] "rc02 HCA-1" ( )
          48   16[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>       6    1[  ] "rc22 HCA-1" ( )
          48   17[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      23    1[  ] "rc30 HCA-1" ( )
          48   18[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      30    1[  ] "rc10 HCA-1" ( )
          48   19[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      16    1[  ] "rc28 HCA-1" ( )
          48   20[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      40    1[  ] "rc06 HCA-1" ( )
          48   21[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      37    1[  ] "rc18 HCA-1" ( )
          48   22[  ] ==( 4X  2.5 Gbps Active/  LinkUp)==>      36    1[  ] "rc14 HCA-1" ( Could be 10.0 Gbps)
          48   23[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      29    1[  ] "rc16 HCA-1" ( )
...
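
On a large fabric it can be easier to list only the links that iblinkinfo flags as running below what both ends support. A minimal sketch, assuming the "Could be" annotation appears exactly as in the output above; ib_degraded_links is a hypothetical helper name, not part of the InfiniBand diagnostic tools:

```shell
# Print only the links that iblinkinfo flags as running below capability.
# Reads iblinkinfo output on stdin, so it also works on saved output.
# ib_degraded_links is a hypothetical helper name used for illustration.
ib_degraded_links() {
    grep 'Could be'
}

# Typical use (as root):
#   iblinkinfo | ib_degraded_links
```

The function prints nothing when no link carries the annotation.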

Measuring Bandwidth

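ib_send_bw will measure bandwidth between two hosts using the send/recv verbs. Start it with no arguments on the destination host first (it waits for a connection), then run it on the source host with the destination's hostname as the argument. An example follows below.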

Src host:

[root@rc14 ~]# ib_send_bw rc16ib
------------------------------------------------------------------
                    Send BW Test
 Number of qps   : 1
 Connection type : RC
 TX depth        : 300
 CQ Moderation   : 50
 Link type       : IB
 Mtu             : 2048
 Inline data is used up to 0 bytes message
 local address: LID 0x24 QPN 0x80049 PSN 0xe895fd
 remote address: LID 0x1d QPN 0x200049 PSN 0xb960d2
------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]
 65536     1000           939.34             939.34
------------------------------------------------------------------

Dst host:

rumble@rc16:~$ ib_send_bw
------------------------------------------------------------------
                    Send BW Test
 Number of qps   : 1
 Connection type : RC
 RX depth        : 600
 CQ Moderation   : 50
 Link type       : IB
 Mtu             : 2048
 Inline data is used up to 0 bytes message
 local address: LID 0x1d QPN 0x200049 PSN 0xb960d2
 remote address: LID 0x24 QPN 0x80049 PSN 0xe895fd
------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]
 65536     1000           -nan               940.85
------------------------------------------------------------------
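
To judge whether a measured figure is reasonable, it helps to compare it with the link's theoretical data rate. The sketch below assumes 8b/10b encoding, which SDR, DDR and QDR links use (so 80% of the signalling rate carries data); it does not apply to FDR and later. ib_max_mb_per_sec is a hypothetical helper name, and its result is in units of 10^6 bytes per second, which may differ slightly from the units ib_send_bw reports.

```shell
# Rough upper bound on data throughput for an SDR/DDR/QDR link.
# ib_max_mb_per_sec is a hypothetical helper name used for illustration.
ib_max_mb_per_sec() {
    # $1 = lane count (e.g. 4 for a 4X link)
    # $2 = signalling rate per lane in Gbps (2.5, 5.0 or 10.0)
    # MB/s = lanes * Gbps * 0.8 (8b/10b) * 1000 / 8 = lanes * Gbps * 100
    awk -v lanes="$1" -v gbps="$2" 'BEGIN { printf "%.0f\n", lanes * gbps * 100 }'
}

ib_max_mb_per_sec 4 2.5    # 4X SDR: prints 1000
ib_max_mb_per_sec 4 10.0   # 4X QDR: prints 4000
```

By this estimate, the roughly 940 MB/sec measured from rc14 above is close to the ceiling of a 4X SDR link, which would be consistent with rc14's port running at 2.5 Gbps per lane.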

Measuring Latency

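Use ib_send_lat or ibv_ud_pingpong in the same way as ib_send_bw above. Note that the two apps may have different defaults for packet sizes, inlining, etc., so their results are not directly comparable unless these options are set to match.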