InfiniBand Tools and Debugging
Checking Link Speeds
Run iblinkinfo as root. This will show link speeds of all ports in the network (both on switches and HCAs).
E.g. (rc14 or its switch port has some issue: the link is running at 2.5 Gbps per lane, i.e. SDR, instead of the 10 Gbps per lane, i.e. QDR, of the other links):
    [root@rc14 ~]# iblinkinfo |grep \"rc
    50 15[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 51 1[ ] "rc41 HCA-1" ( )
    ...
    41 35[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 45 1[ ] "rcnfs HCA-1" ( )
    41 36[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 1 1[ ] "rcmaster HCA-1" ( )
    48 15[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 26 1[ ] "rc02 HCA-1" ( )
    48 16[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 6 1[ ] "rc22 HCA-1" ( )
    48 17[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 23 1[ ] "rc30 HCA-1" ( )
    48 18[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 30 1[ ] "rc10 HCA-1" ( )
    48 19[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 16 1[ ] "rc28 HCA-1" ( )
    48 20[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 40 1[ ] "rc06 HCA-1" ( )
    48 21[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 37 1[ ] "rc18 HCA-1" ( )
    48 22[ ] ==( 4X 2.5 Gbps Active/ LinkUp)==> 36 1[ ] "rc14 HCA-1" ( Could be 10.0 Gbps)
    48 23[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 29 1[ ] "rc16 HCA-1" ( )
    ...
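On a large fabric it can be easier to filter the iblinkinfo output than to read it by eye. The following is a minimal sketch, assuming the output keeps the line layout shown above and that degraded links carry a "Could be" comment as in the rc14 line. The function name find_degraded_links and the regular expressions are illustrative, not part of any InfiniBand tool.

```python
import re

# Link descriptor printed between "==(" and ")==>", e.g. "4X 2.5 Gbps Active/ LinkUp".
LINK_RE = re.compile(r"==\(\s*(\d+)X\s+([\d.]+)\s+Gbps")
# Quoted peer name followed by the trailing comment in parentheses.
PEER_RE = re.compile(r'"([^"]*)"\s*\(([^)]*)\)')


def find_degraded_links(output):
    """Return (peer, width, lane_gbps, hint) for every link flagged 'Could be'."""
    degraded = []
    for line in output.splitlines():
        link = LINK_RE.search(line)
        peer = PEER_RE.search(line)
        if not link or not peer:
            continue  # not a link line (e.g. "..." or a switch header)
        hint = peer.group(2).strip()
        if "Could be" in hint:
            degraded.append(
                (peer.group(1), int(link.group(1)), float(link.group(2)), hint)
            )
    return degraded


# Three lines copied from the example output above.
sample = """\
48 21[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 37 1[ ] "rc18 HCA-1" ( )
48 22[ ] ==( 4X 2.5 Gbps Active/ LinkUp)==> 36 1[ ] "rc14 HCA-1" ( Could be 10.0 Gbps)
48 23[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 29 1[ ] "rc16 HCA-1" ( )
"""

for peer, width, lane_gbps, hint in find_degraded_links(sample):
    print(f"{peer}: {width}X {lane_gbps} Gbps ({hint})")
```

On a live system the text could be captured with subprocess.run(["iblinkinfo"], capture_output=True, text=True).stdout, run as root, and passed to the same function.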
Measuring Bandwidth
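ib_send_bw measures bandwidth between two hosts using the send/recv verbs. Start it with no arguments on the destination host (the server), then run it on the source host with the destination's hostname as the argument. An example follows below.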
Src host:
    [root@rc14 ~]# ib_send_bw rc16ib
    ------------------------------------------------------------------
                        Send BW Test
     Number of qps   : 1
     Connection type : RC
     TX depth        : 300
     CQ Moderation   : 50
     Link type       : IB
     Mtu             : 2048
     Inline data is used up to 0 bytes message
     local address: LID 0x24 QPN 0x80049 PSN 0xe895fd
     remote address: LID 0x1d QPN 0x200049 PSN 0xb960d2
    ------------------------------------------------------------------
     #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]
     65536      1000           939.34             939.34
    ------------------------------------------------------------------
Dst host:
    rumble@rc16:~$ ib_send_bw
    ------------------------------------------------------------------
                        Send BW Test
     Number of qps   : 1
     Connection type : RC
     RX depth        : 600
     CQ Moderation   : 50
     Link type       : IB
     Mtu             : 2048
     Inline data is used up to 0 bytes message
     local address: LID 0x1d QPN 0x200049 PSN 0xb960d2
     remote address: LID 0x24 QPN 0x80049 PSN 0xe895fd
    ------------------------------------------------------------------
     #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]
     65536      1000           -nan               940.85
    ------------------------------------------------------------------
Measuring Latency
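Use ib_send_lat or ibv_ud_pingpong in the same way as ib_send_bw: start the program with no arguments on the destination host, then run it on the source host with the destination's hostname as the argument. Note that the two apps may have different defaults for packet sizes, inlining, etc., so their results are not directly comparable unless those settings are matched.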