To detect and respond to machine failures, we need to understand how the networks we have (IP/Ethernet, InfiniBand, RoCE) behave when a host goes missing.

InfiniBand RC (reliable connected QPs)

InfiniBand UD (unreliable datagram QPs)