I've noticed in Jonathan's test case that there are often large groups of txHintFailed RPCs issued at almost the same time. It appears that if a server has a large number of PreparedOps for a transaction, it issues one txHintFailed RPC for each PreparedOp. Is that right? This is causing significant congestion in the RPC system. With InfRcTransport it leads to deadlock: one host sends a large burst of txHintFailed RPCs to itself, using up all of its transmit buffers, and then loops waiting for a free transmit buffer. While it loops it never gets back into the poll loop, so it can't read the incoming messages that would free up buffers.
I'm going to see if I can fix the deadlock problem, but in any case, sending multiple txHintFailed RPCs for the same transaction seems like unnecessary work. Is there an overall state record for each transaction on each participant server? If so, how about using that object to trigger txHintFailed requests, rather than the PreparedOps, so that there is only one request per transaction? Alternatively, could we start the WorkerTimer only for the first PreparedOp of each transaction?
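To make the suggestion concrete, here is a minimal sketch of a per-transaction record that arms a timer only for the first PreparedOp of a transaction and lets txHintFailed go out at most once per transaction. Everything here is hypothetical: `TxId`, `TxState`, and `TxHintTracker` are illustrative names, not existing RAMCloud classes, and the sketch assumes a transaction can be identified by a (client lease id, client transaction id) pair. It only does the bookkeeping; the caller would still own the real WorkerTimer and RPC code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical transaction identifier: (client lease id, per-client tx id).
using TxId = std::pair<uint64_t, uint64_t>;

// Hypothetical per-participant record: one entry per transaction,
// shared by all of that transaction's PreparedOps.
struct TxState {
    uint32_t preparedOps = 0;     // PreparedOps currently held for this tx
    bool timerStarted = false;    // has the single timer been armed?
    bool hintFailedSent = false;  // has txHintFailed already been issued?
};

class TxHintTracker {
  public:
    // Called when a PreparedOp is stored. Returns true only for the first
    // PreparedOp of a transaction, i.e. when the caller should arm a timer.
    bool onPreparedOp(const TxId& tx) {
        TxState& s = txs[tx];
        s.preparedOps++;
        if (s.timerStarted) {
            return false;  // a timer already covers this transaction
        }
        s.timerStarted = true;
        return true;
    }

    // Called when a transaction's timer fires. Returns true if the caller
    // should send txHintFailed; this happens at most once per transaction,
    // and never for a transaction whose record has already been dropped.
    bool onTimerFired(const TxId& tx) {
        auto it = txs.find(tx);
        if (it == txs.end() || it->second.hintFailedSent) {
            return false;
        }
        it->second.hintFailedSent = true;
        return true;
    }

    // Called when a PreparedOp is committed or aborted. The record is
    // dropped once the last PreparedOp of the transaction goes away.
    void onPreparedOpDone(const TxId& tx) {
        auto it = txs.find(tx);
        if (it == txs.end()) {
            return;
        }
        assert(it->second.preparedOps > 0);
        if (--it->second.preparedOps == 0) {
            txs.erase(it);
        }
    }

    std::size_t trackedTransactions() const { return txs.size(); }

  private:
    std::map<TxId, TxState> txs;
};
```

With this shape, a transaction with 1000 PreparedOps on one server would arm one timer and produce at most one txHintFailed RPC instead of 1000, which also bounds the burst a host can send to itself.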