...

Server rc03 disappeared during the coordinator recovery in point 4. This is visible in the server list received by rc05, and was also indicated in point 4, when servers rc01 and rc04 received (old / redundant) server list updates but rc03 didn't. To confirm, I started another master, rc06. This server was assigned server id 3 (reused from rc03, which had dropped out of the cluster). Why?

Working hypothesis: crashing rc02 appends a serverdown entry to LogCabin. Until Ongaro implements a cleaner in LogCabin, I have a workaround method, readValidEntries() in LogCabinHelper, that filters the entries it reads to remove the invalidated ones. This method assumes that the order number (position) of any particular entry corresponds to its entry id. This is probably not true. If the two differ, the cleanup could discard the wrong entry, which would explain rc03 vanishing.
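
To make the hypothesis concrete, here is a minimal Python sketch of how a position-based cleanup could drop the wrong entry. It is not the actual LogCabinHelper code: the `Entry` type, the `invalidates` field, both function names, and the sample log are all assumptions made up for illustration. The only thing taken from the notes above is the suspected mismatch between an entry's position and its entry id.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    # Hypothetical log entry; the field names are assumptions for this sketch.
    entry_id: int                 # id assigned by the log
    payload: str                  # what the entry records
    invalidates: list = field(default_factory=list)  # entry ids this entry invalidates


def read_valid_by_position(entries):
    """Suspected behaviour: treat each invalidated id as a position in the list."""
    doomed = {i for e in entries for i in e.invalidates}
    return [e for pos, e in enumerate(entries) if pos not in doomed]


def read_valid_by_id(entries):
    """Alternative: match invalidated ids against each entry's own id."""
    doomed = {i for e in entries for i in e.invalidates}
    return [e for e in entries if e.entry_id not in doomed]


# Hypothetical log in which entry ids and positions have diverged.
log = [
    Entry(1, "rc01 up"),
    Entry(2, "rc02 up"),
    Entry(4, "rc03 up"),
    Entry(5, "rc04 up"),
    Entry(6, "rc02 down", invalidates=[2]),
]

by_position = [e.payload for e in read_valid_by_position(log)]
by_id = [e.payload for e in read_valid_by_id(log)]
```

In this made-up log, the position-based variant removes the entry at index 2 (rc03's) and keeps rc02's, while the id-based variant removes rc02's entry as intended. Whether the real log ever has ids and positions that diverge like this still needs to be checked.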

...