2/10 Meeting - Conversation about Actors

Similarities to Actors

Prescriptive, constrained approach (even more so)
Sacrifice some efficiency for ease of reasoning
Rules -> Message Handlers (Behaviors)
Tasks -> Actors
Pools -> Pools (called "Frameworks" in some Actor systems)
Both use messages to notify of failures
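
A minimal sketch of the correspondence above, under stated assumptions: the names Task, Pool, apply_rules, and run are hypothetical and are not taken from RAMCloud or any Actor library. Each task owns private state plus a list of (predicate, action) rules, and the pool keeps firing one rule at a time until none applies.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]
# A rule is a (predicate, action) pair; both names are illustrative only.
Rule = Tuple[Callable[[State], bool], Callable[[State], None]]


@dataclass
class Task:
    """Hypothetical task: private state plus rules re-evaluated from scratch."""
    state: State
    rules: List[Rule] = field(default_factory=list)

    def apply_rules(self) -> bool:
        """Fire the first rule whose predicate holds; report whether one fired."""
        for predicate, action in self.rules:
            if predicate(self.state):
                action(self.state)
                return True
        return False


class Pool:
    """Hypothetical pool: drives its tasks until no rule in any task can fire."""

    def __init__(self) -> None:
        self.tasks: List[Task] = []

    def run(self, max_steps: int = 1000) -> int:
        """Fire one rule per step; return the number of rule firings."""
        steps = 0
        while steps < max_steps and any(t.apply_rules() for t in self.tasks):
            steps += 1
        return steps


def make_replication_task(target: int) -> Task:
    """Example task: add replicas until the target is met, then mark done."""
    task = Task(state={"replicas": 0, "target": target, "done": False})
    task.rules.append((
        lambda s: s["replicas"] < s["target"],
        lambda s: s.update(replicas=s["replicas"] + 1),
    ))
    task.rules.append((
        lambda s: s["replicas"] >= s["target"] and not s["done"],
        lambda s: s.update(done=True),
    ))
    return task
```

In Actor terms, apply_rules plays the role of a message handler and Pool plays the role of the scheduler; the difference is that rules are triggered by state, not by the arrival of a particular message.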

Differences to Actors

Constrained so one message is handled per Actor at a time
Shared state is disallowed, except when manipulated explicitly via messages
Actors react to messages, not state
- Could convert messages to state, then fire a message to induce rule recheck
Problem: often want to trigger rules when no state updates (messages) are left
- May be solved by doing everything with actors?
Priority inversion issues on messages
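
One way to picture the "convert messages to state, then fire a message to induce rule recheck" idea is the sketch below. RuleActor, send, and step are hypothetical names, and the choice to request a recheck only when the mailbox has drained is an assumption made here to address the "no state updates left" problem. Peeking at the mailbox like this is itself outside the pure Actor model.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

State = Dict[str, object]
# A message is either a (key, value) state update or ("recheck", None).
Message = Tuple[str, object]


class RuleActor:
    """Hypothetical actor that folds messages into state, then rechecks rules."""

    def __init__(self, rules: List[Callable[[State], None]]) -> None:
        self.state: State = {}
        self.mailbox: Deque[Message] = deque()
        self.rules = rules
        self.rechecks = 0

    def send(self, key: str, value: object = None) -> None:
        """Enqueue a message; nothing is processed until step() is called."""
        self.mailbox.append((key, value))

    def step(self) -> bool:
        """Handle exactly one message; return False if the mailbox was empty."""
        if not self.mailbox:
            return False
        key, value = self.mailbox.popleft()
        if key == "recheck":
            self.rechecks += 1
            for rule in self.rules:
                rule(self.state)
        else:
            self.state[key] = value
            # Assumption: only ask for a recheck once pending updates are drained,
            # so a burst of updates triggers a single rule evaluation.
            if not self.mailbox:
                self.send("recheck")
        return True
```

With two queued updates, the rules run once, after both updates have been folded into state, rather than once per message.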

Sound Bytes

Rules-based code is internally synchronous, but the system is externally asynchronous. Would like a synchronous world, so we convert the async world into something synchronous. But, there are limits, which is why rules have to stay small.
Actors: everything is asynchronous all the time; can't even get safe access to shared state.
Question that arose: what's really the best case for Actor performance with all of the message passing?
- Is there an efficient Actor implementation?
- stutsman: Probably worth looking at the ('06?) Scala actor paper.
TODO(stutsman): Read Needham, Lauer duality paper
TODO(stutsman): Read up a bit on Actors; Hewitt papers of 60s, 70s
- More recent canonical discussion seems to be Agha dissertation
- Looked through that quite a bit; pulled a few interesting bits out of it (see below)

Fragments from Agha Dissertation

Several observations are in order here. Firstly, the behavior of an actor can be history sensitive. Secondly, there is no presumed sequentiality in the actions an actor performs since, mathematically, each of its actions is a function of the actor's behavior and the incoming communication.
The shared variables approach does not provide any mechanism for abstraction and information hiding. For instance, there must be predetermined protocols so that one process can determine if another has written the results it needs into the relevant variables.
Buffered asynchronous communication affords us efficiency in execution by pipelining the actions to be performed.

1/27 Meeting

TODO

Work up an outline of the paper; send around (stutsman)
Gather data
- Fix TableManager in RAMCloud (stutsman)
- Redo HDFS
  - Block replication management on NameNode
  - And/Or DataNode to DataNode block replication
- Does Sparrow have DCFT? Could parts of it be reworked?
- Need a strong non-RAMCloud example
  - One possibility: a system that already (implicitly) uses rules.
  - Another possibility: show a system that's a mess and show how rules clean it up.

Sound Bytes

Think in terms of rules rather than algorithms
- How does this differ from thinking in terms of threads or in terms of events?
Claim: we don't believe a problem this hard/diverse has a one-size-fits-all solution; hence, we've tried to extract the core concepts. Implementations of the concepts will look different under different contexts and requirements. The power is in the fact that the same approach can be applied to all fault-tolerant code regardless of platform, programming language, network/protocols, or even the specific types of faults that occur.
Position strongly as a concept/experience paper
- Probably want the word "Experience" in the title like the monitors paper
- "Experiences with Rules-driven Programming for Fault-tolerant Systems" etc.
Claim: several modules in RAMCloud were attempted in "traditional" threaded style, and they ended up needing to be reworked. Often, there was a tendency to want to write really simple looping, blocking, or locking code somewhere deep in places where it looked like it didn't matter (success seemed guaranteed, locks seemed short, etc.). These were a constant thorn. We should try to list as many as we can think of.
- BackupSelector was an offender on many occasions. Each time we'd make it a little better, without going the whole way, and each time we'd have to come back and fix it up.
  - One of the constant temptations was to "spin" in the selector waiting for new backups to get added to the list. But this prevented higher-level error handling cases from kicking in.
- TableManager
- BackupRecoveryManager
- ServerList management
- (add more as we think of them)
Recomputing what to do next from scratch is the key to making things easier to reason about; relying on the PC (program counter) is efficient, but it's burdened with assumptions that make it difficult to use effectively.
- The rules-based approach is unapologetically less efficient.
- On the other hand, we've used it to great effect in our high-performance system.
Rules become really critical when failure-handling may involve failure-handling. Even more so, when failure handling may be recursive, mutually-recursive, or iterative.
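
The BackupSelector temptation noted above, and the "recompute from scratch" point, can be sketched as follows. This is an illustration only: BackupList, select_backup_spinning, select_backup_rule, and replication_step are hypothetical names and do not reflect RAMCloud's actual BackupSelector code.

```python
from typing import List, Optional


class BackupList:
    """Hypothetical stand-in for the list of available backup servers."""

    def __init__(self) -> None:
        self.servers: List[str] = []


def select_backup_spinning(backups: BackupList) -> str:
    """Anti-pattern: loops until a backup appears, so the caller never regains control."""
    while not backups.servers:
        pass  # spins; higher-level failure handling cannot run in the meantime
    return backups.servers[0]


def select_backup_rule(backups: BackupList) -> Optional[str]:
    """Rule style: make progress if possible, otherwise return immediately."""
    return backups.servers[0] if backups.servers else None


def replication_step(state: dict, backups: BackupList) -> None:
    """One pass of a hypothetical replication task; safe to call repeatedly.

    Nothing is remembered in the program counter: each call decides what to
    do next purely from the current state.
    """
    if state.get("aborted") or state.get("backup"):
        return  # nothing left to do
    if state.get("master_failed"):
        state["aborted"] = True  # higher-level error handling gets its chance
        return
    state["backup"] = select_backup_rule(backups)
```

Because replication_step returns when no backup is available, a failure flagged between calls is noticed on the next pass; the spinning version would never see it.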

DCFT Paper Notes
