...
WIP distillation of thoughts on DCFT.
Python MapReduce Notes
Interesting that my implementation also had three levels: Master/Scheduler (Job), TaskWrapper (Task), and Task (TaskAttempt). I used a "nested rules" approach, but in reality it could likely have been collapsed into a single top-level rule set. I should mention that a MapTask and a ReduceTask are very similar in terms of their rules. How do you modularize Task classes? Can you subclass a Task class? The table below does not include the RPC rules. It also does not account for the rules for the membership service (though that should likely be a separate module).
Module | Rule Count | Comments |
---|---|---|
Master/Scheduler | 1 | Basically just the rule to prevent deadlock by preempting tasks |
MapTaskSet | 2 | For straggler reissue. |
MapTask/Wrapper | 5 | |
ReduceTaskSet | 0 | Just pool management. |
ReduceTask | 4 | |
Total | 12 | The server failure "event" rule is not included (i.e. the handler that sets the isAlive bit is not counted as a rule). "Event" rules for setting the status of an RPC are also not included; the RPC status is considered a state field. |
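On the question of modularizing Task classes: one possible answer is to put the rules that MapTask and ReduceTask share on a base class and let each subclass add only its own. The sketch below is not my actual implementation. The `rule` decorator, the field names, and the three rules themselves are all hypothetical, chosen only to show the shape of the idea.

```python
def rule(fn):
    """Hypothetical marker: tag a method as a rule so it can be collected."""
    fn._is_rule = True
    return fn


class Task:
    """Base class holding the state and rules common to every task type."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.server_alive = True   # set by a failure "event" handler, not a rule
        self.status = "RUNNING"    # RUNNING or DONE in this sketch
        self.attempts = 1

    @rule
    def retry_on_dead_server(self):
        # Shared rule (assumed): a running task on a dead server is retried.
        if self.status == "RUNNING" and not self.server_alive:
            self.attempts += 1
            self.server_alive = True  # pretend it was rescheduled elsewhere

    @classmethod
    def rules(cls):
        # Walk the class hierarchy base-first so inherited rules come first
        # and a subclass can override a rule by reusing its name.
        found = {}
        for klass in reversed(cls.__mro__):
            for name, attr in vars(klass).items():
                if getattr(attr, "_is_rule", False):
                    found[name] = attr
        return list(found.values())

    def step(self):
        # Evaluate every rule once against the current state.
        for r in self.rules():
            r(self)


class MapTask(Task):
    def __init__(self, task_id):
        super().__init__(task_id)
        self.output_written = False

    @rule
    def finish_when_output_written(self):
        # Map-specific rule (assumed).
        if self.status == "RUNNING" and self.output_written:
            self.status = "DONE"


class ReduceTask(Task):
    def __init__(self, task_id, maps_total):
        super().__init__(task_id)
        self.maps_total = maps_total
        self.maps_fetched = 0

    @rule
    def finish_when_all_fetched(self):
        # Reduce-specific rule (assumed).
        if self.status == "RUNNING" and self.maps_fetched == self.maps_total:
            self.status = "DONE"
```

With this layout, `MapTask.rules()` and `ReduceTask.rules()` each return the shared retry rule plus one task-specific rule, so the per-module rule count would cover only what is unique to that module. Whether this is cleaner than a single flat rule set is still an open question.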
Hadoop MapReduce Walkthrough
...