...
WIP distillation of thoughts on DCFT.
Python MapReduce Notes
Interestingly, my implementation also had three levels: Master/Scheduler (Job), TaskWrapper (Task), and Task (TaskAttempt). I used a "nested rules" approach, but in reality it could likely have been collapsed into one top-level rule set. I should also mention that there is a great deal of similarity between a MapTask and a ReduceTask in terms of their rules. How do you modularize Task classes? Can you subclass a Task class? The table below does not include the RPC rules. It also does not account for the rules for the membership service (though that should likely be a separate module).
Module | Rule Count | Comments |
---|---|---|
Master/Scheduler | 1 | Basically just the rule to prevent deadlock by preempting tasks |
MapTaskSet | 2 | For straggler reissue. |
MapTask/Wrapper | 5 | |
ReduceTaskSet | 0 | Just pool management. |
ReduceTask | 4 | |
Total | 12 | The server failure "event" rule is not included (i.e., the handler that sets the isAlive bit is not counted as a rule). "Event" rules for setting the status of an RPC are also not included; the RPC status is considered a state field. |
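One possible answer to the subclassing question, sketched in Python under the assumption that a rule is just a named guard/action pair evaluated against a task's state: put the rules shared by MapTask and ReduceTask in a base class and let each subclass append its own. `Rule`, `Task`, `MapTask`, `ReduceTask`, `step`, the `server_alive`/`inputs_ready` fields, and the rule names are all hypothetical, not the names used in the implementation described above.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Rule:
    """A named guard/action pair: `action` runs whenever `guard` holds."""
    name: str
    guard: Callable[["Task"], bool]
    action: Callable[["Task"], None]


@dataclass
class Task:
    """Hypothetical base task holding state shared by map and reduce tasks."""
    task_id: int
    state: str = "PENDING"      # PENDING -> RUNNING (DONE omitted for brevity)
    server_alive: bool = True   # set False externally, e.g. by a failure-detector handler
    attempts: int = 0
    log: List[str] = field(default_factory=list)

    def rules(self) -> List[Rule]:
        # Rules common to every task type are written here exactly once.
        return [
            Rule("reissue_on_server_failure",
                 lambda t: t.state == "RUNNING" and not t.server_alive,
                 Task._reissue),
        ]

    def _reissue(self) -> None:
        # Assume the task is rescheduled onto a healthy server.
        self.state = "PENDING"
        self.server_alive = True
        self.log.append("reissued")


class MapTask(Task):
    def rules(self) -> List[Rule]:
        # Inherit the shared rules, then add map-specific ones.
        return super().rules() + [
            Rule("start_map", lambda t: t.state == "PENDING", MapTask._start),
        ]

    def _start(self) -> None:
        self.state = "RUNNING"
        self.attempts += 1
        self.log.append("map started")


@dataclass
class ReduceTask(Task):
    inputs_ready: bool = False  # True once every map output is available

    def rules(self) -> List[Rule]:
        # Same shared rules; the start rule additionally waits for map output.
        return super().rules() + [
            Rule("start_reduce",
                 lambda t: t.state == "PENDING" and t.inputs_ready,
                 ReduceTask._start),
        ]

    def _start(self) -> None:
        self.state = "RUNNING"
        self.attempts += 1
        self.log.append("reduce started")


def step(task: Task) -> List[str]:
    """Fire, in list order, every rule whose guard holds; return their names."""
    fired = []
    for rule in task.rules():
        if rule.guard(task):
            rule.action(task)
            fired.append(rule.name)
    return fired
```

Rules fire in list order within one `step`, so a running map task whose server dies is reissued and restarted in the same step (`step` returns `['reissue_on_server_failure', 'start_map']`), while a `ReduceTask` fires nothing until `inputs_ready` is set. The failure rule is written once and inherited by both task types, which is the kind of MapTask/ReduceTask overlap noted above.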
Hadoop MapReduce Walkthrough
...
Hadoop MapReduce State Machine Redundancy
StateMachine | Total Transitions | Distinct Transitions | Duplicate Transitions / Distinct Duplicated |
---|---|---|---|
JobImpl | 82 | 27 | 50/7 |
TaskImpl | 24 | 16 | 6/3 |
TaskAttemptImpl | 57 | 15 | 41/8 |
Total | 163 | 58 | 97/18 |
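A small Python sketch of how the duplicate columns could be computed from a list of transition registrations. The definitions are assumptions inferred from the JobImpl breakdown below, whose seven rows sum to 50: "duplicate" counts every registration of a transition that is registered more than once, and the second number counts how many such transitions there are. The Distinct Transitions column may have been counted differently, so this is not a reproduction of the table. `Registration` and `redundancy` are hypothetical names.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical record: one (pre_state, event, transition_name) triple per
# transition registration in a state machine definition.
Registration = Tuple[str, str, str]


def redundancy(regs: Iterable[Registration]) -> Dict[str, int]:
    """Count how often the same transition is registered more than once."""
    counts = Counter(name for _, _, name in regs)
    repeated = {name: n for name, n in counts.items() if n > 1}
    return {
        "total": sum(counts.values()),
        "distinct": len(counts),
        # every registration of a transition that appears more than once
        "duplicate": sum(repeated.values()),
        # how many transitions appear more than once
        "distinct_duplicated": len(repeated),
    }
```

On toy data where `DIAGNOSTIC_UPDATE_TRANSITION` is registered for three states (`S1`, `S2`, `S3`) and a made-up `OTHER_TRANSITION` is registered once, `redundancy` returns `{'total': 4, 'distinct': 2, 'duplicate': 3, 'distinct_duplicated': 1}`.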
JobImpl
Count | Transition | Trigger (Event/State) | Comment |
---|---|---|---|
12 | DIAGNOSTIC_UPDATE_TRANSITION | JobEventType.JOB_DIAGNOSTIC_UPDATE | Same for most named states (JobImpl) |
14 | COUNTER_UPDATE_TRANSITION | JobEventType.JOB_COUNTER_UPDATE | |
13 | INTERNAL_ERROR_TRANSITION | JobEventType.INTERNAL_ERROR | InternalErrorTransition extends InternalTerminationTransition by setting the error in the history string. |
5 | INTERNAL_REBOOT_TRANSITION | JobEventType.JOB_AM_REBOOT | InternalRebootTransition extends InternalTerminationTransition by setting the error in the history string. |
2 | TASK_ATTEMPT_COMPLETED_EVENT_TRANSITION | JobEventType.JOB_TASK_ATTEMPT_COMPLETED | |
2 | KilledDuringAbortTransition() | JobEventType.JOB_KILL | From FAIL_WAIT and FAIL_ABORT |
2 | JobAbortCompletedTransition() | JobEventType.JOB_ABORT_COMPLETED | From FAIL_ABORT and KILL_ABORT |
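Much of the JobImpl duplication comes from registering the same hook for many states one at a time. A Python sketch of a transition table whose `add_transition` accepts a collection of pre-states collapses each such row into a single call; it also mimics the subclassing pattern noted for InternalErrorTransition, with a child hook that records the error and then reuses its parent's termination behavior. `MachineBuilder`, `TerminationTransition`, `ErrorTransition`, the `ERROR` post-state, and the state list are hypothetical; this is not Hadoop's StateMachineFactory API, which is Java and differs in detail.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple, Union

Hook = Callable[[dict, str], None]


class MachineBuilder:
    """Hypothetical transition table; one call may cover many pre-states."""

    def __init__(self) -> None:
        self.table: Dict[Tuple[str, str], Tuple[Optional[str], Hook]] = {}

    def add_transition(self, pre_states: Union[str, Iterable[str]], event: str,
                       hook: Hook, post_state: Optional[str] = None) -> "MachineBuilder":
        # post_state=None keeps the machine in its current state (a self-loop).
        states = [pre_states] if isinstance(pre_states, str) else list(pre_states)
        for s in states:
            self.table[(s, event)] = (post_state, hook)
        return self

    def fire(self, job: dict, event: str) -> None:
        key = (job["state"], event)
        if key not in self.table:
            raise ValueError(f"event {event} not allowed in state {job['state']}")
        post_state, hook = self.table[key]
        hook(job, event)
        if post_state is not None:
            job["state"] = post_state


class TerminationTransition:
    """Shared termination behavior (hypothetical stand-in)."""

    def __call__(self, job: dict, event: str) -> None:
        job["history"].append("terminated")


class ErrorTransition(TerminationTransition):
    """Records the error in the history, then reuses the parent's behavior."""

    def __call__(self, job: dict, event: str) -> None:
        job["history"].append(f"error: {event}")
        super().__call__(job, event)


def diagnostic_update(job: dict, event: str) -> None:
    job["diagnostics"].append(event)


# Assumed subset of job state names, for illustration only.
NAMED_STATES = ["NEW", "INITED", "RUNNING", "FAIL_WAIT", "FAIL_ABORT", "KILL_ABORT"]

machine = (MachineBuilder()
           .add_transition(NAMED_STATES, "JOB_DIAGNOSTIC_UPDATE", diagnostic_update)
           .add_transition(NAMED_STATES, "INTERNAL_ERROR", ErrorTransition(), "ERROR"))
```

Firing `JOB_DIAGNOSTIC_UPDATE` and then `INTERNAL_ERROR` on `{"state": "RUNNING", "diagnostics": [], "history": []}` leaves the job in `ERROR` with history `["error: INTERNAL_ERROR", "terminated"]`; any further event raises `ValueError`, since nothing is registered for `ERROR`. Two calls build the 12 table entries that would otherwise need one registration per state and event.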
...