Page Comparison

Versions Compared

Old Version 3

changes.mady.by.user Henry Qin

Saved on Aug 26, 2016

compared with

New Version Current

changes.mady.by.user Henry Qin

Saved on Aug 26, 2016

Key

This line was added.
This line was removed.
Formatting was changed.

...

Cache coherency traffic & miss overhead since queues are shared.

...

Merge Active Queue and TaskBox Array with Randomization

Data Structure

The current active list of UserContext's with TaskBox embedded inside each one.
Furthermore, the active list for each core will start with M empty contexts and M stacks, where M is the maximum number of user threads we will allow per core.
For each core, a single word of memory will be allocated for storing both a bit mask of occupied flags on that core and a count of how many threads are running on that core.

Creator

Pick two cores at random.
Read the thread count for each one and identify the core with the lower thread count.
Find an unoccupied bit in the bit mask, and create modify a local copy with the count incremented and the bit set.
Attempt to CAS the local copy into the core's version.
Fill in the TaskBox of the appropriate slot.
Set the wakeup flag on that slot.

Idle Core

If wakeup is set in the current core, then return from the blocking state.
Otherwise, keep scanning the array of possibly active threads until we find one with a wakeup flag that is set.

Advantages

Unified mechanism for new threads and old reduces the complexity.
We get reasonable load-balancing built in at the cost of roughly 4 cache misses.

Potential Issues

Pre-allocating a bunch of stacks is memory-intensive, and memory usage is not proportional to workload.
Cores lose the ability to differentiate between new thread creations and existing threads.
- Harder to prioritize existing work over new work.
Size of the list that must be searched for runnable threads in the unloaded case is larger than it needs to be.