Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  • Cache coherency traffic & miss overhead since queues are shared.

 

 

 

...

Merge Active Queue and TaskBox Array with Randomization

Data Structure

  • The current active list of UserContext's with TaskBox embedded inside each one.
  • Furthermore, the active list for each core will start with M empty contexts and M stacks, where M is the maximum number of user threads we will allow per core.
  • For each core, a single word of memory will be allocated for storing both a bit mask of occupied flags on that core and a count of how many threads are running on that core.

Creator

  1. Pick two cores at random.
  2. Read the thread count for each one and identify the core with the lower thread count.
  3. Find an unoccupied bit in the bit mask, and create modify a local copy with the count incremented and the bit set.
  4. Attempt to CAS the local copy into the core's version.
  5. Fill in the TaskBox of the appropriate slot.
  6. Set the wakeup flag on that slot.

Idle Core

  1. If wakeup is set in the current core, then return from the blocking state.
  2. Otherwise, keep scanning the array of possibly active threads until we find one with a wakeup flag that is set.

Advantages

  • Unified mechanism for new threads and old reduces the complexity.
  • We get reasonable load-balancing built in at the cost of roughly 4 cache misses.

Potential Issues

  • Pre-allocating a bunch of stacks is memory-intensive, and memory usage is not proportional to workload.
  • Cores lose the ability to differentiate between new thread creations and existing threads.
    • Harder to prioritize existing work over new work.
  • Size of the list that must be searched for runnable threads in the unloaded case is larger than it needs to be.