WHAT WE MEAN WHEN WE SAY THAT WE ARE CORE AWARE

 

 

MINIMUM TIME TO SCHEDULE A THREAD ONTO ANOTHER CORE

 

HANDLING BLOCKING SYSCALLS

 

DATA STRUCTURE FOR TASKS THAT DO NOT YET HAVE A CORE

 

 

HOW TO INTEGRATE PRIORITIES INTO THE THREAD CREATION MECHANISM

 - We want the priorities to be orthogonal to the thread creation mechanism

     - Might involve 'accepting' a low-priority thread creation request without running it.

 - Since we cannot predict the priorities of threads yet to be created, and it

   is undesirable to block them from enqueuing on the fast path, it may make

   the most sense to always use a stack on the receiving core first, and then

   yield immediately if higher-priority threads are runnable (see the sketch

   below).
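
 - A minimal sketch of that fast path, assuming a per-core creation mailbox and one run queue per priority level; all names here (CreationRequest, Core, acceptCreations) are hypothetical, not an existing API:

      #include <queue>

      enum class Priority { Low = 0, High = 1 };

      struct CreationRequest {
          void (*func)(void*);   // thread body
          void* arg;
          Priority priority;
      };

      struct Core {
          std::queue<CreationRequest> mailbox;     // stand-in for an MPSC queue
          std::queue<CreationRequest> runnable[2]; // one queue per priority
      };

      // Receiving core: always accept creation requests regardless of
      // priority; a low-priority request is parked, not run immediately.
      void acceptCreations(Core& core) {
          while (!core.mailbox.empty()) {
              CreationRequest req = core.mailbox.front();
              core.mailbox.pop();
              int p = static_cast<int>(req.priority);
              core.runnable[p].push(req);  // stack is bound when it first runs
          }
      }

      // On first dispatch, a low-priority thread yields right away if a
      // high-priority thread became runnable in the meantime.
      bool shouldYieldOnStart(const Core& core, Priority mine) {
          return mine == Priority::Low &&
                 !core.runnable[static_cast<int>(Priority::High)].empty();
      }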

 

 

LOGICAL CONCURRENCY VS DESIRED PARALLELISM (CORE COUNT)

 

 

TIMING GUARANTEES UNDER COOPERATIVE THREADING

 - Can we have any kind of timing guarantees without preemption?

 - If the application cooperates, can we bound the scheduling contribution to tail latency by the largest time between yields?

 - Assuming sufficiently few deadlined threads relative to the number of cores,

   an application can bound the maximum time before a deadlined thread gets to

   run by the longest non-yielding run time of a non-deadlined thread (see the

   sketch after this list).

 - How expensive is a yield call, especially a no-op yield call (one that finds nothing else to run)?

     - Hope to make it a few ns.
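
 - A minimal sketch of the application-side yield discipline; yield() here is a stub standing in for the runtime's call, and kYieldInterval is an assumed tuning knob:

      #include <cstdint>

      inline void yield() { /* placeholder for the runtime's yield call */ }

      // If one unit of work costs c and the latency budget is B, choosing
      // kYieldInterval <= B / c bounds this thread's non-yielding run time,
      // and hence its contribution to a deadlined thread's wait.
      constexpr uint64_t kYieldInterval = 1000;

      void cpuBoundWork(uint64_t units) {
          for (uint64_t i = 1; i <= units; i++) {
              // ... one unit of work ...
              if (i % kYieldInterval == 0)
                  yield();  // usually a no-op; hopefully a few ns
          }
      }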

 

PRIORITIES

 - Current thought is that two priorities are sufficient, but more may prove necessary.

    - One priority for highly latency-sensitive but short-running user threads, such as pings

    - Another priority for ordinary cases.

 - How do we handle starvation?

 - Is it necessary to have arbitrary amounts of priority?

 - What are the performance implications of having multiple priority levels?

     - If the dispatcher checks multiple run queues, the cost will increase with the number of priorities (see the sketch below).
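
 - A minimal sketch of that dispatch path (hypothetical names; two levels as currently planned):

      #include <array>
      #include <cstddef>
      #include <queue>

      constexpr std::size_t kNumPriorities = 2;  // current plan: two suffice

      struct Thread { };

      struct RunQueues {
          std::array<std::queue<Thread*>, kNumPriorities> byPriority;

          // Highest priority first: O(kNumPriorities) queue checks per
          // dispatch, which is why extra levels cost on the critical path.
          Thread* pickNext() {
              for (auto& q : byPriority) {
                  if (!q.empty()) {
                      Thread* t = q.front();
                      q.pop();
                      return t;
                  }
              }
              return nullptr;  // idle
          }
      };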

 

 

HOW TO MAKE SCHEDULER VERY LOW LATENCY

 

BENEFITS / SELLING POINTS / CONTRIBUTIONS

 0. Making more resource information available at higher levels of software,

    and letting those levels leverage it for better performance.

 1. Having many more threads than we could afford with kernel threads

        ==> avoids deadlocks caused by thread-count limits.

 2. Ability to run to completion (no preemption at weird times)

 3. Fast context switches

      ==> Reuse of idle time without paying for a kernel context switch

      ==> Allows us to get high throughput without giving up low latency

 4. Policies for relating the number of cores to logical concurrency (number of user threads)

    - Enables us to fill idle time without adding extra latency

 

 

PRIORITIES OF EXISTING THREADS VS THREADS WITHOUT A CORE

  - Orthogonal issues: the goal is to run the highest-priority thread.

  - Current thought: we must either treat them equally or prioritize just-created threads to avoid deadlock (see the sketch below).
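
  - A minimal sketch of the second option (popCreationMailbox and popRunQueue are hypothetical placeholders):

      struct Thread;
      struct Core {
          // Placeholders; real versions would pop lock-free queues.
          Thread* popCreationMailbox() { return nullptr; }
          Thread* popRunQueue() { return nullptr; }
      };

      // Draining creations first means existing runnable threads can never
      // indefinitely defer a new thread that a creator is waiting on, which
      // is the deadlock this note worries about.
      Thread* dispatch(Core& core) {
          if (Thread* fresh = core.popCreationMailbox())
              return fresh;
          return core.popRunQueue();
      }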

 

HOW TO PREVENT STARVATION

 - In a priority system, is it acceptable to starve low-priority threads?

     - Consider specific actual or expected use cases.

 - Idea 1: Ensure that the number of CPU-bound high-priority threads is lower than the number of cores.

 - Idea 2: Reserve at least one core that runs only low-priority threads, even if that means putting more than one high-priority thread on the same core (see the sketch below).
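
 - A minimal sketch of Idea 2's placement rule; the single reserved core and all names are assumptions:

      #include <cstddef>
      #include <vector>

      constexpr std::size_t kReservedCore = 0;  // runs only low-priority work

      // load[c] = runnable threads on core c; assumes load.size() >= 2.
      std::size_t placeThread(bool highPriority,
                              const std::vector<std::size_t>& load) {
          // High-priority threads may double up but never take the
          // reserved core; low-priority threads may go anywhere.
          std::size_t start = highPriority ? kReservedCore + 1 : 0;
          std::size_t best = start;
          for (std::size_t c = start + 1; c < load.size(); c++)
              if (load[c] < load[best]) best = c;  // least-loaded eligible core
          return best;
      }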

 

 

LOAD BALANCING BETWEEN CORES

    If tasks are sufficiently short, do we need to explicitly rebalance load between cores at all?

        - Could we rely on good initial assignment alone?

    Which core should we assign the new thread to?

    When everyone is busy, where do we put the new thread?

    How many cores could you support with a central queue?

        - What is the throughput of a centralized scheduler queue?

        - Everyone is moving to work stealing today.

        - A central queue can hand work to the first available core, but it might suffer from contention.

    Ideally, we would like to assign to the least-loaded core, by number of runnable threads, but how might we obtain this information without incurring additional cache coherency traffic? (See the sketch after this list.)

    How many cores are we trying to support?

        - Hardware is moving quickly to greater core counts

        - Suppose you wanted 100 cores, each finishing a task every

          microsecond; then the scheduler must place a new thread every 10 ns,

          which a centralized scheduler queue probably cannot sustain.
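
    A minimal sketch of one decentralized alternative, the standard 'power of two choices' placement: sample two cores' locally maintained counters instead of scanning all of them (names are assumptions, not an existing API):

        #include <atomic>
        #include <cstddef>
        #include <random>

        struct CoreLoad {
            std::atomic<std::size_t> runnable{0};  // updated by its own core
        };

        // Touches only two counters' cache lines per placement, avoiding
        // both a central queue and a full scan of per-core load.
        std::size_t pickCore(CoreLoad* cores, std::size_t numCores,
                             std::mt19937& rng) {
            std::uniform_int_distribution<std::size_t> dist(0, numCores - 1);
            std::size_t a = dist(rng);
            std::size_t b = dist(rng);
            std::size_t la = cores[a].runnable.load(std::memory_order_relaxed);
            std::size_t lb = cores[b].runnable.load(std::memory_order_relaxed);
            return la <= lb ? a : b;  // less-loaded of the two samples
        }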

 

 

USER API

KERNEL API