...
- Can we have any kind of timing guarantees without preemption?
- If the application helps, can we bound the scheduling contribution to tail latency by the largest time between yields?
- Assuming sufficiently few deadlined threads relative to the number of cores, an application can bound the maximum time before a deadlined thread gets to run by the longest non-yielding run time of a non-deadlined thread.
- How expensive is a yield call / especially a no-op yield call?
- Hope to make it a few ns.
PRIORITIES -
- Current thought is that two priorities are sufficient, but it is possible more are needed.
...
- One priority for highly latency-sensitive, but short-running user threads such as pings
...
- Another priority for ordinary cases.
...
- How do we handle starvation?
...
- Is it necessary to support an arbitrary number of priority levels?
...
- What are the performance implications of having multiple priority levels?
...
- If checking multiple run queues, then cost will increase with the number of priorities.
HOW TO MAKE SCHEDULER VERY LOW LATENCY
...
PRIORITIES OF EXISTING THREADS VS THREADS WITHOUT A CORE -
- Orthogonal issues: Goal is to run the highest priority thread.
...
- Current thought: we must treat them equally, or prioritize just-created threads, to avoid deadlock.
HOW TO PREVENT STARVATION -
- In a priority system, do we want to starve the low priority threads?
...
- Consider specific actual or expected use cases.
...
- Idea 1: Ensure that the number of CPU-bound high priority threads is lower than the number of cores.
...
- Idea 2: Ensure there is at least one core to run only low priority threads, even if that means putting more than one high-priority thread on the same core.
LOAD BALANCING BETWEEN CORES
- If tasks are sufficiently short, do we need explicit load rebalancing between cores at all?
- - Could we rely on good initial assignment alone?
- Which core should we assign the new thread to?
- When everyone is busy, where do we put the new thread?
- How many cores could you support with a central queue?
- - What is the throughput of a centralized scheduler queue?
- - Everyone is going to work-stealing today.
- Central queue - can get first available core, but might suffer from contention.
- Ideally, we would like to assign to the least loaded core, by number of runnable threads, but how might we obtain this information without incurring additional cache coherency traffic?
- How many cores are we trying to support?
- - Hardware is moving quickly to greater core counts
- - Suppose you wanted 100 cores, each finishing a task every us; then you have to hand out one every 10 ns, so a centralized scheduler queue probably won't do that.
USER API
KERNEL API