Core Negotiation Protocol
REQUEST/GRANT A CORE
- For each process, the Arachne kernel module maintains a dynamically sized kernel thread pool for each core in the system
- The pool is initially empty and threads are created as needed.
- New kernel threads are created with the `clone` syscall, with the starting function set to the Arachne main loop (a sketch follows at the end of this section).
- We do not need a new variant of `clone`, because the existing flags mechanism already provides a way to indicate the type of thread.
- When the kernel decides to allocate a core to a process, it takes one kernel thread from the corresponding pool and puts it on the run queue
- The run queue is a per-CPU, per-scheduling-class data structure
Q1: Can we have just one thread pool instead?
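Below is a minimal, hedged sketch of how a process might create its pool threads with `clone`, assuming the new threads simply park in the Arachne main loop until the kernel grants them a core; the stack size, the flag set, and the idea of an extra marker flag are illustrative assumptions, not the actual mechanism.

#ifndef _GNU_SOURCE
#define _GNU_SOURCE                         // needed for clone(2) and the CLONE_* flags
#endif
#include <sched.h>
#include <cstdlib>

namespace Arachne { void MainLoop(); }      // the main loop shown later in this document

// Entry point for a pool thread: run the Arachne main loop until it exits.
static int arachneEntry(void*) {
    Arachne::MainLoop();
    return 0;
}

// Create one kernel-schedulable thread destined for this process's pool.
void createPoolThread() {
    const std::size_t kStackSize = 1 << 20;  // 1 MiB; arbitrary choice
    char* stack = static_cast<char*>(std::malloc(kStackSize));
    int flags = CLONE_VM | CLONE_FS | CLONE_FILES |
                CLONE_SIGHAND | CLONE_THREAD | CLONE_SYSVSEM;
    // A hypothetical extra flag (or a follow-up syscall) would mark the new
    // thread as an Arachne pool thread so the kernel parks it until a core
    // is granted.
    clone(arachneEntry, stack + kStackSize, flags, nullptr);
}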
RELEASE A CORE VOLUNTARILY
The currently running Arachne user thread yields
- A long-running user thread needs to yield from time to time so that it does not miss a request from the kernel to relinquish the core
- Maybe we could provide a `yield_only_necessary()` function that yields control only if `relinquishThisCore()` returns true (see the sketch below)
- Context switch back to the Arachne main loop
The main loop polls to check if it needs to call `sched_yield()` to relinquish the core
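A rough sketch of what `yield_only_necessary()` could look like, assuming `relinquishThisCore()` cheaply polls the kernel's request flag in shared memory and that an `Arachne::yield()`-style call (name assumed) switches back to the main loop:

bool relinquishThisCore();            // polls the kernel's request flag in shared memory
namespace Arachne { void yield(); }   // assumed name for switching back to the main loop

// Only pay for a context switch back to the main loop when the kernel has
// actually asked for this core.
void yield_only_necessary() {
    if (relinquishThisCore()) {
        Arachne::yield();
    }
}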
KERNEL TAKES BACK A CORE
- First, the kernel will ask nicely by leaving a message in the shared memory indicating that it wants N cores back (a possible layout of this shared memory is sketched below).
- Arachne then decides which cores to give back to the kernel.
- If somehow an Arachne user thread does not yield in time to relinquish the core, the kernel will be forced to preempt the insubordinate thread T.
- Thread T will be downgraded to a SCHED_OTHER thread and scheduled by the CFS.
- Later, when T finally gets back to the main loop, it will notice that it is no longer bound to a core and will exit the main loop.
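One possible layout for the shared-memory region used in this negotiation; the field names and the fixed core count are assumptions for illustration only.

#include <atomic>

constexpr int kMaxCores = 64;   // assumed upper bound, for illustration

struct CoreNegotiationPage {
    // Written by the kernel, read by Arachne: how many cores the kernel
    // currently wants back.
    std::atomic<int> coresRequestedBack;
    // Per-core "please give this one back" flags; relinquishThisCore() in the
    // main loop would poll the entry for its own core.
    std::atomic<bool> relinquishCore[kMaxCores];
};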
HANDLE BLOCKING SYSCALLS IN KERNEL
- Right after the current kernel thread T1 is put on a wait queue, the kernel needs to take another kernel thread T2 from the same thread pool and put it on the run queue.
- The kernel then switches to T2 and returns to the Arachne main loop.
- Once T1 is awakened, we can put T1 back in the run queue and tell T2 (by leaving a message in the shared memory) that it should step down and yield control to T1 at its convenience.
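A hedged sketch of how T2 might notice that message; `yieldToUnblockedSibling()` matches the name used in the main loop below, but the shared-memory fields here are assumptions.

#include <atomic>

// Assumed shared-memory flags: the kernel sets siblingUnblocked[c] when the
// original thread T1 for core c has woken up from its blocking syscall.
extern std::atomic<bool> siblingUnblocked[];
extern int myCoreId;    // index of the core this thread currently runs on

// Returns true when this thread (T2) should step down so that the kernel can
// switch back to its unblocked sibling (T1).
bool yieldToUnblockedSibling() {
    return siblingUnblocked[myCoreId].load(std::memory_order_acquire);
}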
Q1: Should Arachne know about the blocked user thread?
hq6: There does not seem to be much use in Arachne being aware of the blocked user thread, since it is blocked due to a kernel operation, and there's not much Arachne could do with that information.
Q2: How do we notice the blocking syscall finishes if we have disabled the interrupts to achieve full core isolation?
hq6: It may make sense to keep device interrupts, and disable only timer interrupts.
Q3: Should we consider the possibility of asynchronous syscall mechanisms (e.g., FlexSC, Linux Syslets)?
hq6: Time permitting, we should experiment with the FlexSC idea of piling up syscalls on a remote core and measure application-level throughput and latency under load. The hidden costs of context switching between user and kernel that FlexSC aims to reduce still exist in our scheme above.
Q4: What if we have too many threads blocking in the syscalls at the same time? Should we try to cap the size of the thread pool?
hq6: This seems like a self-limiting problem, in the sense that if the core is spending a large proportion of time in the kernel and handling interrupts due to IO completions, then it will not have much time to accept many new user threads to issue further syscalls.
Q5: Is the scheduling class going to be notified when threads are blocked on syscalls and woken?
Yes, the hook functions `dequeue_task` and `task_woken` of the `sched_class` interface (defined at http://lxr.free-electrons.com/source/kernel/sched/sched.h#L1193) will be called, respectively.
SCHEDULING POLICIES IN KERNEL
Scheduling policies currently available in Linux 4.7 are defined at: http://lxr.free-electrons.com/source/include/uapi/linux/sched.h?v=4.7#L35.
The scheduling policy used by a particular thread is indicated by the `policy` field in `task_struct`: http://lxr.free-electrons.com/source/include/linux/sched.h?v=4.7#L1495.
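If Arachne ends up with its own policy alongside these, a process could presumably opt a thread into it through the existing `sched_setscheduler` interface; the policy constant in the sketch below is purely hypothetical and not part of Linux 4.7.

#include <sched.h>

constexpr int SCHED_ARACHNE = 7;   // hypothetical policy value, not defined in Linux 4.7

// Ask the kernel to move the calling thread into the (hypothetical) Arachne
// scheduling class; pid 0 means "the calling thread".
bool enterArachneClass() {
    sched_param param{};            // priority is unused by this sketch
    return sched_setscheduler(0, SCHED_ARACHNE, &param) == 0;
}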
SAMPLE ARACHNE MAIN LOOP IMPLEMENTATION
This is a simplistic Arachne main loop implementation that achieves the functionality mentioned above.
void Arachne::MainLoop() {
    while (1) {
        while (!relinquishThisCore() && !yieldToUnblockedSibling()) {
            // Pick the next Arachne user thread to run and switch to its
            // context; this function returns when the Arachne user thread
            // decides to yield control.
            schedule_next_thread();
        }
        // If this thread has been downgraded and lost its binding to the
        // core, just exit the main loop and terminate gracefully.
        if (!boundToThisCore()) return;
        sched_yield();
    }
}
(DYNAMIC) FULL CORE ISOLATION
Changes to the load balancer:
- Before allocating a core, migrate as many tasks (TODO: be more specific about what tasks are moveable) as possible to non-isolated cores;
- Avoid migrating tasks to isolated cores
Move non-CPU-bound interrupts to non-isolated cores (a user-space sketch of the equivalent affinity change appears below)
Reduce CPU-bound interrupts as much as possible
Turn off the scheduler tick entirely
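For reference, the interrupt-migration step above can already be approximated from user space by rewriting an IRQ's affinity mask; the dynamic design would presumably do the equivalent inside the kernel. A minimal sketch:

#include <fstream>
#include <string>

// Restrict the given IRQ to the housekeeping (non-isolated) cores described by
// `mask`, a hex CPU mask such as "3" for cores 0-1. Returns false if the IRQ's
// affinity cannot be changed (e.g., for per-CPU interrupts).
bool setIrqAffinity(int irq, const std::string& mask) {
    std::ofstream f("/proc/irq/" + std::to_string(irq) + "/smp_affinity");
    if (!f) {
        return false;
    }
    f << mask << std::flush;
    return static_cast<bool>(f);
}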
TO BE CONTINUED...