Making more resource information available at higher levels of software, and letting those higher levels leverage that information for better performance. This point is still a little vague; we can make it more concrete once our implementation is working.
Having many more threads than we could afford in the kernel ==> avoids deadlocks caused by thread-count limits.
Ability to run tasks to completion (no preemption at awkward times)
Fast context switches ==> reuse of idle time without paying for a kernel context switch ==> high throughput without giving up low latency.
Policies for relating the number of cores to logical concurrency (number of user threads) - enable us to fill idle time without adding extra latency - allow the application to offer concurrency that matches the available cores (avoiding kernel-thread multiplexing).
Effective load balancing by using very short tasks.