Put a worker thread and the dispatch thread on the same core to improve latency in the unloaded case.
Since workers now block instead of polling, a waiting worker no longer spins on the shared core, so this may be safe.
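A minimal sketch of the co-location idea, assuming Linux and glibc (pthread_setaffinity_np and sched_getcpu are non-portable GNU extensions). The names pinSelfToCore and colocateOnCore are hypothetical, and the two lambdas only stand in for the real dispatch and worker threads:

```cpp
#include <cassert>
#include <pthread.h>
#include <sched.h>
#include <thread>

// Restrict the calling thread to a single core. Returns true on success.
bool pinSelfToCore(int core) {
    assert(core >= 0 && core < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set) == 0;
}

// Start two placeholder threads (standing in for the dispatch thread and
// one worker), pin both to `core`, and report whether each one ended up
// running there.
bool colocateOnCore(int core) {
    bool dispatchOk = false;
    bool workerOk = false;
    std::thread dispatch([&] {
        dispatchOk = pinSelfToCore(core) && sched_getcpu() == core;
    });
    std::thread worker([&] {
        workerOk = pinSelfToCore(core) && sched_getcpu() == core;
    });
    dispatch.join();  // join() makes the writes above visible here
    worker.join();
    return dispatchOk && workerOk;
}
```

Calling colocateOnCore(sched_getcpu()) picks a core the process is already allowed to run on. Whether sharing a core actually helps unloaded latency would still need to be measured.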
Experiment with different implementations of thread local storage.
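One possible starting point for such an experiment, assuming the two candidates are C++11 thread_local and a POSIX pthread key. All names here (tlsCounter, bumpThreadLocal, bumpPthreadKey, nanosPerOp) are hypothetical. The timing loop is deliberately naive: an optimizing compiler may collapse the thread_local loop, so a real measurement would need more care.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <pthread.h>

// Candidate 1: compiler-supported thread-local variable.
thread_local uint64_t tlsCounter = 0;

// Candidate 2: POSIX thread-specific data, looked up through a key.
static pthread_key_t counterKey;
static pthread_once_t keyOnce = PTHREAD_ONCE_INIT;
static void makeKey() { pthread_key_create(&counterKey, nullptr); }

// Increment the thread_local counter n times; returns the final count.
uint64_t bumpThreadLocal(uint64_t n) {
    tlsCounter = 0;
    for (uint64_t i = 0; i < n; i++) {
        tlsCounter++;
    }
    return tlsCounter;
}

// Increment a counter reached via pthread_getspecific n times.
uint64_t bumpPthreadKey(uint64_t n) {
    pthread_once(&keyOnce, makeKey);
    uint64_t local = 0;  // this thread's slot for the duration of the call
    pthread_setspecific(counterKey, &local);
    for (uint64_t i = 0; i < n; i++) {
        auto* p = static_cast<uint64_t*>(pthread_getspecific(counterKey));
        ++*p;
    }
    pthread_setspecific(counterKey, nullptr);
    return local;
}

// Average wall-clock nanoseconds per increment for one candidate.
template <typename F>
double nanosPerOp(F f, uint64_t n) {
    auto start = std::chrono::steady_clock::now();
    uint64_t result = f(n);
    auto end = std::chrono::steady_clock::now();
    assert(result == n);
    return std::chrono::duration<double, std::nano>(end - start).count() / n;
}
```

Comparing nanosPerOp(bumpThreadLocal, n) with nanosPerOp(bumpPthreadKey, n) for a large n gives a rough first number. Results are likely to depend on the compiler, the TLS model in use, and whether the code lives in a shared library.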