See the comment for a trace of the deadlock.
Added a new StartStage. New worker threads begin in the StartStage.
Once a thread is ready to do work, it moves away from the StartStage,
and no thread will ever transition back to it.
A thread that blocks waiting on another thread that is processing
the same key will block while in the StartStage. That other thread
will never switch back to the StartStage, and so the deadlock is avoided.