An Analysis of User-space Idle State Instructions on x86 Processors
Guilt-free busy waiting
An Analysis of User-space Idle State Instructions on x86 Processors. Malte-Christian Kuns, Hannes Tröpgen, and Robert Schöne. ICPE'25.
I’ve long believed that busy waiting is poor form. The closest you should ever come to busy waiting is locking a futex/CRITICAL_SECTION, which will busy wait for a short while on your behalf.
If your primary concern is power consumption, then busy waiting may be less offensive on a modern processor. This paper describes newly added x86 instructions that enable low power busy waiting from user space, and has a ton of data to help you sleep better at night.
New Instructions
TPAUSE puts the processor into a low power state for a user-specified amount of time. TPAUSE supports two low power states (C0.1 and C0.2), which trade power consumption for wake-up latency: C0.2 saves more power but takes longer to wake up, while C0.1 wakes up faster but saves less. TPAUSE can be called in user space but doesn’t wrest control of the core away from the OS. The trick is that the OS can set a maximum timeout value, which gives the OS a chance to switch away from the busy waiting thread.
UMONITOR and UMWAIT instructions are similar to TPAUSE but allow the processor to be woken up when a write occurs in a specified memory range. UMONITOR sets up the memory range to be monitored, and UMWAIT causes the processor to enter a low power state. UMWAIT accepts a timeout value and a target power state (just like TPAUSE). AMD supports similar functionality via the MONITORX and MWAITX instructions.
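To make the instructions above concrete, here is a minimal sketch in C, assuming GCC or Clang on x86-64 and their `_tpause`, `_umonitor`, and `_umwait` intrinsics. The helper names (`has_waitpkg`, `tpause_until`, `umwait_flag`, `low_power_delay`, `low_power_wait`) and the `REQUEST_*` constants are hypothetical and not from the paper. The sketch checks CPUID for the WAITPKG feature and falls back to a plain PAUSE loop when it is missing; AMD's MONITORX/MWAITX are not covered.

```c
#include <cpuid.h>      /* __get_cpuid_count (GCC/Clang) */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc, _mm_pause, _tpause, _umonitor, _umwait */

/* CPUID.(EAX=7,ECX=0):ECX bit 5 enumerates WAITPKG (TPAUSE/UMONITOR/UMWAIT). */
static bool has_waitpkg(void) {
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;
    return ((ecx >> 5) & 1u) != 0;
}

/* Control bit 0 selects the requested state: 0 = C0.2 (deeper), 1 = C0.1. */
enum { REQUEST_C0_2 = 0, REQUEST_C0_1 = 1 };

/* The target attribute enables WAITPKG for this function only,
 * so no -mwaitpkg compiler flag is needed. */
__attribute__((target("waitpkg")))
static void tpause_until(uint64_t deadline_tsc, unsigned int ctrl) {
    /* TPAUSE takes an absolute TSC deadline and may return early
     * (OS-imposed limit, interrupt), so re-check the TSC in a loop. */
    while (__rdtsc() < deadline_tsc)
        (void)_tpause(ctrl, deadline_tsc);
}

__attribute__((target("waitpkg")))
static bool umwait_flag(atomic_uint *flag, uint64_t deadline_tsc,
                        unsigned int ctrl) {
    while (atomic_load_explicit(flag, memory_order_acquire) == 0) {
        if (__rdtsc() >= deadline_tsc)
            return false;
        _umonitor((void *)flag);  /* arm monitoring of the flag's address */
        /* Re-check after arming so a write that raced with UMONITOR
         * is not slept through. */
        if (atomic_load_explicit(flag, memory_order_acquire) != 0)
            break;
        (void)_umwait(ctrl, deadline_tsc);
    }
    return true;
}

/* Hypothetical wrapper: delay for about `cycles` TSC cycles and
 * return the number of cycles that actually elapsed. */
uint64_t low_power_delay(uint64_t cycles, unsigned int ctrl) {
    uint64_t start = __rdtsc();
    uint64_t deadline = start + cycles;
    if (has_waitpkg()) {
        tpause_until(deadline, ctrl);
    } else {
        while (__rdtsc() < deadline)
            _mm_pause();          /* old-fashioned PAUSE loop */
    }
    return __rdtsc() - start;
}

/* Hypothetical wrapper: wait until *flag becomes nonzero or `cycles`
 * TSC cycles pass. Returns true if the flag was observed set. */
bool low_power_wait(atomic_uint *flag, uint64_t cycles, unsigned int ctrl) {
    uint64_t deadline = __rdtsc() + cycles;
    if (has_waitpkg())
        return umwait_flag(flag, deadline, ctrl);
    while (atomic_load_explicit(flag, memory_order_acquire) == 0) {
        if (__rdtsc() >= deadline)
            return false;
        _mm_pause();
    }
    return true;
}
```

A caller might use `low_power_delay(20000, REQUEST_C0_1)` for a short back-off, or `low_power_wait(&ready, 1000000, REQUEST_C0_2)` to wait for another thread to set `ready`. Both loops re-check their condition after every wake-up, because the OS-imposed maximum timeout can end a wait long before the requested deadline.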
Results
A key question the paper investigates is how closely the user-specified timeout is honored. Fig. 1 shows results for three Intel cores:
Times are measured in timestamp counter cycles (roughly 3 GHz for Alder Lake). The plateau at the top right is caused by the OS-specified maximum timeout. The authors find that timeout values are quantized (e.g., 83 cycles on Alder Lake P-core). Additionally, for short timeouts the processor may ignore the user-requested power state (presumably because it doesn’t make sense to enter a deep sleep for a short amount of time). On Alder Lake P-cores, the threshold below which the processor will not enter the lowest power state is around 23,000 TSC cycles. Alder Lake E-cores seem to only support one low power state.
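One practical reading of these numbers is that there is little point in requesting the deepest state for very short waits. The sketch below hard-codes the Alder Lake P-core figures quoted above purely for illustration; the function and macro names are hypothetical, the constants will differ on other cores, and the 3 GHz TSC rate is the rough value mentioned in this post, not a measured one.

```c
#include <stdint.h>

/* Figures quoted above for Alder Lake P-cores; other cores will differ. */
#define TSC_GHZ               3.0     /* "roughly 3 GHz" */
#define TIMEOUT_STEP_CYCLES   83u     /* observed timeout quantization */
#define DEEP_STATE_MIN_CYCLES 23000u  /* below this, the deepest state is not entered */

/* Convert TSC cycles to nanoseconds under the assumed TSC rate. */
double tsc_cycles_to_ns(uint64_t cycles) {
    return (double)cycles / TSC_GHZ;
}

/* Pick control bit 0 for TPAUSE/UMWAIT: 1 = C0.1, 0 = C0.2.
 * Short waits ask for C0.1, since the deeper request would be ignored. */
unsigned int preferred_ctrl(uint64_t wait_cycles) {
    return wait_cycles < DEEP_STATE_MIN_CYCLES ? 1u : 0u;
}
```

Under these assumptions, `tsc_cycles_to_ns(DEEP_STATE_MIN_CYCLES)` comes out at about 7.7 µs and `tsc_cycles_to_ns(TIMEOUT_STEP_CYCLES)` at about 28 ns, which gives a feel for the time scales involved.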
Fig. 3 measures how much the processor can “oversleep” (wake up later than requested) depending on processor frequency and requested power state:
And finally, Table 2 shows measured power consumption for these new instructions vs old-fashioned busy wait loops that use the PAUSE instruction (which does not support a user-specified timeout):
I’m shocked by the advantage that AMD has here. If CPU core power during busy waiting is your primary concern, then you should choose your chip carefully.
Dangling Pointers
Condition variables are a general and useful abstraction. It would be nice if code that used condition variables could automatically benefit from these instructions. Maybe some compiler and/or hardware assistance is necessary to enable that.




