-
-
Notifications
You must be signed in to change notification settings - Fork 30.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
time.monotonic() should use a different clock source on Windows #88494
Comments
Related to https://bugs.python.org/issue41299#msg395220 Presumably
I've run into issues with this when porting python-based applications to Windows. On other platforms, time.monotonic() was a decent precision so I used it. When I ported to Windows, I had to replace all of my time.monotonic() calls with time.perf_counter(). I would pretty much never knowingly call time.monotonic() if I knew ahead of time it could be quantized to 16ms. My opinion is that the GetTickCount64() monotonic time code in CPython should be removed entirely and only the QueryPerformanceCounter() path should be used. I also think some of the failure checks could be removed from QueryPerformanceCounter() / QueryPerformanceFrequency(), as they're documented to never fail in modern Windows and CPython has been dropping support for older versions of Windows, but that's less of a firm opinion. |
I found these two references:
Which suggest QueryPerformanceCounter() may be bad because it can drift. However, these posts are fairly old and the StackOverflow post also says the drift is small on newer hardware / Windows. Microsoft's current stance is that QueryPerformanceCounter() is good: https://docs.microsoft.com/en-us/windows/win32/sysinfo/acquiring-high-resolution-time-stamps
I looked into how a few other languages provide monotonic time on Windows: Golang seems to read the interrupt time (presumably equivalent to QueryInterruptTime) directly by address. https://github.com/golang/go/blob/a3868028ac8470d1ab7782614707bb90925e7fe3/src/runtime/sys_windows_amd64.s#L499 Rust uses QueryPerformanceCounter: https://github.com/rust-lang/rust/blob/38ec87c1885c62ed8c66320ad24c7e535535e4bd/library/std/src/time.rs#L91 V8 uses QueryPerformanceCounter after checking for old CPUs: https://github.com/v8/v8/blob/dc712da548c7fb433caed56af9a021d964952728/src/base/platform/time.cc#L672 Ruby uses QueryPerformanceCounter: https://github.com/ruby/ruby/blob/44cff500a0ad565952e84935bc98523c36a91b06/win32/win32.c#L4712 C# implements QueryPerformanceCounter on other platforms using CLOCK_MONOTONIC, indicating that they should be roughly equivalent: https://github.com/dotnet/runtime/blob/01b7e73cd378145264a7cb7a09365b41ed42b240/src/coreclr/pal/src/misc/time.cpp#L175 Swift originally used QueryPerformanceCounter, but switched to QueryUnbiasedInterruptTime() because they didn't want to count time the system spent asleep: swiftlang/swift-corelibs-libdispatch@766d647 ------ Note that none of these languages use GetTickCount64(). Swift is an interesting counter point, and I noticed QueryUnbiasedInterruptTime() is available on Windows 8 while QueryInterruptTime() is new as of Windows 10. The "Unbiased" just refers to whether it advances during sleep. I'm not actually sure whether time.monotonic() in Python counts time spent asleep, or whether that's desirable. Some kinds of timers using monotonic time should definitely freeze during sleep so they don't cause a flurry of activity on wake, but others definitely need to roughly track wall clock time, even during sleep. Perhaps the long term answer would be to introduce separate "asleep" and "awake" monotonic clocks in Python, and possibly deprecate perf_counter() if it's redundant after this (as I think it's aliased to monotonic() on non-Windows platforms anyway). |
You resolved bpo-41299 using QueryPerformanceCounter(), so we're already a step toward making it the default monotonic clock. Personally, I've only relied on QPC for short intervals, but, as you've highlighted above, other language runtimes use it for their monotonic clock. Since Vista, it's apparently more reliable in terms of calibration and ensuring that a processor TSC is only used if it's known to be invariant and constant. That said, Windows 10 also provides QueryInterruptTimePrecise(), which is a hybrid solution. It uses the performance counter to interpolate a timestamp between interrupts. I'd prefer to use this for time.monotonic() instead of QPC, if it's available via GetProcAddress(). QueryInterruptTimePrecise() is about 1.38 times the cost of QPC (on average across 100 million calls). Both functions are significantly more expensive than QueryInterruptTime() and GetTickCount64(), which simply return a value that's read from shared memory (i.e. the KUSER_SHARED_DATA structure).
QueryInterruptTime() and QueryUnbiasedInterruptTime() don't provide high-resolution timestamps. They're updated by the system timer interrupt service routine, which defaults to 64 interrupts/second. The time increment depends on when the counter is read by the ISR, but it averages out to approximately the interrupt period (e.g. 15.625 ms).
POSIX doesn't specify whether CLOCK_MONOTONIC [1] should include the time that elapses while the system is in standby mode. In Linux, CLOCK_BOOTTIME includes this time, and CLOCK_MONOTONIC excludes it. Windows QueryUnbiasedInterruptTimePrecise excludes it.
Both may not be supportable on all platforms, but they're supported in Linux, Windows 10, and macOS. The latter has mach_continuous_time(), which includes the time in standby mode, and mach_absolute_time(), which excludes it. [1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/clock_gettime.html |
Great information, thanks!
My personal vote is to use the currently most common clock source (QPC) for now for monotonic(), because it's the same across Windows versions and the most likely to produce portable monotonic timestamps between apps/languages on the same system. It's also the easiest patch, as there's already a code path for QPC. (As someone building multi-app experiences around Python, I don't want to check the Windows version to see which time base Python is using. I'd feel better about switching to QITP() if/when Python drops Windows 8 support.) A later extension of this idea (maybe behind a PEP) could be to survey the existing timers available on each platform and consider whether it's worth extending |
Changing is clock is a tricky. There are many things to consider:
-- When I designed PEP-418 (in 2012), QueryPerformanceCounter() was not reliable: "It has a much higher resolution, but has lower long term precision than GetTickCount() and timeGetTime() clocks. For example, it will drift compared to the low precision clocks." And there were a few bugs like: "The performance counter value may unexpectedly leap forward because of a hardware bug". A Microsoft blog article explains that users wanting a steady clock with precision higher than GetTickCount() should interpolate GetTickCount() using QueryPerformanceCounter(). If I recall correctly, this is what Firefox did for instance. Eryk: "That said, Windows 10 also provides QueryInterruptTimePrecise(), which is a hybrid solution. It uses the performance counter to interpolate a timestamp between interrupts. I'd prefer to use this for time.monotonic() instead of QPC, if it's available via GetProcAddress()." Oh, good that they provided an implementation for that :-) --
It uses CPUID to check for "non stoppable time stamp counter": // Check if CPU has non stoppable time stamp counter.
const unsigned parameter_containing_non_stop_time_stamp_counter = 0x80000007;
if (num_ext_ids >= parameter_containing_non_stop_time_stamp_counter) {
__cpuid(cpu_info, parameter_containing_non_stop_time_stamp_counter);
has_non_stop_time_stamp_counter_ = (cpu_info[3] & (1 << 8)) != 0;
} Maybe we use such check in Python: use GetTickCount() on old CPUs, or QueryPerformanceCounter() otherwise. MSVC provides the __cpuid() function: --
Oh, I recall that it was a tricky question. The PEP-418 simply says: See "Include Sleep" and "Include Suspend" columns of my table: |
I think a lot of that is based on very outdated information. It's worth reading this article: https://docs.microsoft.com/en-us/windows/win32/sysinfo/acquiring-high-resolution-time-stamps I will repeat Microsoft's current recommendation (from that article):
(Based on that, it may also be worth replacing time.time()'s GetSystemTimeAsFileTime with GetSystemTimePreciseAsFileTime in CPython, as GetSystemTimePreciseAsFileTime is available in Windows 8 and newer)
Microsoft on drift (from the article above):
Modern Windows also automatically detects and works around stoppable TSC, as well as several other issues:
It seems like Microsoft considers QPC to be a significantly better time source now, than when PEP-418 was written. Another related conversation is whether Python can just expose all of the Windows clocks directly (through clock_gettime enums?), as that gives anyone who really wants full control over their timestamps a good escape hatch. |
Technically, it remains possible to install Python on Windows 7, see: bpo-32592. |
On second thought, starting with Windows 8, WaitForSingleObject() and WaitForMultipleObjects() exclude time when the system is suspended. For consistency, an external deadline (e.g. for SIGINT support) should work the same way. The monotonic clock should thus be based on QueryUnbiasedInterruptTime(). We can conditionally use QueryUnbiasedInterruptTimePrecise() in Windows 10, which I presume includes most users of Python 3.9+ on Windows since Windows 8.1 only has a 3% share of desktop/laptop systems. If we can agree on the above, then the change to use QueryPerformanceCounter() to resolve bpo-41299 should be reverted. The deadline should instead be computed with QueryUnbiasedInterruptTime(). It's limited to the resolution of the system interrupt time, but at least compared to GetTickCount64() it returns the real interrupt time instead of an idealized 64 ticks/second.
_Py_clock_gettime() and _Py_clock_getres() could be implemented in Python/pytime.c. For Windows we could implement the following clocks:
See bpo-19007, which is nearly 8 years old. |
IMO we should check how web browsers manage time on the different platforms. They have high expectations from clocks: security, efficiency, reliability. Examples:
and:
|
My notes on Windows clocks: https://vstinner.readthedocs.io/windows.html#time |
Beware of relying on old code comments (I'm assuming that since both links in that comment above are from ~20 years ago that the code is also roughly that old). QPC in particular has improved/changed behaviour a few times since then. Personally, I'd trust anything Eryk Sun has posted in the last 3 years over code that was tweaked to perfection based on Windows XP. |
Maybe I would be ok to use QPC for time.monotonic() if we only make the change conditionnal: only on the most recent Windows version, Windows 11. It would avoid any bad surprises on very old Windows versions running on old hardware with less reliable CPU TSC clock source. |
Our earliest supported version is Windows 10 now, so we don't have to worry about really old ones. This page covers the history. Looks like things have been "fine" since Windows 8: https://learn.microsoft.com/en-us/windows/win32/sysinfo/acquiring-high-resolution-time-stamps |
On Windows, time.monotonic() now uses QueryPerformanceCounter() clock instead of GetTickCount64() to have a resolution better than 1 us instead of having a resolution of 15.6 ms.
On Windows, time.monotonic() now uses the QueryPerformanceCounter() clock to have a resolution better than 1 us, instead of the gGetTickCount64() clock which has a resolution of 15.6 ms.
I created #116781 to use QueryPerformanceCounter() for time.monotonic(). |
See also issue gh-115637. |
On Windows, time.monotonic() now uses the QueryPerformanceCounter() clock to have a resolution better than 1 us, instead of the gGetTickCount64() clock which has a resolution of 15.6 ms.
On Windows, time.monotonic() now uses the QueryPerformanceCounter() clock to have a resolution better than 1 us, instead of the gGetTickCount64() clock which has a resolution of 15.6 ms.
time.monotonic() now uses QueryPerformanceCounter() (commit) and has a resolution lower than 1 us. I measured 100 ns (10 MHz) on my Windows 11 VM, whereas before it was 15.6 ms (64 Hz): the updated clock resolution is 156,000x better :-) Thanks everybody for looking into this issue and providing very useful technical details about these clocks. Follow-up: see issue gh-63207 to use GetSystemTimePreciseAsFileTime() for time.time(). |
…ython#116781) On Windows, time.monotonic() now uses the QueryPerformanceCounter() clock to have a resolution better than 1 us, instead of the gGetTickCount64() clock which has a resolution of 15.6 ms.
…ython#116781) On Windows, time.monotonic() now uses the QueryPerformanceCounter() clock to have a resolution better than 1 us, instead of the gGetTickCount64() clock which has a resolution of 15.6 ms.
…ython#116781) On Windows, time.monotonic() now uses the QueryPerformanceCounter() clock to have a resolution better than 1 us, instead of the gGetTickCount64() clock which has a resolution of 15.6 ms.
On Windows, time.monotonic() now calls QueryPerformanceCounter() which has a resolution of 100 ns. Reduce CLOCK_RES from 50 or 100 ms to 1 ms.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
Linked PRs
The text was updated successfully, but these errors were encountered: