time.monotonic() should use a different clock source on Windows #88494

Closed
lunixbochs mannequin opened this issue Jun 6, 2021 · 20 comments
Labels
3.12, OS-windows, performance, stdlib

Comments

@lunixbochs
Mannequin

lunixbochs mannequin commented Jun 6, 2021

BPO 44328
Nosy @pfmoore, @abalkin, @vstinner, @tjguk, @zware, @eryksun, @zooba, @pganssle, @lunixbochs

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2021-06-06.22:05:07.766>
labels = ['3.11', 'library', 'OS-windows', 'performance']
title = 'time.monotonic() should use a different clock source on Windows'
updated_at = <Date 2021-06-14.21:10:46.833>
user = 'https://github.com/lunixbochs'

bugs.python.org fields:

activity = <Date 2021-06-14.21:10:46.833>
actor = 'eryksun'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)', 'Windows']
creation = <Date 2021-06-06.22:05:07.766>
creator = 'lunixbochs2'
dependencies = []
files = []
hgrepos = []
issue_num = 44328
keywords = []
message_count = 12.0
messages = ['395221', '395238', '395490', '395493', '395681', '395683', '395719', '395769', '395771', '395782', '395784', '395849']
nosy_count = 9.0
nosy_names = ['paul.moore', 'belopolsky', 'vstinner', 'tim.golden', 'zach.ware', 'eryksun', 'steve.dower', 'p-ganssle', 'lunixbochs2']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue44328'
versions = ['Python 3.11']

@lunixbochs
Mannequin Author

lunixbochs mannequin commented Jun 6, 2021

Related to https://bugs.python.org/issue41299#msg395220

Presumably time.monotonic() on Windows historically used GetTickCount64() because QueryPerformanceCounter() could fail. However, that hasn't been the case since Windows XP: https://docs.microsoft.com/en-us/windows/win32/api/profileapi/nf-profileapi-queryperformancecounter

On systems that run Windows XP or later, the function will always succeed and will thus never return zero

I've run into issues with this when porting Python-based applications to Windows. On other platforms, time.monotonic() had decent precision, so I used it. When I ported to Windows, I had to replace all of my time.monotonic() calls with time.perf_counter(). I would pretty much never knowingly call time.monotonic() if I knew ahead of time it could be quantized to 16ms.
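
To make the quantization concrete, here is a minimal sketch that spins on a clock until its value changes and reports that one observed step. The helper name `smallest_step` and its `timeout` parameter are made up for illustration; on a Python build where time.monotonic() is backed by GetTickCount64(), the step should come out near 15.6 ms, while perf_counter() should report a far smaller value.

```python
import time

def smallest_step(clock, timeout=1.0):
    """Return one observed increment of `clock`, in seconds.

    Spins until the clock value changes. Gives up after `timeout`
    seconds (measured with perf_counter) and returns None.
    `smallest_step` is a hypothetical helper, not a stdlib function.
    """
    give_up = time.perf_counter() + timeout
    start = clock()
    while time.perf_counter() < give_up:
        now = clock()
        if now != start:
            return now - start
    return None

for name in ("monotonic", "perf_counter"):
    step = smallest_step(getattr(time, name))
    print(f"time.{name}(): observed step = {step!r} s")
```

A single observation is noisy, so treat the printed numbers as a rough indication rather than a measurement of the clock's true resolution.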

My opinion is that the GetTickCount64() monotonic time code in CPython should be removed entirely and only the QueryPerformanceCounter() path should be used.

I also think some of the failure checks could be removed from QueryPerformanceCounter() / QueryPerformanceFrequency(), as they're documented to never fail in modern Windows and CPython has been dropping support for older versions of Windows, but that's less of a firm opinion.

@lunixbochs lunixbochs mannequin added the labels 3.8 (EOL), 3.9, 3.10, 3.11, stdlib, OS-windows, performance on Jun 6, 2021
@lunixbochs
Mannequin Author

lunixbochs mannequin commented Jun 7, 2021

I found these two references:

Which suggest QueryPerformanceCounter() may be bad because it can drift. However, these posts are fairly old and the StackOverflow post also says the drift is small on newer hardware / Windows.

Microsoft's current stance is that QueryPerformanceCounter() is good: https://docs.microsoft.com/en-us/windows/win32/sysinfo/acquiring-high-resolution-time-stamps

Guidance for acquiring time stamps
Windows has and will continue to invest in providing a reliable and efficient performance counter. When you need time stamps with a resolution of 1 microsecond or better and you don't need the time stamps to be synchronized to an external time reference, choose QueryPerformanceCounter

I looked into how a few other languages provide monotonic time on Windows:

Golang seems to read the interrupt time (presumably equivalent to QueryInterruptTime) directly by address. https://github.com/golang/go/blob/a3868028ac8470d1ab7782614707bb90925e7fe3/src/runtime/sys_windows_amd64.s#L499

Rust uses QueryPerformanceCounter: https://github.com/rust-lang/rust/blob/38ec87c1885c62ed8c66320ad24c7e535535e4bd/library/std/src/time.rs#L91

V8 uses QueryPerformanceCounter after checking for old CPUs: https://github.com/v8/v8/blob/dc712da548c7fb433caed56af9a021d964952728/src/base/platform/time.cc#L672

Ruby uses QueryPerformanceCounter: https://github.com/ruby/ruby/blob/44cff500a0ad565952e84935bc98523c36a91b06/win32/win32.c#L4712

C# implements QueryPerformanceCounter on other platforms using CLOCK_MONOTONIC, indicating that they should be roughly equivalent: https://github.com/dotnet/runtime/blob/01b7e73cd378145264a7cb7a09365b41ed42b240/src/coreclr/pal/src/misc/time.cpp#L175

Swift originally used QueryPerformanceCounter, but switched to QueryUnbiasedInterruptTime() because they didn't want to count time the system spent asleep: swiftlang/swift-corelibs-libdispatch@766d647

------

Note that none of these languages use GetTickCount64(). Swift is an interesting counterpoint, and I noticed QueryUnbiasedInterruptTime() is available on Windows 8 while QueryInterruptTime() is new as of Windows 10. The "Unbiased" just refers to whether it advances during sleep.

I'm not actually sure whether time.monotonic() in Python counts time spent asleep, or whether that's desirable. Some kinds of timers using monotonic time should definitely freeze during sleep so they don't cause a flurry of activity on wake, but others definitely need to roughly track wall clock time, even during sleep.

Perhaps the long term answer would be to introduce separate "asleep" and "awake" monotonic clocks in Python, and possibly deprecate perf_counter() if it's redundant after this (as I think it's aliased to monotonic() on non-Windows platforms anyway).

@lunixbochs lunixbochs mannequin changed the title from "time.monotonic() should use QueryPerformanceCounter() on Windows" to "time.monotonic() should use a different clock source on Windows" on Jun 7, 2021
@eryksun
Contributor

eryksun commented Jun 9, 2021

You resolved bpo-41299 using QueryPerformanceCounter(), so we're already a step toward making it the default monotonic clock. Personally, I've only relied on QPC for short intervals, but, as you've highlighted above, other language runtimes use it for their monotonic clock. Since Vista, it's apparently more reliable in terms of calibration and ensuring that a processor TSC is only used if it's known to be invariant and constant.

That said, Windows 10 also provides QueryInterruptTimePrecise(), which is a hybrid solution. It uses the performance counter to interpolate a timestamp between interrupts. I'd prefer to use this for time.monotonic() instead of QPC, if it's available via GetProcAddress().

QueryInterruptTimePrecise() is about 1.38 times the cost of QPC (on average across 100 million calls). Both functions are significantly more expensive than QueryInterruptTime() and GetTickCount64(), which simply return a value that's read from shared memory (i.e. the KUSER_SHARED_DATA structure).
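
The "use it if it's available" idea can be sketched from Python with ctypes. This is only an illustration, not how CPython would do it in C. The helper name `query_interrupt_time_precise` is hypothetical, and the DLL names tried below (kernelbase, then kernel32) are an assumption about where the function is exported. On non-Windows platforms the helper simply returns None.

```python
import ctypes
import sys

def query_interrupt_time_precise():
    """Return the precise interrupt time in 100 ns units, or None.

    None means the platform is not Windows or the function could not
    be found. The DLL names tried here are an assumption.
    """
    if sys.platform != "win32":
        return None
    for dll_name in ("kernelbase", "kernel32"):
        try:
            func = getattr(ctypes.WinDLL(dll_name), "QueryInterruptTimePrecise")
        except (OSError, AttributeError):
            # DLL missing or symbol not exported: try the next candidate.
            continue
        # VOID QueryInterruptTimePrecise(PULONGLONG lpInterruptTimePrecise)
        func.argtypes = [ctypes.POINTER(ctypes.c_ulonglong)]
        func.restype = None
        value = ctypes.c_ulonglong()
        func(ctypes.byref(value))
        return value.value
    return None

ticks = query_interrupt_time_precise()
if ticks is None:
    print("QueryInterruptTimePrecise() is not available here")
else:
    print(f"precise interrupt time: {ticks / 10_000_000:.7f} s")
```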

QueryUnbiasedInterruptTime() is available on Windows 8 while
QueryInterruptTime() is new as of Windows 10. The "Unbiased"
just refers to whether it advances during sleep.

QueryInterruptTime() and QueryUnbiasedInterruptTime() don't provide high-resolution timestamps. They're updated by the system timer interrupt service routine, which defaults to 64 interrupts/second. The time increment depends on when the counter is read by the ISR, but it averages out to approximately the interrupt period (e.g. 15.625 ms).
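
The 15.625 ms figure follows directly from the default interrupt rate. The sketch below works through the arithmetic and shows what an idealized 64 Hz clock does to short intervals. The `quantize` helper is hypothetical, and it ignores the variation in the actual increment described above.

```python
import math
from fractions import Fraction

INTERRUPTS_PER_SECOND = 64  # default system timer rate quoted above

# One interrupt period, computed exactly and converted to milliseconds.
period_ms = float(Fraction(1000, INTERRUPTS_PER_SECOND))
print(f"interrupt period: {period_ms} ms")

def quantize(seconds, rate=INTERRUPTS_PER_SECOND):
    """Round a timestamp down to the last tick of an idealized clock."""
    return math.floor(seconds * rate) / rate

# Timestamps 10 ms and 20 ms after a tick, as seen by the coarse clock.
for t in (0.010, 0.020):
    print(f"true offset {t * 1000:.0f} ms reads as {quantize(t) * 1000} ms")
```

In other words, any two reads less than one period apart can return the same value, which is the behaviour the original report complains about.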

I'm not actually sure whether time.monotonic() in Python counts
time spent asleep, or whether that's desirable.

POSIX doesn't specify whether CLOCK_MONOTONIC [1] should include the time that elapses while the system is in standby mode. In Linux, CLOCK_BOOTTIME includes this time, and CLOCK_MONOTONIC excludes it. Windows QueryUnbiasedInterruptTimePrecise excludes it.

Perhaps the long term answer would be to introduce separate
"asleep" and "awake" monotonic clocks in Python

Both may not be supportable on all platforms, but they're supported in Linux, Windows 10, and macOS. The latter has mach_continuous_time(), which includes the time in standby mode, and mach_absolute_time(), which excludes it.
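
On Linux the two behaviours can be compared from Python today, because the time module exposes both clock IDs where the platform defines them. The sketch below is a rough illustration: the helper name `suspended_seconds` is made up, and the result is only an estimate because the two clocks are read a moment apart.

```python
import time

def suspended_seconds():
    """Estimate time spent suspended since boot, in seconds.

    Returns None where CLOCK_BOOTTIME or CLOCK_MONOTONIC is not
    exposed by the time module (for example on Windows).
    """
    boottime_id = getattr(time, "CLOCK_BOOTTIME", None)
    monotonic_id = getattr(time, "CLOCK_MONOTONIC", None)
    if boottime_id is None or monotonic_id is None:
        return None
    try:
        # Read the clock that excludes suspend first, so the tiny gap
        # between the two reads does not push the difference negative.
        awake = time.clock_gettime(monotonic_id)
        total = time.clock_gettime(boottime_id)
    except OSError:
        return None
    return total - awake

estimate = suspended_seconds()
if estimate is None:
    print("CLOCK_BOOTTIME / CLOCK_MONOTONIC not available on this platform")
else:
    print(f"approximate time spent suspended: {estimate:.3f} s")
```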


[1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/clock_gettime.html

@lunixbochs
Mannequin Author

lunixbochs mannequin commented Jun 9, 2021

Great information, thanks!

Windows 10 also provides QueryInterruptTimePrecise(), which is a hybrid solution. It uses the performance counter to interpolate a timestamp between interrupts. I'd prefer to use this for time.monotonic() instead of QPC, if it's available via GetProcAddress()

My personal vote is to use the currently most common clock source (QPC) for now for monotonic(), because it's the same across Windows versions and the most likely to produce portable monotonic timestamps between apps/languages on the same system. It's also the easiest patch, as there's already a code path for QPC.

(As someone building multi-app experiences around Python, I don't want to check the Windows version to see which time base Python is using. I'd feel better about switching to QITP() if/when Python drops Windows 8 support.)

A later extension of this idea (maybe behind a PEP) could be to survey the existing timers available on each platform and consider whether it's worth extending time to expose them all, and unify cross-platform the ones that are exposed (e.g. better formalize/document which clocks will advance while the machine is asleep on each platform).

@terryjreedy terryjreedy removed the labels 3.8 (EOL), 3.9, 3.10 on Jun 12, 2021
@vstinner
Member

Changing the clock is tricky. There are many things to consider:

  • Is it really monotonic in all cases?
  • Does it have a better resolution than the previous clock?
  • Corner cases: does it include time spent in time.sleep() and while the system is suspended?
  • etc.

--

When I designed PEP-418 (in 2012), QueryPerformanceCounter() was not reliable:

"It has a much higher resolution, but has lower long term precision than GetTickCount() and timeGetTime() clocks. For example, it will drift compared to the low precision clocks."
https://www.python.org/dev/peps/pep-0418/#windows-queryperformancecounter

And there were a few bugs like: "The performance counter value may unexpectedly leap forward because of a hardware bug".

A Microsoft blog article explains that users wanting a steady clock with precision higher than GetTickCount() should interpolate GetTickCount() using QueryPerformanceCounter(). If I recall correctly, this is what Firefox did for instance.
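
The interpolation idea can be sketched in Python as follows. This is not the code from the blog article, Firefox, or Windows. It is an illustration under assumptions: the coarse clock is simulated by rounding time.monotonic() down to 1/64 s, time.perf_counter() stands in for QueryPerformanceCounter(), and the class and function names are made up.

```python
import time

TICK = 1 / 64  # simulated coarse tick period in seconds (15.625 ms)

def coarse_clock():
    """Simulate a tick-quantized clock such as GetTickCount64()."""
    return (time.monotonic() // TICK) * TICK

class InterpolatedClock:
    """Steady coarse clock, refined between ticks by a fine counter."""

    def __init__(self, coarse=coarse_clock, fine=time.perf_counter, tick=TICK):
        self._coarse = coarse
        self._fine = fine
        self._tick = tick
        self._base_coarse = coarse()
        self._base_fine = fine()
        self._last = self._base_coarse

    def now(self):
        coarse_now = self._coarse()
        fine_now = self._fine()
        if coarse_now != self._base_coarse:
            # The coarse clock ticked: re-anchor the fine counter to it.
            self._base_coarse = coarse_now
            self._base_fine = fine_now
        # Never interpolate past one tick, so drift in the fine counter
        # cannot accumulate beyond a single coarse period.
        offset = min(fine_now - self._base_fine, self._tick)
        # Never go backwards, even across a re-anchor.
        self._last = max(self._last, self._base_coarse + offset)
        return self._last

clock = InterpolatedClock()
first = clock.now()
second = clock.now()
print(f"two consecutive reads differ by {second - first:.9f} s")
```

The long-term rate comes from the coarse clock, and the fine counter only fills in the gap since the last observed tick, which is why drift in the fine counter stays bounded in this sketch.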

Eryk: "That said, Windows 10 also provides QueryInterruptTimePrecise(), which is a hybrid solution. It uses the performance counter to interpolate a timestamp between interrupts. I'd prefer to use this for time.monotonic() instead of QPC, if it's available via GetProcAddress()."

Oh, good that they provided an implementation for that :-)

--

V8 uses QueryPerformanceCounter after checking for old CPUs: https://github.com/v8/v8/blob/dc712da548c7fb433caed56af9a021d964952728/src/base/platform/time.cc#L672

It uses CPUID to check for "non stoppable time stamp counter":
https://github.com/v8/v8/blob/master/src/base/cpu.cc

  // Check if CPU has non stoppable time stamp counter.
  const unsigned parameter_containing_non_stop_time_stamp_counter = 0x80000007;
  if (num_ext_ids >= parameter_containing_non_stop_time_stamp_counter) {
    __cpuid(cpu_info, parameter_containing_non_stop_time_stamp_counter);
    has_non_stop_time_stamp_counter_ = (cpu_info[3] & (1 << 8)) != 0;
  }

Maybe we could use such a check in Python: use GetTickCount() on old CPUs, or QueryPerformanceCounter() otherwise. MSVC provides the __cpuid() function:
https://docs.microsoft.com/en-us/cpp/intrinsics/cpuid-cpuidex?view=msvc-160

--

Swift originally used QueryPerformanceCounter, but switched to QueryUnbiasedInterruptTime() because they didn't want to count time the system spent asleep

Oh, I recall that it was a tricky question. The PEP-418 simply says:
"The behaviour of clocks after a system suspend is not defined in the documentation of new functions."

See "Include Sleep" and "Include Suspend" columns of my table:
https://www.python.org/dev/peps/pep-0418/#monotonic-clocks

@lunixbochs
Mannequin Author

lunixbochs mannequin commented Jun 12, 2021

I think a lot of that is based on very outdated information. It's worth reading this article: https://docs.microsoft.com/en-us/windows/win32/sysinfo/acquiring-high-resolution-time-stamps

I will repeat Microsoft's current recommendation (from that article):

Windows has and will continue to invest in providing a reliable and efficient performance counter. When you need time stamps with a resolution of 1 microsecond or better and you don't need the time stamps to be synchronized to an external time reference, choose QueryPerformanceCounter, KeQueryPerformanceCounter, or KeQueryInterruptTimePrecise. When you need UTC-synchronized time stamps with a resolution of 1 microsecond or better, choose GetSystemTimePreciseAsFileTime or KeQuerySystemTimePrecise.

(Based on that, it may also be worth replacing time.time()'s GetSystemTimeAsFileTime with GetSystemTimePreciseAsFileTime in CPython, as GetSystemTimePreciseAsFileTime is available in Windows 8 and newer)

PEP-418:

It has a much higher resolution, but has lower long term precision than GetTickCount() and timeGetTime() clocks. For example, it will drift compared to the low precision clocks.

Microsoft on drift (from the article above):

To reduce the adverse effects of this frequency offset error, recent versions of Windows, particularly Windows 8, use multiple hardware timers to detect the frequency offset and compensate for it to the extent possible. This calibration process is performed when Windows is started.

Modern Windows also automatically detects and works around stoppable TSC, as well as several other issues:

Some processors can vary the frequency of the TSC clock or stop the advancement of the TSC register, which makes the TSC unsuitable for timing purposes on these processors. These processors are said to have non-invariant TSC registers. (Windows will automatically detect this, and select an alternative time source for QPC).

It seems like Microsoft considers QPC to be a significantly better time source now than when PEP-418 was written.

Another related conversation is whether Python can just expose all of the Windows clocks directly (through clock_gettime enums?), as that gives anyone who really wants full control over their timestamps a good escape hatch.

@vstinner
Member

To reduce the adverse effects of this frequency offset error, recent versions of Windows, particularly Windows 8, use multiple hardware timers to detect the frequency offset and compensate for it to the extent possible. This calibration process is performed when Windows is started.

Technically, it remains possible to install Python on Windows 7, see: bpo-32592.

@eryksun
Contributor

eryksun commented Jun 14, 2021

On second thought, starting with Windows 8, WaitForSingleObject() and WaitForMultipleObjects() exclude time when the system is suspended. For consistency, an external deadline (e.g. for SIGINT support) should work the same way. The monotonic clock should thus be based on QueryUnbiasedInterruptTime(). We can conditionally use QueryUnbiasedInterruptTimePrecise() in Windows 10, which I presume includes most users of Python 3.9+ on Windows since Windows 8.1 only has a 3% share of desktop/laptop systems.

If we can agree on the above, then the change to use QueryPerformanceCounter() to resolve bpo-41299 should be reverted. The deadline should instead be computed with QueryUnbiasedInterruptTime(). It's limited to the resolution of the system interrupt time, but at least compared to GetTickCount64() it returns the real interrupt time instead of an idealized 64 ticks/second.
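
For readers unfamiliar with the "deadline" pattern under discussion, here is a generic sketch in Python. It is not CPython's internal implementation, and the function name `wait_for` and its parameters are hypothetical. The point is that the deadline is only as precise as the clock it is computed from: with a clock that advances in steps of about 15.6 ms, a 1 ms timeout can expire almost immediately or only after a full step.

```python
import time

def wait_for(predicate, timeout, poll_interval=0.001, clock=time.monotonic):
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Returns True if the predicate succeeded, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Sleep no longer than the time left before the deadline.
        time.sleep(min(poll_interval, remaining))

started = time.perf_counter()
result = wait_for(lambda: False, timeout=0.005)
elapsed_ms = (time.perf_counter() - started) * 1000
print(f"timed out: {not result}, after {elapsed_ms:.2f} ms")
```

Passing a different `clock` callable makes it easy to compare how the same timeout behaves on a coarse clock and on a fine one.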

expose all of the Windows clocks directly (through clock_gettime enums?)

_Py_clock_gettime() and _Py_clock_getres() could be implemented in Python/pytime.c. For Windows we could implement the following clocks:

CLOCK_REALTIME            GetSystemTimePreciseAsFileTime
CLOCK_REALTIME_COARSE     GetSystemTimeAsFileTime
CLOCK_MONOTONIC_COARSE    QueryUnbiasedInterruptTime
CLOCK_PROCESS_CPUTIME_ID  GetProcessTimes
CLOCK_THREAD_CPUTIME_ID   GetThreadTimes
CLOCK_PERF_COUNTER        QueryPerformanceCounter

Windows 10+
CLOCK_MONOTONIC           QueryUnbiasedInterruptTimePrecise
CLOCK_BOOTTIME            QueryInterruptTimePrecise
CLOCK_BOOTTIME_COARSE     QueryInterruptTime
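
For comparison with the proposed mapping, the clocks that the time module already exposes by name can be inspected with time.get_clock_info(). The sketch below lists each clock's underlying implementation and reported resolution on the running interpreter. The helper name `describe_clocks` is made up, and the output differs by platform and Python version.

```python
import time

CLOCK_NAMES = ("time", "monotonic", "perf_counter", "process_time", "thread_time")

def describe_clocks(names=CLOCK_NAMES):
    """Return {clock name: clock info} for the clocks available here."""
    found = {}
    for name in names:
        try:
            found[name] = time.get_clock_info(name)
        except ValueError:
            # Clock not supported on this platform or Python version.
            continue
    return found

for name, info in describe_clocks().items():
    print(
        f"{name:13} {info.implementation:40} "
        f"resolution={info.resolution:.3g} s monotonic={info.monotonic}"
    )
```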

it may also be worth replacing time.time()'s GetSystemTimeAsFileTime with
GetSystemTimePreciseAsFileTime

See bpo-19007, which is nearly 8 years old.

@vstinner
Member

IMO we should check how web browsers manage time on the different platforms. They have high expectations from clocks: security, efficiency, reliability.

Examples:

time_win.cc contains two long comments:

// Windows Timer Primer
//
// A good article:  http://www.ddj.com/windows/184416651
// A good mozilla bug:  http://bugzilla.mozilla.org/show_bug.cgi?id=363258
//
// The default windows timer, GetSystemTimeAsFileTime is not very precise.
// It is only good to ~15.5ms.
//
// QueryPerformanceCounter is the logical choice for a high-precision timer.
// However, it is known to be buggy on some hardware.  Specifically, it can
// sometimes "jump".  On laptops, QPC can also be very expensive to call.
// It's 3-4x slower than timeGetTime() on desktops, but can be 10x slower
// on laptops.  A unittest exists which will show the relative cost of various
// timers on any system.
//
// The next logical choice is timeGetTime().  timeGetTime has a precision of
// 1ms, but only if you call APIs (timeBeginPeriod()) which affect all other
// applications on the system.  By default, precision is only 15.5ms.
// Unfortunately, we don't want to call timeBeginPeriod because we don't
// want to affect other applications.  Further, on mobile platforms, use of
// faster multimedia timers can hurt battery life.  See the intel
// article about this here:
// http://softwarecommunity.intel.com/articles/eng/1086.htm
//
// To work around all this, we're going to generally use timeGetTime().  We
// will only increase the system-wide timer if we're not running on battery
// power.

and:


// Discussion of tick counter options on Windows:
//
// (1) CPU cycle counter. (Retrieved via RDTSC)
// The CPU counter provides the highest resolution time stamp and is the least
// expensive to retrieve. However, on older CPUs, two issues can affect its
// reliability: First it is maintained per processor and not synchronized
// between processors. Also, the counters will change frequency due to thermal
// and power changes, and stop in some states.
//
// (2) QueryPerformanceCounter (QPC). The QPC counter provides a high-
// resolution (<1 microsecond) time stamp. On most hardware running today, it
// auto-detects and uses the constant-rate RDTSC counter to provide extremely
// efficient and reliable time stamps.
//
// On older CPUs where RDTSC is unreliable, it falls back to using more
// expensive (20X to 40X more costly) alternate clocks, such as HPET or the ACPI
// PM timer, and can involve system calls; and all this is up to the HAL (with
// some help from ACPI). According to
// http://blogs.msdn.com/oldnewthing/archive/2005/09/02/459952.aspx, in the
// worst case, it gets the counter from the rollover interrupt on the
// programmable interrupt timer. In best cases, the HAL may conclude that the
// RDTSC counter runs at a constant frequency, then it uses that instead. On
// multiprocessor machines, it will try to verify the values returned from
// RDTSC on each processor are consistent with each other, and apply a handful
// of workarounds for known buggy hardware. In other words, QPC is supposed to
// give consistent results on a multiprocessor computer, but for older CPUs it
// can be unreliable due bugs in BIOS or HAL.
//
// (3) System time. The system time provides a low-resolution (from ~1 to ~15.6
// milliseconds) time stamp but is comparatively less expensive to retrieve and
// more reliable. Time::EnableHighResolutionTimer() and
// Time::ActivateHighResolutionTimer() can be called to alter the resolution of
// this timer; and also other Windows applications can alter it, affecting this
// one.

@vstinner
Member

My notes on Windows clocks: https://vstinner.readthedocs.io/windows.html#time

Member

Beware of relying on old code comments (I'm assuming that since both links in that comment above are from ~20 years ago that the code is also roughly that old).

QPC in particular has improved/changed behaviour a few times since then.

Personally, I'd trust anything Eryk Sun has posted in the last 3 years over code that was tweaked to perfection based on Windows XP.

@vstinner
Member

Maybe I would be OK with using QPC for time.monotonic() if we make the change conditional: only on the most recent Windows version, Windows 11. It would avoid any bad surprises on very old Windows versions running on old hardware with a less reliable CPU TSC clock source.

Member

Our earliest supported version is Windows 10 now, so we don't have to worry about really old ones.

This page covers the history. Looks like things have been "fine" since Windows 8: https://learn.microsoft.com/en-us/windows/win32/sysinfo/acquiring-high-resolution-time-stamps

vstinner added a commit to vstinner/cpython that referenced this issue Mar 14, 2024
On Windows, time.monotonic() now uses QueryPerformanceCounter() clock
instead of GetTickCount64() to have a resolution better than 1 us
instead of having a resolution of 15.6 ms.
vstinner added a commit to vstinner/cpython that referenced this issue Mar 14, 2024
On Windows, time.monotonic() now uses the QueryPerformanceCounter()
clock to have a resolution better than 1 us, instead of the
GetTickCount64() clock which has a resolution of 15.6 ms.
@vstinner
Member

I created #116781 to use QueryPerformanceCounter() for time.monotonic().

@vstinner
Member

See also issue gh-115637.

vstinner added a commit that referenced this issue Mar 14, 2024
On Windows, time.monotonic() now uses the QueryPerformanceCounter()
clock to have a resolution better than 1 us, instead of the
GetTickCount64() clock which has a resolution of 15.6 ms.
@vstinner
Member

time.monotonic() now uses QueryPerformanceCounter() (commit) and has a resolution better than 1 us. I measured 100 ns (10 MHz) on my Windows 11 VM, whereas before it was 15.6 ms (64 Hz): the updated clock resolution is 156,000x better :-)
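
The figures quoted above can be checked with a few lines of integer arithmetic. The constants come straight from the comment, and the last lines print whatever resolution the running interpreter reports, which varies by platform and Python version.

```python
import time

OLD_RESOLUTION_NS = 15_600_000  # 15.6 ms, the GetTickCount64() figure
NEW_RESOLUTION_NS = 100         # 100 ns, the measured QPC figure

# A 100 ns period corresponds to a 10 MHz counter frequency.
frequency_hz = 1_000_000_000 // NEW_RESOLUTION_NS
# Ratio between the old and new resolution.
improvement = OLD_RESOLUTION_NS // NEW_RESOLUTION_NS

print(f"counter frequency: {frequency_hz:,} Hz")
print(f"improvement factor: {improvement:,}x")

info = time.get_clock_info("monotonic")
print(f"this interpreter: {info.implementation}, resolution {info.resolution} s")
```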

Thanks everybody for looking into this issue and providing very useful technical details about these clocks.

Follow-up: see issue gh-63207 to use GetSystemTimePreciseAsFileTime() for time.time().

vstinner added a commit to vstinner/cpython that referenced this issue Mar 20, 2024
…ython#116781)

On Windows, time.monotonic() now uses the QueryPerformanceCounter()
clock to have a resolution better than 1 us, instead of the
GetTickCount64() clock which has a resolution of 15.6 ms.
adorilson pushed a commit to adorilson/cpython that referenced this issue Mar 25, 2024
…ython#116781)

On Windows, time.monotonic() now uses the QueryPerformanceCounter()
clock to have a resolution better than 1 us, instead of the
GetTickCount64() clock which has a resolution of 15.6 ms.
diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024
…ython#116781)

On Windows, time.monotonic() now uses the QueryPerformanceCounter()
clock to have a resolution better than 1 us, instead of the
GetTickCount64() clock which has a resolution of 15.6 ms.
vstinner added a commit to vstinner/cpython that referenced this issue Apr 29, 2024
On Windows, time.monotonic() now calls QueryPerformanceCounter()
which has a resolution of 100 ns. Reduce CLOCK_RES from 50 or 100 ms
to 1 ms.