Improve accuracy of builtin sum() for float inputs #100425
Very nice! It's surprising how little code it takes to make this work. A few questions:
On performance, I'm not seeing the "zero cost" on macOS / Intel. Here are some timings on my machine (comparing commit af7613e on this branch - "cpython" below - with commit e0b4d96 on main - "cpython-compare" below); both clones built with
Here's another test case that involves a decent amount of cancellation - we're computing (the imaginary part of) the Gauss sum for the prime 609359:

```python
import fractions
import math
import sys
import timeit

# Test case: Gauss sum for the prime 609359; mathematically, result
# should be exactly sqrt(p).
p = 609359
terms = [math.sin(2*math.pi*r*r / p) for r in range(p)]

# Exact sum of the actual terms we're using, and a way to compute the
# ulps error.
true_sum = sum(map(fractions.Fraction, terms))

def ulps_error(x: float) -> float:
    return (fractions.Fraction(x) - true_sum) / math.ulp(true_sum)

# Error incurred by sum.
error = ulps_error(sum(terms))

# Timing
number = 1000
time = timeit.timeit(stmt="sum(terms)", globals=globals(), number=number)

print(sys.version)
print(f"error: {ulps_error(sum(terms))} ulps")
print(f"time: {1000.0 * time/number:.3f} ms per run")
```

Results on my machine show around a 20% slowdown:
On pairwise summation: agreed that this (i.e., compensated summation) seems superior for our use-case, though it might be interesting to code up pairwise summation just to see what the performance tradeoffs look like. It doesn't need much in the way of auxiliary memory: at most a few hundred bytes for maintaining a stack of partial values of size O(log_2(length)). E.g., in Python:

```python
def _ruler_seq():
    """Sequence 0, 1, 0, 2, 0, 1, 0, 3, ... (A007814)"""
    i = 0
    while True:
        yield (~(i + 1) & i).bit_length()
        i += 1

def sum_pairwise(iterable, /, start=0.0):
    """Sum of an iterable of floats, via pairwise summation."""
    partials = [start]
    for x, pop_count in zip(iterable, _ruler_seq()):
        for _ in range(pop_count):
            x += partials.pop()
        partials.append(x)
    return sum(reversed(partials))
```
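As a quick check that the pairwise sketch behaves, here is a self-contained version (the two functions are re-stated so the snippet runs on its own; the test values, `range(100)` and `4950.0`, are my own, chosen so the exact result is representable in binary floating point):

```python
def _ruler_seq():
    """Sequence 0, 1, 0, 2, 0, 1, 0, 3, ... (A007814)."""
    i = 0
    while True:
        yield (~(i + 1) & i).bit_length()
        i += 1

def sum_pairwise(iterable, /, start=0.0):
    """Sum of an iterable of floats, via pairwise summation."""
    partials = [start]
    for x, pop_count in zip(iterable, _ruler_seq()):
        for _ in range(pop_count):
            x += partials.pop()
        partials.append(x)
    return sum(reversed(partials))

# Small integers are exact in binary floating point, so the result is exact.
print(sum_pairwise(float(i) for i in range(100)))  # -> 4950.0
```

The stack of partials stays O(log2(n)) deep, which matches the "few hundred bytes" claim: a million inputs never need more than roughly twenty partials at once.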
What are your thoughts on this? I was thinking that it has to be an implementation detail because it is so easily destroyed by double rounding environments or faulty compilers that optimize away the compensation. That said, I would hope that other implementations would do this or something similar. It doesn't take much effort.
Fortunately, we have a precedent here. When numpy switched to pairwise summation, the world was net better off and seemed happy to fix a few broken tests in exchange for better accuracy. Another precedent occurred when the floating point repr changed to display the shortest decimal representation from the equivalence class. Again, the world was net happier for the problems that it no longer had and was happy to fix a few broken tests. FWIW the PR also causes "unbreakage". Code that is normalizing a list (i.e. converting values to percentages) will work better without the user having to make changes:
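A minimal sketch of that normalization scenario, with made-up data (the values here are mine, purely for illustration):

```python
values = [0.1, 0.2, 0.3, 0.4, 0.5]          # hypothetical measurements
total = sum(values)
percentages = [100.0 * v / total for v in values]

# Each division introduces at most half an ulp of error, so however the
# percentages are summed, the total can only be a few ulps away from 100;
# a compensated sum makes the round-trip land exactly on 100.0 more often.
print(sum(percentages))
```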
Thanks for running a timing on Intel silicon. That is about what I had expected, and it seems like a reasonable price. The Apple M1 timings were super fast but perplexing. The sum of 10k random values took only 25.2 usec with the PR and 28.2 usec without the PR. Presumably, we're working right at the limit of optimizations and parallelization, where any change can knock it in and out of local minima.
I didn't but will look at it. I first tried straight Kahan summation and then the Neumaier variant. When the latter proved to be both faster (because it broke up the data dependency chain) and more accurate, I was pleasantly surprised, took the win, and stopped exploring. Also, I looked at the generated assembly and there isn't much fat to be cut. The
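To illustrate the difference between the two variants mentioned here, a plain-Python sketch (the function names are mine, and the real change is in C): classic Kahan feeds the correction back into the next addition, creating a serial data dependency, while Neumaier keeps the correction in a separate accumulator and also survives the case where a new term is larger than the running total.

```python
def kahan_sum(xs):
    """Classic Kahan: the correction feeds into the next addition."""
    total = c = 0.0
    for x in xs:
        y = x - c
        t = total + y
        c = (t - total) - y
        total = t
    return total

def neumaier_sum(xs):
    """Neumaier variant: correction kept aside, added once at the end."""
    total = c = 0.0
    for x in xs:
        t = total + x
        if abs(total) >= abs(x):
            c += (total - t) + x   # low-order bits of total were lost
        else:
            c += (x - t) + total   # low-order bits of x were lost
        total = t
    return total + c

# Neumaier's extension matters when a term dwarfs the running total:
data = [1.0, 1e100, 1.0, -1e100]
print(kahan_sum(data))     # -> 0.0
print(neumaier_sum(data))  # -> 2.0
```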
I can mention the
Thanks for the responses and the code updates for infinity and overflow. The code LGTM, and I really like the feature.
Agreed; this seems manageable. The only remaining question for me is whether the extra accuracy is worth the performance hit. It would be useful to get some numbers on Linux/x64 and Windows/x64 (and maybe ARM, too). I'm +1 on making this trade-off. (I'm personally not too bothered by Python's
SGTM; I'd expect that PyPy at least would want to match the new behaviour, but that's up to them. I'd also want to leave the door open for implementations to use some other flavour of compensated summation if that works better for them, for whatever reason.
As a data point, it looks as though Java's
If I'm not mistaken, you're not using it. I mean here: Line 2575 in 4da0b8f
@mdickinson Okay. Marked as an implementation detail. We and other implementations are free to switch methodologies at any time. @pochmann I'm not sure that it matters, but for consistency, I added
FWIW, testing on Windows on an Ivy Bridge i7-3630QM 4c/8t laptop downclocked to 2.3 GHz, I get the following results with MSC v.1916 x64 (the build under the `test/` path is the PR branch). Testing the first set of @mdickinson's timings:

```
$ PCbuild/amd64/python.exe -c "import sys; print(sys.version)"
3.12.0a3+ (heads/main-dirty:1ecfd1ebf1, Dec 23 2022, 16:48:43) [MSC v.1916 64 bit (AMD64)]
$ PCbuild/amd64/python.exe -m timeit -s 'from random import expovariate as r' -s 'd=[r(1.0) for i in range(10_000)]' 'sum(d)'
5000 loops, best of 5: 53.6 usec per loop
$ PCbuild/amd64/python.exe -m timeit -s 'from random import expovariate as r' -s 'd=[r(1.0) for i in range(10_000)]' 'sum(d)'
5000 loops, best of 5: 53.1 usec per loop
$ PCbuild/amd64/python.exe -m timeit -s 'from random import expovariate as r' -s 'd=[r(1.0) for i in range(10_000)]' 'sum(d)'
5000 loops, best of 5: 53.4 usec per loop
$ test/PCbuild/amd64/python.exe -c "import sys; print(sys.version)"
3.12.0a3+ (heads/compensated_sum-dirty:3cf9536ad7, Dec 23 2022, 16:51:47) [MSC v.1916 64 bit (AMD64)]
$ test/PCbuild/amd64/python.exe -m timeit -s 'from random import expovariate as r' -s 'd=[r(1.0) for i in range(10_000)]' 'sum(d)'
5000 loops, best of 5: 71.6 usec per loop
$ test/PCbuild/amd64/python.exe -m timeit -s 'from random import expovariate as r' -s 'd=[r(1.0) for i in range(10_000)]' 'sum(d)'
5000 loops, best of 5: 71 usec per loop
$ test/PCbuild/amd64/python.exe -m timeit -s 'from random import expovariate as r' -s 'd=[r(1.0) for i in range(10_000)]' 'sum(d)'
5000 loops, best of 5: 69 usec per loop
```

And testing @mdickinson's script (saved as compensated.py):

```
$ PCbuild/amd64/python.exe compensated.py
3.12.0a3+ (heads/main-dirty:1ecfd1ebf1, Dec 23 2022, 16:48:43) [MSC v.1916 64 bit (AMD64)]
error: 64.80813372135162 ulps
time: 3.568 ms per run
$ test/PCbuild/amd64/python.exe compensated.py
3.12.0a3+ (heads/compensated_sum-dirty:3cf9536ad7, Dec 23 2022, 16:51:47) [MSC v.1916 64 bit (AMD64)]
error: -0.19186627864837646 ulps
time: 4.804 ms per run
```

For what it's worth, on Python 3.11.0 and the latest NumPy 1.24.0 and dependencies (via Conda-Forge):

```
$ python -m timeit -s 'from random import expovariate as r' -s 'd=[r(1.0) for i in range(10_000)]' 'sum(d)'
5000 loops, best of 5: 52.8 usec per loop
$ python -m timeit -s 'from random import expovariate as r' -s 'import numpy as np' -s 'd=np.array([r(1.0) for i in range(10_000)])' 'np.sum(d)'
20000 loops, best of 5: 19.8 usec per loop
$ python -m timeit -s 'from random import expovariate as r' -s 'd=[r(1.0) for i in range(1_000_000)]' 'sum(d)'
50 loops, best of 5: 5.71 msec per loop
$ python -m timeit -s 'from random import expovariate as r' -s 'import numpy as np' -s 'd=np.array([r(1.0) for i in range(1_000_000)])' 'np.sum(d)'
200 loops, best of 5: 1.16 msec per loop
```

In general, as a member of the scientific Python community (but merely one single datapoint), I would likewise typically be using NumPy arrays (or data structures built from them, e.g. DataFrames, xarrays, tensors, etc.) for at least the great majority of cases that would likely matter for performance when summing many elements. But I certainly can't speak for everyone in the general case, and it seems worth considering more datapoints (both of benchmark results, and use cases) before introducing a potential ≈20-35% regression to the performance of a key builtin on a fundamental datatype.
Mark had asked for more performance numbers. On the microbenchmark, I also see about a 30% slowdown on Linux with an AMD Zen 2 chip, with
(I don't have an opinion on the tradeoff. The scientific computing I do at work is mostly not on CPUs. It looks like Raymond already considered the partial pairwise thing that numpy does.)
FWIW I think the trade-off here is fine; people who really want sum performance on a huge quantity of floats should be using numpy.
Just for reference, results on the same machine with `fsum`:

```
$ PCbuild/amd64/python.exe -c "import sys; print(sys.version)"
3.12.0a3+ (heads/main-dirty:1ecfd1ebf1, Dec 23 2022, 16:48:43) [MSC v.1916 64 bit (AMD64)]
$ PCbuild/amd64/python.exe -m timeit -s 'from math import fsum' -s 'from random import expovariate as r' -s 'd=[r(1.0) for i in range(10_000)]' 'fsum(d)'
500 loops, best of 5: 559 usec per loop
$ PCbuild/amd64/python.exe -m timeit -s 'from math import fsum' -s 'from random import expovariate as r' -s 'd=[r(1.0) for i in range(10_000)]' 'fsum(d)'
500 loops, best of 5: 470 usec per loop
$ PCbuild/amd64/python.exe -m timeit -s 'from math import fsum' -s 'from random import expovariate as r' -s 'd=[r(1.0) for i in range(10_000)]' 'fsum(d)'
500 loops, best of 5: 464 usec per loop
```
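For a sense of what `fsum`'s extra time buys, a standard example (not one from this thread): `fsum` returns the correctly rounded sum of the actual float values, where a plain running total drifts.

```python
import math

terms = [0.1] * 10   # each 0.1 is really 0.1000000000000000055511151231...

# math.fsum is correctly rounded: the nearest double to the exact sum of
# those ten stored values is exactly 1.0.
print(math.fsum(terms))  # -> 1.0
```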
There is no negative exponent here. It's:

```python
>>> float(10**16 + 2).hex()  # single rounding
'0x1.1c37937e08001p+53'
>>> float(10**16 + 4).hex()  # double rounding
'0x1.1c37937e08002p+53'
```
This didn't need to go into the implementation of a builtin function applied to a basic type. To me this is giving people what they want (not being surprised that Now
Some consequences of the implementation:

```python
class R(float): pass

sum([0.0, 0.0, float(2**53), 1.0, 1.0, 0.0, 0.0])     # -> 9007199254740994.0  All the sums used the correction
sum([R(0.0), 0.0, float(2**53), 1.0, 1.0, 0.0, 0.0])  # -> 9007199254740994.0  Essentially all the sums used the correction
sum([0.0, R(0.0), float(2**53), 1.0, 1.0, 0.0, 0.0])  # -> 9007199254740992.0  Ordinary sum of float for pretty much the entire sum
sum([0.0, 0.0, float(2**53), 1.0, 1.0, 0.0, R(0.0)])  # -> 9007199254740994.0  Nearly all sums used the correction
```

Could the builtin function The "however, that function isn't well known" regarding This, as opposed to having
That hides a lot of detail of what it really does, and to make the description accurate would add quite a bit of complexity to the explanation: "Adds floats from left to right, using a compensated sum if the items from the beginning (except possibly the first) are float. If there is a non-float (beyond the first item) the sum from that point on is that of the items. If the sum of the first and Also, from a didactic point of view, letting people face the peculiarities of adding
@franklinvp, I think this issue has enough arguments for this feature of sum(). See also #111933. If you really want to revive the discussion, you should probably start a discourse thread (on https://discuss.python.org/). BTW, it's not just "adding from left to right with the sum the items define". Here is the docstring:
Sphinx docs:
Neither specifies how to compute the sum of numbers, i.e. this shouldn't be just an equivalent of
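The distinction can be made concrete with `functools.reduce`, which spells out the plain left-to-right `+` (this example is mine; the second result assumes a 3.12+ build, and on older versions both lines match):

```python
from functools import reduce
from operator import add

terms = [float(2**53), 1.0, 1.0]

# Plain left-to-right addition: each +1.0 hits a round-to-even tie and is
# dropped, so the two small terms vanish entirely.
print(reduce(add, terms))  # -> 9007199254740992.0

# The compensated builtin retains them (9007199254740994.0 on 3.12+).
print(sum(terms))
```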
Associative isn't enough; think for example of string addition, which is associative, but:

```python
a = c = 1e308
b = -a
print((a + b) + c)
print(a + (b + c))
print((a + c) + b)
```

Output (Attempt This Online!):
@pochmann3 Be careful: that is not always satisfied.

```python
(float(2**53) + float(1)) + float(1) == float(2**53) + (float(1) + float(1))  # False
```

@skirpichev #111933 is an example of what the new The documentation of And the explanation that it has for It is true that the documentation does not specify which sum is done from left to right. It is bizarre that for

```python
class R(float): pass

# After `R(0.0)` it is the sum of `float`; before it, it is a compensated sum.
sum([float(2**53), float(1.0), float(1.0), R(0.0), float(2**53), float(1.0), float(1.0)])
```
Well, I'm assuming numeric values ("This function is intended specifically for use with numeric values and may reject non-numeric types."), i.e. commutative addition. Strings are banned by sum(), but there is a way:

```python
>>> sum([[1], [2]], start=[3])
[3, 1, 2]
>>> sum([[2], [1]], start=[3])
[3, 2, 1]
```
That's just another example of missing associativity:
Let's take this example. sum() now returns
Looking at the implementation, I don't see that it's violated. The code does summation "from left to right"; it's just not simple
Strictly speaking, R(0.0) is not a float, but an instance of its subclass. Specialized code uses
The items after it are float and are added as with the sum of float. The items before it are also float, but are added with a different sum. The reason for not implementing
Putting this compensated sum in some other function, nobody loses.
In short, the appearance of R(0.0) cancels the running of float-specific algorithms. This happened before too, as can be demonstrated by a benchmark:
But now also results differ:
As I said above, perhaps the specialization could be extended to subclasses of builtin types. But that's another issue. I don't see how the current behaviour violates the docs. It's your interpretation that
I don't see why someone would want a fast but inaccurate sum.
I think we are repeating already-given arguments. In the issue referenced above, #111933, several core devs rejected the idea of restoring pre-3.12 behaviour. If you wish to change that, try opening a discourse thread. BTW, now this would be a backward compatibility break, sorry.
Timing does not prove anything regarding which operation is used to add an
Known. It is you who is repeating things.
That makes the cons that I said above worse. I hope they really don't do that.
They clearly did not care about that when they introduced this change. They just got overenthusiastic about how easy it was to add this compensated sum to the existing implementation. Anyway, I don't wish for anything. People do whatever they want. They can point out a thousand reasons why this is a good change, and I agree with probably all of them. I am only pointing out that this change affects some uses of Python, when it is possible to add this algorithm without affecting anyone.
In this case the difference is too significant to be ignored. And of course you can verify this conclusion by looking at the sources.
I'm afraid it's not:
I don't think so. The first comment by Mark raised the question of breaking backward compatibility. A fast alternative for uncompensated sum was also discussed.
Then what are you trying to do here?
From 3.11:

```c
if (PyFloat_CheckExact(item)) {
    f_result += PyFloat_AS_DOUBLE(item);
    _Py_DECREF_SPECIALIZED(item, _PyFloat_ExactDealloc);
    continue;
}
```
A long-closed and implemented issue (this has already shipped) is not the place to be having a discussion. I'm going to lock the conversation here. If you want to discuss a feature to be implemented or believe there is a bug, bring it up on discuss.python.org or file a new issue.
Currently `sum()` makes no efforts to improve accuracy over a simple running total. We do have `math.fsum()` that makes extreme efforts to be almost perfect; however, that function isn't well known, it runs 10x slower than regular `sum()`, and it always coerces to a float.

I suggest switching the builtin `sum()` handling of float inputs to Arnold Neumaier's improved variation of compensated summation. Per his paper, this algorithm has excellent error bounds (though not as perfect as `fsum()`):

The compensation tracking runs in parallel to the main accumulation. And except for pathological cases, the branch is predictable, making the test essentially free. Thanks to the magic of instruction level parallelism and branch prediction, this improvement has zero cost on my Apple M1 build. Timed with:

```
./python.exe -m timeit -s 'from random import expovariate as r' -s 'd=[r(1.0) for i in range(10_000)]' 'sum(d)'
```

N.B. Numpy switched from a simple running total to partial pairwise summation. That isn't as accurate as what is being proposed here, but it made more sense for them because the extra work of Neumaier summation isn't masked by the overhead of fetching values from an iterator as we do here. Also with an iterator, we can't do pairwise summation without using auxiliary memory.