Leverage the new PEP 574 for no-copy pickling of contiguous arrays #11161
Also, my naive reducer only works if the array is C-contiguous. Adding support for F-contiguous arrays should be easy by taking a transposed view. I am not sure about pickling non-contiguous arrays/views.
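For illustration, a hedged sketch of that transposed-view trick (the function names and the rebuild path are assumptions, not code from the gist):

```python
import pickle  # Python 3.8+; use the pickle5 backport on older versions
import numpy as np

def reduce_ndarray(a):
    transposed = False
    if not a.flags.c_contiguous:
        if a.flags.f_contiguous:
            # The transpose of an F-contiguous array is C-contiguous,
            # so it can be exposed as a single flat buffer.
            a, transposed = a.T, True
        else:
            raise NotImplementedError("non-contiguous arrays/views")
    return rebuild_ndarray, (pickle.PickleBuffer(a), a.dtype, a.shape, transposed)

def rebuild_ndarray(buf, dtype, shape, transposed):
    # frombuffer wraps the (in-band bytes or out-of-band) buffer without copying.
    arr = np.frombuffer(buf, dtype=dtype).reshape(shape)
    return arr.T if transposed else arr
```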
Your example is not zero-copy as it doesn't pass a `buffer_callback`.
Well it does stream the content into the file object without making any spurious copy. This is checked with the linked gist script. There is a sample output of the script in the gist comments. It's in-band zero copy :) The `buffer_callback` is only needed for out-of-band zero copy.
To clarify: a trivial example of zero-copy use is as follows:
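The snippet itself did not survive this capture; below is a minimal reconstruction of trivial out-of-band pickling per PEP 574 (array size arbitrary, and it assumes an ndarray reducer that emits `PickleBuffer`s, which is what this issue proposes):

```python
import pickle  # or `import pickle5 as pickle` before Python 3.8
import numpy as np

a = np.random.rand(1_000_000)

# Out-of-band dump: the raw data is handed to buffer_callback as
# PickleBuffer views instead of being copied into the pickle stream.
buffers = []
data = pickle.dumps(a, protocol=5, buffer_callback=buffers.append)

# The same buffers must be supplied again, in order, at load time.
b = pickle.loads(data, buffers=buffers)
assert (a == b).all()
```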
In a real use case, the `buffer_callback` would typically write each buffer straight to a file or socket instead of accumulating the data in memory.
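For instance, a sketch (file names and layout are my assumptions) where the callback streams each buffer to disk:

```python
import pickle
import numpy as np

a = np.random.rand(1_000_000)

with open("meta.pkl", "wb") as meta_f, open("payload.bin", "wb") as data_f:
    def write_buffer(buf):
        # buf.raw() exposes the PickleBuffer as a flat memoryview.
        data_f.write(buf.raw())
        # Returning None (falsy) keeps the buffer out-of-band; a truthy
        # return value would tell pickle to serialize it in-band instead.

    pickle.dump(a, meta_f, protocol=5, buffer_callback=write_buffer)

# Loading would mean reading payload.bin back into buffers of the recorded
# sizes and passing them to pickle.load(meta_f, buffers=...).
```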
hi @ogrisel @pitrou, I ran memory benchmarks for dumping arrays with pickle protocol 4 and with pickle protocol 5, both with and without an out-of-band buffer callback, each one of them for different shapes. Here is what we get: there also seems to be a big memory increase between pickle protocol 4 and pickle protocol 5 with no buffer callback. I cannot explain it for now, will investigate. What do you guys think?
Thank you very much @pierreglaser. Did you measure CPU times as well? And how do you measure memory usage? I cannot explain why protocol 5 without callbacks would exhibit larger memory consumption. It may be a problem in how you implemented `__reduce_ex__`.
RSS with psutil I guess.
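For reference, a minimal way to read the current resident set size with psutil:

```python
import psutil

rss_mb = psutil.Process().memory_info().rss / 1e6  # RSS of this process, in MB
print(f"RSS: {rss_mb:.1f} MB")
```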
@pierreglaser did you try to subtract the baseline memory cost? If so, how did you do it? Edit: added pierreglaser's handle instead of ogrisel. :/ man I'm 0/1.
@pierreglaser did the work. He uses airspeed velocity to do the measurements. I am not sure if the baseline is subtracted or not. Probably not.
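For readers unfamiliar with it, airspeed velocity (asv) discovers benchmark methods by name prefix; a hypothetical suite of the kind described might look like this (class and parameter names are assumptions; to my knowledge asv's `peakmem_` benchmarks report the peak RSS of the whole process, baseline included):

```python
import pickle  # protocol 5 requires Python 3.8+ or the pickle5 backport
import numpy as np

class PickleProtocolSuite:
    # asv runs every benchmark once per parameter combination.
    params = ([4, 5], [(1000, 1000), (4000, 4000)])
    param_names = ["protocol", "shape"]

    def setup(self, protocol, shape):
        self.data = np.random.rand(*shape)

    def time_dumps(self, protocol, shape):
        pickle.dumps(self.data, protocol=protocol)

    def peakmem_dumps(self, protocol, shape):
        pickle.dumps(self.data, protocol=protocol)
```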
@pierreglaser I suppose this is your branch: master...pierreglaser:implement-reduce-ex There is a problem: you use the Python 3.8-specific C-API, notably the `PyPickleBuffer` functions, which do not exist on earlier Python versions. So instead of using the C-API directly, you could create the buffer by calling the Python-level `pickle.PickleBuffer` constructor.
Also, maybe feel free to open a WIP pull request so the discussion can continue there.
One could also use a preprocessor conditional block, something like:

```c
#if PY_VERSION_HEX < 0x03080000
    /* code for Python 3.7 and older */
#else
    /* code for Python 3.8+ */
#endif
```

Things get a bit more complicated if you want to take into account the `pickle5` backport.
@hmaarrfk the first numpy array has a size of 80KB, which is negligible compared to the overall memory usage (20MB). So my guess would be 20MB as a rough estimate.
By the way, both @ogrisel and I got better results with protocol 5 than with protocol 4, even without a buffer callback.
@pierreglaser Out of curiosity, can you post the code for your benchmark?
The solution with the preprocessor conditional block is simple, but indeed it does not make it possible to benefit from the `pickle5` backport on Python versions older than 3.8.
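On the pure-Python side, at least, picking up the backport is a simple import fallback; a minimal sketch:

```python
try:
    import pickle5 as pickle  # PEP 574 backport for Python < 3.8
except ImportError:
    import pickle  # Python 3.8+ ships protocol 5 natively

HAS_PICKLE5 = pickle.HIGHEST_PROTOCOL >= 5
```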
@pierreglaser what you call stream in your benchmark is not a stream, it's an in-memory bytes string. It would be more interesting to pickle to an out-of-RAM file object (e.g. a temporary file on the filesystem): use `pickle.dump` with a real binary file object. This should make it easier to identify spurious copies. Also, please use a payload that is significantly larger than the baseline memory usage of the Python process, to make it easier to count the number of spurious copies. For instance a 200MB numpy array.
Or even a 1GB array.
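Putting those suggestions together, a sketch (sizes and the RSS probe are my choice, not from the thread) that dumps a ~200MB array to a temporary file and checks the memory delta:

```python
import pickle
import tempfile

import numpy as np
import psutil

payload = np.random.rand(25_000_000)  # ~200 MB of float64

rss_before = psutil.Process().memory_info().rss
with tempfile.TemporaryFile() as f:
    pickle.dump(payload, f, protocol=5)
rss_after = psutil.Process().memory_info().rss

# Each spurious copy of the payload should appear as a ~200 MB step;
# a peak-memory probe would be more precise than a before/after delta.
print(f"RSS delta: {(rss_after - rss_before) / 1e6:.0f} MB")
```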
Also I would rename `stream` accordingly, since it is not a stream.
What are the units in your graphs, @pierreglaser? I'm particularly looking at runtime. Has it been normalized somehow?
Hi All,
Here is the updated graph (with correct units @jakirkham). The peak memory increase for pickle protocol 5 with no out-of-band buffer was a false alert. (Note: the time axis is in log scale)
Thanks for the update :-)
Closing: #12011 was merged in numpy master.
PEP 574 (scheduled for Python 3.8) introduces pickle protocol 5 with support for no-copy pickling of large mutable buffers.
I made a small proof-of-concept benchmark script using @pitrou's pickle5 backport of his draft implementation of PEP 574.
See: https://gist.github.com/ogrisel/a2b0e5ae4987a398caa7f9277cb3b90a
The meat lies in the following reducer:
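The snippet is missing from this capture; here is a hedged reconstruction consistent with the surrounding description (a naive reducer for C-contiguous arrays only; names are assumptions, see the linked gist for the actual code):

```python
import copyreg
import pickle5 as pickle  # @pitrou's PEP 574 backport
import numpy as np

def reduce_ndarray(a):
    # Naive: only C-contiguous arrays expose their data as one flat buffer.
    assert a.flags.c_contiguous
    return rebuild_ndarray, (pickle.PickleBuffer(a), a.dtype, a.shape)

def rebuild_ndarray(buf, dtype, shape):
    # Wraps the (in-band bytes or out-of-band) buffer without copying.
    return np.frombuffer(buf, dtype=dtype).reshape(shape)

# Register the custom reducer globally for ndarray.
copyreg.pickle(np.ndarray, reduce_ndarray)
```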
This works as expected (no extra copy when dumping and loading) and also fixes the in-memory speed overhead reported by @mrocklin in #7544.
To get this into numpy, we would need to make a protocol-aware reduce function, that is, have `ndarray` implement a `__reduce_ex__` method that accepts a `protocol` argument instead of the existing `bytes`-based implementation from `array_reduce` in https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/methods.c#L1577. This `bytes`-based implementation should probably be kept as a fallback when `protocol < 5`.
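To illustrate the shape of that dispatch, a pure-Python toy using an `ndarray` subclass (this is not the actual C implementation that landed in #12011):

```python
import pickle
import numpy as np

class Array(np.ndarray):
    def __reduce_ex__(self, protocol):
        if protocol >= 5 and self.flags.c_contiguous:
            # Protocol-5 path: expose the data as an out-of-band-capable buffer.
            return _rebuild, (pickle.PickleBuffer(self), self.dtype, self.shape)
        # Fallback to the legacy bytes-based reduction for protocol < 5.
        return super().__reduce_ex__(protocol)

def _rebuild(buf, dtype, shape):
    return np.frombuffer(buf, dtype=dtype).reshape(shape).view(Array)

a = np.arange(12.0).view(Array).reshape(3, 4)
b = pickle.loads(pickle.dumps(a, protocol=5))
assert (a == b).all()
```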