
Leverage the new PEP 574 for no-copy pickling of contiguous arrays #11161

Closed
ogrisel opened this issue May 25, 2018 · 25 comments

ogrisel (Contributor) commented May 25, 2018

PEP 574 (scheduled for Python 3.8) introduces pickle protocol 5 with support for no-copy pickling of large mutable buffers.

I made a small proof-of-concept benchmark script using @pitrou's pickle5 backport of his draft implementation of PEP 574.

See: https://gist.github.com/ogrisel/a2b0e5ae4987a398caa7f9277cb3b90a

The meat lies in the following reducer:

import numpy as np
from pickle5 import PickleBuffer


def _array_from_buffer(buffer, dtype, shape):
    # Reconstruct the array from the pickled buffer without copying.
    return np.frombuffer(buffer, dtype=dtype).reshape(shape)


def reduce_ndarray_pickle5(a):
    # This reducer assumes protocol 5, as there is currently no way to
    # register a protocol-aware reduce function in the global copyreg
    # dispatch table.
    if not a.dtype.hasobject and a.flags.c_contiguous:
        # No-copy pickling for C-contiguous arrays under protocol 5.
        return _array_from_buffer, (PickleBuffer(a), a.dtype, a.shape), None
    else:
        # Fall back to the generic (bytes-copying) method.
        return a.__reduce__()

This works as expected (no extra copy when dumping and loading) and also fixes the in-memory speed overhead reported by @mrocklin in #7544.

To get this into numpy, we would need a protocol-aware reduce function: that is, have ndarray implement a __reduce_ex__ method that accepts a protocol argument, instead of the existing bytes-based implementation from array_reduce in https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/methods.c#L1577. The bytes-based implementation should probably be kept as a fallback for protocol < 5.
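For illustration, a minimal Python-level sketch of what such a protocol-aware method could look like (the class and the _frombuffer_c helper are hypothetical; the real implementation would live in numpy's C code):

import numpy as np


def _frombuffer_c(buffer, dtype, shape):
    # Hypothetical module-level reconstructor: rebuild the array from
    # the buffer without copying (returns a plain ndarray).
    return np.frombuffer(buffer, dtype=dtype).reshape(shape)


class PickleAwareArray(np.ndarray):
    def __reduce_ex__(self, protocol):
        if (protocol >= 5 and not self.dtype.hasobject
                and self.flags.c_contiguous):
            from pickle import PickleBuffer  # Python 3.8+
            return _frombuffer_c, (PickleBuffer(self), self.dtype, self.shape)
        # Fall back to the existing bytes-based reduction for protocol < 5.
        return super().__reduce_ex__(protocol)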

ogrisel (Contributor, Author) commented May 25, 2018

Also, my naive reducer only works if the array is C-contiguous. Adding support for F-contiguous arrays should be easy by taking a transposed view, as in the sketch below. I am not sure about pickling non-contiguous arrays/views.
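A possible sketch of that transposed-view trick, reusing the _array_from_buffer reconstructor and the reducer from above (untested):

def _array_from_buffer_f(buffer, dtype, shape):
    # Rebuild the C-contiguous transpose, then transpose back so the
    # result is F-contiguous with the original shape.
    return np.frombuffer(buffer, dtype=dtype).reshape(shape[::-1]).T


def reduce_ndarray_pickle5_any_order(a):
    if not a.dtype.hasobject and a.flags.c_contiguous:
        return _array_from_buffer, (PickleBuffer(a), a.dtype, a.shape), None
    if not a.dtype.hasobject and a.flags.f_contiguous:
        # a.T is C-contiguous, so its buffer can be exported directly.
        return _array_from_buffer_f, (PickleBuffer(a.T), a.dtype, a.shape), None
    # Non-contiguous arrays/views still go through the generic method.
    return a.__reduce__()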

pitrou (Member) commented May 25, 2018

Your example is not zero-copy as it doesn't pass a buffer_callback argument to dump. But it probably makes one less copy than with protocol 4 :-)

ogrisel (Contributor, Author) commented May 25, 2018

Well, it does stream the content into the file object without making any spurious copy; this is checked by the linked gist script, and there is sample output in the gist comments. It's in-band zero-copy :)

The buffer_callback feature of PEP 574 would additionally make out-of-band zero-copy communication possible, which would be very useful for safe asyncio-based communication over the network: we could start sending data only once we are sure that a compound Python object is picklable. That requires some extra wrapping protocol around the pickle bytes themselves, though.

pitrou (Member) commented May 25, 2018

To clarify: a trivial example of zero-copy use looks as follows (written against the final Python 3.8 API, where buffer_callback receives one PickleBuffer per call):

import pickle

buffers = []
data = pickle.dumps(obj, protocol=5, buffer_callback=buffers.append)

# Later, to recreate the object; the buffers travel out of band
obj = pickle.loads(data, buffers=buffers)

In a real use case, the buffers could be shipped separately from one process (or subinterpreter - see PEP 554) to another, or shared using shared memory, etc.
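A round-trip sketch of that idea, using the final Python 3.8 API and a numpy recent enough to implement protocol 5 (here bytes() merely stands in for whatever transport actually ships the buffers):

import pickle

import numpy as np

arr = np.arange(1_000_000, dtype=np.float64)

buffers = []
payload = pickle.dumps(arr, protocol=5, buffer_callback=buffers.append)

# Ship `payload` and the buffer contents through separate channels
# (sockets, shared memory, ...); bytes() simulates that transport here.
raw = [bytes(b.raw()) for b in buffers]

restored = pickle.loads(payload, buffers=raw)
assert np.array_equal(arr, restored)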

pierreglaser (Contributor) commented

Hi @ogrisel @pitrou,
To carry on with this issue, I implemented __reduce_ex__ for numpy arrays using the CPython C API in a branch of my fork, used @pitrou's Python fork to get pickle protocol 5, and added an np._frombuffer helper that reconstructs an array given a buffer, its dtype and its shape.
I traced the peak memory when pickling/unpickling (with airspeed velocity), using

  • pickle protocol 4
  • pickle protocol 5
  • pickle protocol 5 with buffer_callback

Each of these was run for several array shapes (a sketch of such a benchmark follows).
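Roughly, an asv suite for this could look like the following sketch (class and method names are illustrative, not the actual benchmark code, and a Python with protocol 5 support is assumed):

import pickle

import numpy as np


class NdarrayPickleSuite:
    # A single parameter: the array shape (values are tuples).
    params = [[(100, 100), (1000, 1000), (10000, 1000)]]
    param_names = ["shape"]

    def setup(self, shape):
        self.arr = np.random.random(shape)

    def peakmem_protocol_4(self, shape):
        pickle.loads(pickle.dumps(self.arr, protocol=4))

    def peakmem_protocol_5(self, shape):
        pickle.loads(pickle.dumps(self.arr, protocol=5))

    def peakmem_protocol_5_out_of_band(self, shape):
        buffers = []
        data = pickle.dumps(self.arr, protocol=5,
                            buffer_callback=buffers.append)
        pickle.loads(data, buffers=buffers)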

Here is what we get:
[Figure: peak memory benchmark results for the three configurations]
For the biggest shape, (10000, 1000), the size of the array is 80MB. We do see a 160MB decrease in peak memory between pickle protocol 4 and pickle protocol 5 with callbacks, so two fewer copies, as you say :)

There also seems to be a big memory increase between pickle protocol 4 and pickle protocol 5 with no buffer callback. I cannot explain it for now; I will investigate.

What do you guys think?

pitrou (Member) commented Sep 21, 2018

Thank you very much @pierreglaser. Did you measure CPU times as well?

I cannot explain why protocol 5 without callbacks would exhibit larger memory consumption. It may be a problem in how you implemented __reduce_ex__. By the way, how did you measure memory consumption? Is it RSS or virtual memory size?

ogrisel (Contributor, Author) commented Sep 21, 2018

RSS with psutil, I guess.

hmaarrfk (Contributor) commented Sep 21, 2018

@pierreglaser did you try to subtract the baseline memory cost? If so, how did you do it?

Edit: fixed to mention pierreglaser's handle instead of ogrisel's. :/ man I'm 0/1.

ogrisel (Contributor, Author) commented Sep 21, 2018

@pierreglaser did the work. He uses airspeed velocity to do the measurements. I am not sure whether the baseline is subtracted or not. Probably not.

ogrisel (Contributor, Author) commented Sep 21, 2018

@pierreglaser I suppose this is your branch: master...pierreglaser:implement-reduce-ex

There is a problem: you use the Python 3.8-specific C API, notably the PyPickleBuffer_FromObject function. We don't want the numpy build to depend on Python 3.8; numpy should still be buildable with Python 2.7 and older versions of Python 3.

So instead of using PyPickleBuffer_FromObject, you should build pickle buffers by importing the PickleBuffer class from the pickle module and then calling it on the array (as you would do in Python). This way, the C code only manipulates it as a PyObject and does not introduce a build dependency. I am not familiar enough with the C API, but I am pretty sure this is possible.

ogrisel (Contributor, Author) commented Sep 21, 2018

Also, feel free to open a [WIP] pull request against numpy master so that we can give you more detailed feedback there.

pierreglaser (Contributor) commented Sep 21, 2018

[Figure: updated benchmark results, including CPU time]

@pitrou here is the updated figure with CPU time; it is pretty low when using buffers.
@ogrisel got it, will take a look at this!

pitrou (Member) commented Sep 21, 2018

So instead of using PyPickleBuffer_FromObject you should build pickle buffers by importing the PickleBuffer class from the pickle module and then calling on the array (as you would do in Python)

One could also use a preprocessor conditional block, something like:

#if PY_VERSION_HEX < 0x03080000
/* code for Python 3.7 and older */
#else
/* code for Python 3.8+ */
#endif

Things get a bit more complicated if you want to take the pickle5 backport into account, though, because you can't discriminate at compile time; you'd have to try importing that module and do as you suggest (treat the PickleBuffer class as a regular Python object).

pierreglaser (Contributor) commented

@hmaarrfk the first numpy array has a size of 80KB, which is negligible compared to the measured memory usage (20MB). So my rough estimate for the baseline would be 20MB.

pitrou (Member) commented Sep 21, 2018

By the way, both @ogrisel and I got better results with protocol 5 than with protocol 4, even without a buffer callback.

@pierreglaser Out of curiosity, can you post the code for your benchmark?

ogrisel (Contributor, Author) commented Sep 21, 2018

The solution with the preprocessor conditional block is simple, but indeed it makes it impossible to benefit from the pickle5 package installed on Python 3.7. Maybe we don't care that much, though.

pierreglaser (Contributor) commented

@pitrou I pushed the repository with my benchmarks here :)

ogrisel (Contributor, Author) commented Sep 21, 2018

@pierreglaser what you call stream in your benchmark is not a stream, it's an in-memory bytes string. It would be more interesting to pickle to an out-of-RAM file object (e.g. a temporary file on the filesystem): use dump instead of dumps and load instead of loads.

This should make it easier to identify spurious copies. Also, please use a payload that is significantly larger than the baseline memory footprint of the Python process, to make the spurious copies easier to count; for instance, a 200MB numpy array (see the sketch below).
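Something along these lines (a sketch using a temporary file and a ~200MB array):

import pickle
import tempfile

import numpy as np

a = np.random.random(25_000_000)  # 25e6 float64 values, ~200MB

with tempfile.TemporaryFile() as f:
    pickle.dump(a, f, protocol=5)  # streams the array buffer to the file
    f.seek(0)
    b = pickle.load(f)

assert np.array_equal(a, b)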

ogrisel (Contributor, Author) commented Sep 21, 2018

Or even a 1GB array.

ogrisel (Contributor, Author) commented Sep 21, 2018

Also, I would rename Protocol5WithBuffer to Protocol5WithOutOfBandBuffers to make it more explicit: Protocol5 also uses PickleBuffer objects internally, but they are serialized into the pickle bytes stream.

jakirkham (Contributor) commented

What are the units in your graphs, @pierreglaser? I am particularly looking at the runtime. Has it been normalized somehow?

pierreglaser (Contributor) commented Sep 24, 2018

Hi All,
So here is how I addressed this issue:

  • on Python 3.8, import pickle;
  • on Python 3.7, try importing the pickle5 backport; if it is not available, raise an error, because protocol 5 is then not available (see the sketch below).
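At the Python level, the fallback logic amounts to something like this sketch (the actual change lives in numpy's C code):

import sys

if sys.version_info >= (3, 8):
    from pickle import PickleBuffer
else:
    try:
        from pickle5 import PickleBuffer  # backport package for Python 3.7
    except ImportError:
        # pickle protocol 5 is not available at all.
        PickleBuffer = None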

Here is the updated graph (with correct units, @jakirkham). The peak memory increase for pickle5 with no out-of-band buffer was a false alarm.
[Figure: updated benchmark results with corrected units]

(Note: time axis is in log scale)

pitrou (Member) commented Sep 24, 2018

Thanks for the update :-)

ogrisel (Contributor, Author) commented Oct 11, 2018

Closing: #12011 was merged into numpy master.

Neltherion commented Sep 26, 2021

@ogrisel Hi. I've asked a question about pickle protocol 5's inefficiency compared to PyArrow's serialization/deserialization methods here.

Is there anything that can be done to utilize pickle to serialize/deserialize objects containing Numpy arrays faster?
