Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose tracemalloc hook into _Py_NewReference for other tracers #93502

Open
pablogsal opened this issue Jun 5, 2022 · 14 comments
Open

Expose tracemalloc hook into _Py_NewReference for other tracers #93502

pablogsal opened this issue Jun 5, 2022 · 14 comments
Labels
3.13 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@pablogsal
Copy link
Member

pablogsal commented Jun 5, 2022

When new objects are created, the allocation code ends calling into _Py_NewReference (with some exceptions). tracemalloc has a hook into this function to correlate allocations and traces (that do not know anything about objects or references) with newly created objects. This happens here:

cpython/Objects/object.c

Lines 2018 to 2020 in 3d647e7

if (_Py_tracemalloc_config.tracing) {
_PyTraceMalloc_NewReference(op);
}

This allows tracemalloc to be able to return the traceback where objects where allocated, which is great. Unfortunately other debuggers and profilers do not have this capability as there is no way to leverage trace malloc functionality or to hook into object creation.

I propose to expose an API (unclear what layer of the C-API this should be in) where callbacks can be registered to be called in _Py_NewReference and switch tracemalloc to use this API.

Linked PRs

@pablogsal pablogsal added type-feature A feature request or enhancement 3.12 bugs and security fixes labels Jun 5, 2022
@pablogsal
Copy link
Member Author

pablogsal commented Jun 5, 2022

CC: @vstinner any objection? What do you think?

@vstinner
Copy link
Member

vstinner commented Jun 6, 2022

I added _PyTraceMalloc_NewReference() to fix an issue with free lists: commit 9e00e80

commit 9e00e80e213ebc37eff89ce72102c1f928ebc133
Author: Victor Stinner <[email protected]>
Date:   Thu Oct 25 13:31:16 2018 +0200

    bpo-35053: Enhance tracemalloc to trace free lists (GH-10063)
    
    tracemalloc now tries to update the traceback when an object is
    reused from a "free list" (optimization for faster object creation,
    used by the builtin list type for example).
    
    Changes:
    
    * Add _PyTraceMalloc_NewReference() function which tries to update
      the Python traceback of a Python object.
    * _Py_NewReference() now calls _PyTraceMalloc_NewReference().
    * Add an unit test.

Issue: #79234

I propose to expose an API (unclear what layer of the C-API this should be in) where callbacks can be registered to be called in _Py_NewReference and switch tracemalloc to use this API.

To have a list of callbacks? As soon as the cost on performance is acceptable, I'm fine with it.

When tracemalloc was also provided as a 3rd party module, I had the same issue: it couldn't handle free lists (hook into _Py_NewReference).

_Py_NewReference() API is not well defined. Sometimes, it's used on a newly allocated memory block. Sometimes, it's used to update an object taken from a free list. Do you think that we should have two different APIs for these two code paths?

I dislike _PyTraceMalloc_NewReference() since it uses an if to the hot path of code creating an object from a free list. I chose correctness over performance.

When new objects are created, the allocation code ends calling into _Py_NewReference (with some exceptions)

Do you have to fix these exceptions to make the code more consistent?


By the way, I tried but failed to remove the private _Py_NewReference() function from the C API: #85161 It's used in a few 3rd party projects.

@iritkatriel iritkatriel added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Nov 29, 2023
@wangxiang-hz
Copy link
Contributor

@vstinner Sorry for bothering you, I also have a similar requirement, where I want to perform some custom checks when _PyTraceMalloc_NewReference is called. My requirement is as follows:

In our C++ layer, we are adding some Python interfaces for scripting purposes. However, there is always a chance of accidentally missing Py_DECREF, which can lead to memory leaks. For example:

PyObject* name = PyObject_GetAttrString(obj, "obj_name");
Py_RETURN_NONE;

I forget to release name, it will result in a memory leak. If I discover a leak, I can track it down using take_snapshot and compare_to from tracemalloc. However, usually when we discover the leak, a lot of code has been executed, making it difficult to pinpoint the exact location. But keeping tracemalloc running continuously with PyFrame parsing would result in performance loss.

So, I'm wondering if it's possible to do something like registering a PyObj when _PyTraceMalloc_NewReference is called and unregistering it when tracemalloc_free is called. This way, when I'm preparing to exit my program, I will know which PyObj are still held. Once I find them, I can enable tracemalloc at the corresponding locations.

To achieve this, there are two issues that need to be resolved:

  1. How to associate the PyObj passed to '_PyTraceMalloc_NewReference' with the ptr passed to tracemalloc_free. Based on my code inspection, it seems feasible.
    PyTypeObject *type = Py_TYPE(op);
    const size_t presize = _PyType_PreHeaderSize(type);
    uintptr_t ptr = (uintptr_t)((char *)op - presize);
  1. Currently, unlike tracemalloc_free, there is no easy way to register my own function pointer in _PyTraceMalloc_NewReference. Is there any way to modify this?

Thanks.

@vstinner
Copy link
Member

Currently, unlike tracemalloc_free, there is no easy way to register my own function pointer in _PyTraceMalloc_NewReference. Is there any way to modify this?

Can you propose an API for that?

@wangxiang-hz
Copy link
Contributor

Currently, unlike tracemalloc_free, there is no easy way to register my own function pointer in _PyTraceMalloc_NewReference. Is there any way to modify this?

Can you propose an API for that?

With pleasure. I have created a new topic on Discuss, is that okay:
https://discuss.python.org/t/can-we-make-pytracemalloc-newreference-to-support-custom-hooks/41443

Thanks.

@vstinner
Copy link
Member

@erlend-aasland erlend-aasland added 3.13 bugs and security fixes and removed 3.12 bugs and security fixes labels Jan 5, 2024
@wangxiang-hz
Copy link
Contributor

Correct link: https://discuss.python.org/t/can-we-make-pytracemalloc-newreference-to-support-custom-hooks/41443

I'm sorry, I don't know why the link I provided cannot be opened...
I have made some modifications on my branch and it seems feasible:
wangxiang-hz@064be8e

However, I noticed that tracemalloc.h might have forgotten to define extern "C".

@pablogsal
Copy link
Member Author

I have created a PR with an API proposal for this: #115945

pablogsal added a commit to pablogsal/cpython that referenced this issue Feb 26, 2024
pablogsal added a commit to pablogsal/cpython that referenced this issue Feb 26, 2024
pablogsal added a commit to pablogsal/cpython that referenced this issue Feb 26, 2024
@wangxiang-hz
Copy link
Contributor

I have created a PR with an API proposal for this: #115945

Maybe we can rename _PyTraceMalloc_NewReference? Because whether it's a new reference for a PyObject or delete a PyObject, this function will be called, but with different events. This name feels a bit inaccurate.

pablogsal added a commit to pablogsal/cpython that referenced this issue Feb 28, 2024
pablogsal added a commit to pablogsal/cpython that referenced this issue Feb 28, 2024
pablogsal added a commit to pablogsal/cpython that referenced this issue Feb 28, 2024
SonicField pushed a commit to SonicField/cpython that referenced this issue May 8, 2024
@markshannon
Copy link
Member

Did anyone benchmark this?

This is a lot of code to add to a very hot code path. This impacts the deallocation of every object.

@pablogsal
Copy link
Member Author

pablogsal commented May 30, 2024

This is not doing anything that tracemalloc was not doing already for new references: it only exposes it to other tools. In all cases, the hook is only called if is set so it won't impact normal applications and the branch itself it's very predictable.

I ran pyperformance and didn't measure anything over the noise but that was long ago.

@markshannon
Copy link
Member

markshannon commented May 31, 2024

This is not doing anything that tracemalloc was not doing already for new references

It is doing something that wasn't done before, otherwise there wouldn't be any new code 🙂

I ran pyperformance and didn't measure anything over the noise but that was long ago.

Getting numbers for these sort of changes is hard as they tend to be in the noise.

It is not the small slowdown of small changes like this that worries me, it is the general trend of adding lots of ad hoc profiling hooks. Each of which are cheap, but cumulatively are expensive.

This hook makes it hard to inline the deallocation in the future, as there is an extra step which cannot easily be removed.

If we strip out all the debugging code, Py_Dealloc was a couple of indirections and a call:

void
_Py_Dealloc(PyObject *op)
{
    PyTypeObject *type = Py_TYPE(op);
    destructor dealloc = type->tp_dealloc;
    (*dealloc)(op);
}

now it is:

void
_Py_Dealloc(PyObject *op)
{
    PyTypeObject *type = Py_TYPE(op);
    destructor dealloc = type->tp_dealloc;
    struct _reftracer_runtime_state *tracer = &_PyRuntime.ref_tracer;
    if (tracer->tracer_func != NULL) {
        void* data = tracer->tracer_data;
        tracer->tracer_func(op, PyRefTracer_DESTROY, data);
    }
    (*dealloc)(op);
}

@pablogsal
Copy link
Member Author

pablogsal commented May 31, 2024

If we strip out all the debugging code, Py_Dealloc was a couple of indirections and a call:

Yeah in 99% of the runs the branch

    if (tracer->tracer_func != NULL) {

will not be taken so it will be predicted and the impact should be negligible.

It is not the small slowdown of small changes like this that worries me, it is the general trend of adding lots of ad hoc profiling hooks. Each of which are cheap, but cumulatively are expensive.

I understand this concern. Notice this functionality was used internally by tracemalloc already, we are just allowing other callers to do stuff with it. But again, I understand where you are coming from so what would be your recommendation to do here then?

@markshannon
Copy link
Member

The general approach for adding hooks, without impacting performance, is to move the hooks to the slow path and arrange for the slow path to be taken when the hook is needed.

Take the eval breaker, for example. We need to check it, but we don't need all 32 or 64 bits to show that we need to switch threads. We use the other bits to note that we need to perform GC, or instrument the bytecode, or handle an async exception, etc.

Exactly what to do here isn't immediately obvious. It depends on what the fast path is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

6 participants