-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose tracemalloc hook into _Py_NewReference
for other tracers
#93502
Comments
CC: @vstinner any objection? What do you think? |
I added _PyTraceMalloc_NewReference() to fix an issue with free lists: commit 9e00e80
Issue: #79234
To have a list of callbacks? As soon as the cost on performance is acceptable, I'm fine with it. When tracemalloc was also provided as a 3rd party module, I had the same issue: it couldn't handle free lists (hook into _Py_NewReference). _Py_NewReference() API is not well defined. Sometimes, it's used on a newly allocated memory block. Sometimes, it's used to update an object taken from a free list. Do you think that we should have two different APIs for these two code paths? I dislike _PyTraceMalloc_NewReference() since it uses an
Do you have to fix these exceptions to make the code more consistent? By the way, I tried but failed to remove the private _Py_NewReference() function from the C API: #85161 It's used in a few 3rd party projects. |
@vstinner Sorry for bothering you, I also have a similar requirement, where I want to perform some custom checks when In our C++ layer, we are adding some Python interfaces for scripting purposes. However, there is always a chance of accidentally missing
I forget to release So, I'm wondering if it's possible to do something like registering a To achieve this, there are two issues that need to be resolved:
Thanks. |
Can you propose an API for that? |
With pleasure. I have created a new topic on Discuss, is that okay: Thanks. |
I'm sorry, I don't know why the link I provided cannot be opened... However, I noticed that |
I have created a PR with an API proposal for this: #115945 |
…on and destruction
Maybe we can rename |
… creation and destruction
… object creation and destruction
Did anyone benchmark this? This is a lot of code to add to a very hot code path. This impacts the deallocation of every object. |
This is not doing anything that tracemalloc was not doing already for new references: it only exposes it to other tools. In all cases, the hook is only called if is set so it won't impact normal applications and the branch itself it's very predictable. I ran pyperformance and didn't measure anything over the noise but that was long ago. |
It is doing something that wasn't done before, otherwise there wouldn't be any new code 🙂
Getting numbers for these sort of changes is hard as they tend to be in the noise. It is not the small slowdown of small changes like this that worries me, it is the general trend of adding lots of ad hoc profiling hooks. Each of which are cheap, but cumulatively are expensive. This hook makes it hard to inline the deallocation in the future, as there is an extra step which cannot easily be removed. If we strip out all the debugging code, void
_Py_Dealloc(PyObject *op)
{
PyTypeObject *type = Py_TYPE(op);
destructor dealloc = type->tp_dealloc;
(*dealloc)(op);
} now it is: void
_Py_Dealloc(PyObject *op)
{
PyTypeObject *type = Py_TYPE(op);
destructor dealloc = type->tp_dealloc;
struct _reftracer_runtime_state *tracer = &_PyRuntime.ref_tracer;
if (tracer->tracer_func != NULL) {
void* data = tracer->tracer_data;
tracer->tracer_func(op, PyRefTracer_DESTROY, data);
}
(*dealloc)(op);
} |
Yeah in 99% of the runs the branch
will not be taken so it will be predicted and the impact should be negligible.
I understand this concern. Notice this functionality was used internally by tracemalloc already, we are just allowing other callers to do stuff with it. But again, I understand where you are coming from so what would be your recommendation to do here then? |
The general approach for adding hooks, without impacting performance, is to move the hooks to the slow path and arrange for the slow path to be taken when the hook is needed. Take the eval breaker, for example. We need to check it, but we don't need all 32 or 64 bits to show that we need to switch threads. We use the other bits to note that we need to perform GC, or instrument the bytecode, or handle an async exception, etc. Exactly what to do here isn't immediately obvious. It depends on what the fast path is. |
When new objects are created, the allocation code ends calling into
_Py_NewReference
(with some exceptions).tracemalloc
has a hook into this function to correlate allocations and traces (that do not know anything about objects or references) with newly created objects. This happens here:cpython/Objects/object.c
Lines 2018 to 2020 in 3d647e7
This allows
tracemalloc
to be able to return the traceback where objects where allocated, which is great. Unfortunately other debuggers and profilers do not have this capability as there is no way to leverage trace malloc functionality or to hook into object creation.I propose to expose an API (unclear what layer of the C-API this should be in) where callbacks can be registered to be called in
_Py_NewReference
and switch tracemalloc to use this API.Linked PRs
The text was updated successfully, but these errors were encountered: