Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_typing leaked [1, 1, 1] memory blocks: fail randomly on Refleak buildbots #121084

Closed
vstinner opened this issue Jun 27, 2024 · 28 comments
Closed
Labels
tests Tests in the Lib/test dir topic-typing

Comments

@vstinner vstinner added the tests Tests in the Lib/test dir label Jun 27, 2024
@sobolevn sobolevn self-assigned this Jun 27, 2024
@sobolevn
Copy link
Member

It does not reproduce on macos, so it is windows-specific.
I don't have access to Windows machince, so I cannot help with this :(

@sobolevn sobolevn removed their assignment Jun 27, 2024
@vstinner
Copy link
Member Author

./python -m test test_typing -R 3:3 result is very random. Example of 3 runs on Linux:

  • test_typing leaked [3, 0, 0] memory blocks, sum=3 (this is fine)
  • test_typing leaked [2, 1, 0] memory blocks, sum=3 (this is fine)
  • test_typing leaked [1, 0, 1] memory blocks, sum=2 (this is fine)

@vstinner
Copy link
Member Author

Even if I run more iterations, I only see a total of sum=3. "Leaks" are only in the first 5 runs (after the 3 warmups):

./python -m test test_typing -R 3:20:

  • test_typing leaked [2, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] memory blocks, sum=3 (this is fine)
  • test_typing leaked [0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] memory blocks, sum=3 (this is fine)
  • test_typing leaked [1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] memory blocks, sum=3 (this is fine)

@vstinner
Copy link
Member Author

vstinner commented Jun 27, 2024

I tried tracemalloc:

diff --git a/Lib/test/libregrtest/refleak.py b/Lib/test/libregrtest/refleak.py
index 25e71305402..5da93117945 100644
--- a/Lib/test/libregrtest/refleak.py
+++ b/Lib/test/libregrtest/refleak.py
@@ -120,6 +120,7 @@ def get_pooled_int(value):
         numbers = numbers[:warmups] + ':' + numbers[warmups:]
         print(numbers, file=sys.stderr, flush=True)
 
+    import tracemalloc
     xml_filename = 'refleak-xml.tmp'
     result = None
     dash_R_cleanup(fs, ps, pic, zdc, abcs)
@@ -137,6 +138,11 @@ def get_pooled_int(value):
         dash_R_cleanup(fs, ps, pic, zdc, abcs)
         support.gc_collect()
 
+        if i == 2:
+            tracemalloc.take_snapshot().dump('/tmp/snapshot.before')
+        elif i == 3:
+            tracemalloc.take_snapshot().dump('/tmp/snapshot.after')
+
         # Read memory statistics immediately after the garbage collection.
         # Also, readjust the reference counts and alloc blocks by ignoring
         # any strings that might have been interned during test_func. These

Script:

import tracemalloc
snapshot1 = tracemalloc.Snapshot.load('/tmp/snapshot.before')
snapshot2 = tracemalloc.Snapshot.load('/tmp/snapshot.after')
top_stats = snapshot2.compare_to(snapshot1, 'lineno')

print("[ Top 10 differences ]")
for stat in top_stats[:10]:
    print(stat)

Run tests: ./python -X tracemalloc -m test test_typing -R 3:3.

Example of script output:

unittest/result.py:170: size=25.0 KiB (-17.9 KiB), count=350 (-327), average=73 B
test/test_typing.py:1169: size=9240 B (+9240 B), count=1 (+1), average=9240 B
collections/__init__.py:510: size=52.6 KiB (-4672 B), count=259 (-2), average=208 B
typing.py:3000: size=5656 B (+4624 B), count=19 (+1), average=298 B
test/test_typing.py:7002: size=0 B (-4624 B), count=0 (-1)
enum.py:559: size=46.1 KiB (+1104 B), count=144 (+1), average=328 B
enum.py:496: size=44.9 KiB (-568 B), count=152 (-1), average=302 B
test/test_typing.py:8101: size=568 B (+568 B), count=1 (+1), average=568 B
test/test_typing.py:5595: size=568 B (+568 B), count=1 (+1), average=568 B
test/test_typing.py:10258: size=0 B (-568 B), count=0 (-1)

Only growing counts are interesting:

  • test/test_typing.py:1169: count=1 (+1) -- class Old(Generic[*Ts]): ...
  • typing.py:3000: count=19 (+1) -- nm_tpl.__bases__ = bases
  • enum.py:559: count=144 (+1) -- enum_class = super().__new__(metacls, cls, bases, classdict, **kwds)
  • test/test_typing.py:8101: count=1 (+1) -- class Meta(type):
  • test/test_typing.py:5595: count=1 (+1) -- class D(type(ClassVar[int])):

Many creation of classes. The question is why these classes still seem to be alive after the test completed.

Maybe something is delaying the deletion of these temporary classes. Maybe they are registered in caches and caches are not fully cleared "immediately" but "later".

@JelleZijlstra
Copy link
Member

I recall @encukou telling me it had something to do with materialization of __dict__.

@AlexWaygood
Copy link
Member

I recall @encukou telling me it had something to do with materialization of __dict__.

see #115822

@encukou
Copy link
Member

encukou commented Jun 27, 2024

This is a different pattern than the one related to dict rematerialization (#115822)

@picnixz
Copy link
Contributor

picnixz commented Jun 27, 2024

I don't know if its helpful but test.test_typing.ProtocolTests.test_protocols_isinstance_py36 seems to leak quite consistently just after the warm-up. If you write something like:

for i in {1..5} ; do ./python -m test test_typing -R3:4 -m test_protocols_isinstance_py36 ; done

You'll see patterns of the form

# 123:4567
  XX* *...
# or
  XX. *...

(The test invokes isinstance(X, Y) with Y a Protocol class).

In addition, we seem to have some pattern:

Either it is

beginning 43 repetitions. Showing number of leaks (. for 0 or less, X for 10 or more)
123:4567890123456789012345678901234567890123
XX1 2.......................................
test_typing leaked [2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] memory blocks, sum=2 (this is fine)

or it is

123:4567890123456789012345678901234567890123
XX1 1....1..................................
test_typing leaked [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] memory blocks, sum=2 (this is fine)

(either you have a 2 at position 4 and nothing afterwards or a 1 at 4 and a 1 at 9).

@vstinner
Copy link
Member Author

GenericAlias and Union have no clear() function, but it doesn't seem to make any difference to add them:

diff --git a/Objects/genericaliasobject.c b/Objects/genericaliasobject.c
index f5fefd65653..59ef57acc4e 100644
--- a/Objects/genericaliasobject.c
+++ b/Objects/genericaliasobject.c
@@ -50,6 +50,16 @@ ga_traverse(PyObject *self, visitproc visit, void *arg)
     return 0;
 }
 
+static int
+ga_clear(PyObject *self)
+{
+    gaobject *alias = (gaobject *)self;
+    Py_CLEAR(alias->origin);
+    Py_CLEAR(alias->args);
+    Py_CLEAR(alias->parameters);
+    return 0;
+}
+
 static int
 ga_repr_item(PyUnicodeWriter *writer, PyObject *p)
 {
@@ -974,6 +984,7 @@ PyTypeObject Py_GenericAliasType = {
     .tp_getattro = ga_getattro,
     .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC | Py_TPFLAGS_BASETYPE | Py_TPFLAGS_HAVE_VECTORCALL,
     .tp_traverse = ga_traverse,
+    .tp_clear = ga_clear,
     .tp_richcompare = ga_richcompare,
     .tp_weaklistoffset = offsetof(gaobject, weakreflist),
     .tp_methods = ga_methods,
diff --git a/Objects/unionobject.c b/Objects/unionobject.c
index 7931f4345f7..192ba4ef1ed 100644
--- a/Objects/unionobject.c
+++ b/Objects/unionobject.c
@@ -36,6 +36,15 @@ union_traverse(PyObject *self, visitproc visit, void *arg)
     return 0;
 }
 
+static int
+union_clear(PyObject *self)
+{
+    unionobject *alias = (unionobject *)self;
+    Py_CLEAR(alias->args);
+    Py_CLEAR(alias->parameters);
+    return 0;
+}
+
 static Py_hash_t
 union_hash(PyObject *self)
 {
@@ -380,6 +389,7 @@ PyTypeObject _PyUnion_Type = {
     .tp_free = PyObject_GC_Del,
     .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC,
     .tp_traverse = union_traverse,
+    .tp_clear = union_clear,
     .tp_hash = union_hash,
     .tp_getattro = union_getattro,
     .tp_members = union_members,

@vstinner
Copy link
Member Author

test_functools has a similar issue: #120163

@JelleZijlstra
Copy link
Member

GenericAlias and Union have no clear() function

If I understand correctly, they don't need clear() because it is impossible for a cycle to exist that contains exclusively GenericAlias or Union objects, because both are immutable.

@vstinner
Copy link
Member Author

vstinner commented Jun 28, 2024

About test_functools, I can reproduce a "leak" of 1 memory allocation with this simplified test suite. It's a random failure, like 1 run on 50 runs.

According to tracemalloc, there are 2 "leaks":

  • class P: pass
  • c.Iterable.register(Q) which calls _abc._abc_register()

I'm not sure why this P class is not deallocated immediately.

collections.abc.Iterable.register(P) and collections.abc.Container.register(P) should only store a weak reference to the class.

I'm not 100% sure of tracemalloc result.

import collections.abc
import functools
import unittest

class TestSingleDispatch(unittest.TestCase):
    def test_mro_conflicts(self):
        c = collections.abc

        @functools.singledispatch
        def g(arg):
            return "base"

        class O(c.Sized):
            def __len__(self):
                return 0
        o = O()

        g.register(c.Iterable, lambda arg: "iterable")
        g.register(c.Container, lambda arg: "container")
        g.register(c.Sized, lambda arg: "sized")
        g.register(c.Set, lambda arg: "set")

        c.Iterable.register(O)
        c.Container.register(O)
        c.Set.register(O)

        class P:  # <========== HERE ============
            pass
        p = P()
        c.Iterable.register(P)
        c.Container.register(P)

        with self.assertRaises(RuntimeError) as re_one:
            g(p)

        class Q(c.Sized):
            def __len__(self):
                return 0
        q = Q()
        c.Iterable.register(Q)  # <========== HERE ============
        c.Set.register(Q)

@AlexWaygood
Copy link
Member

AlexWaygood commented Jun 28, 2024

I'm not sure why this P class is not deallocated immediately.

Does it reproduce if you get rid of the assertRaises calls from the test? I've experienced issues in the past where these calls have kept local variables alive for longer than the lifetime of the test method due to reference cycles somewhere. If the garbage collector doesn't collect soon enough, that could cause the P class to persist for longer than the lifetime of the test function, causing a "leak" to be reported.

@picnixz

This comment was marked as off-topic.

@vstinner
Copy link
Member Author

I tried commenting out some assert calls and when I did it, I did not observe the same leakage.

test_typing and test_functools "leaks" are very random. I'm using a loop to run the test until it fails. Sometimes, it fails at the first attempt. Sometimes, it takes 100 runs.

@vstinner
Copy link
Member Author

@AlexWaygood:

Does it reproduce if you get rid of the assertRaises calls from the test?

Yes, sadly, I can still reproduce the "leak" if I remove assertRaises().

And you're right that storing an exception in a local variable creates a reference cycle!

@vstinner
Copy link
Member Author

vstinner commented Jun 28, 2024

This "leak" is really random, I'm not sure that it's a real leak. It's really hard to debug, I'm considering to give up for my mental health sanity :-)

Maybe libregrtest should just ignore "leaks" of [1, 0, 0] memory blocks, [1, 1, 1] memory blocks, etc. Maybe only complain if the total is greater than 3? Or maybe even remove the "memory block" feature? I'm not sure. Another option is to ignore it only for "test_functools" and "test_typing".

@Eclips4
Copy link
Member

Eclips4 commented Jun 29, 2024

I often see this issue when running the test suite on my Mac OS & Linux setups. So it's not specific to Windows.

@sobolevn
Copy link
Member

FWIW, I've executed while true; do ./python.exe -m test test_typing -R : || exit 1; done on m2 macos for like 30 minutes. And it worked just fine.

@vstinner
Copy link
Member Author

vstinner commented Jun 30, 2024

You can make the problem more likely with this change:

diff --git a/Lib/test/libregrtest/refleak.py b/Lib/test/libregrtest/refleak.py
index 25e71305402..571d4cfeb22 100644
--- a/Lib/test/libregrtest/refleak.py
+++ b/Lib/test/libregrtest/refleak.py
@@ -201,7 +201,7 @@ def check_fd_deltas(deltas):
     failed = False
     for deltas, item_name, checker in [
         (rc_deltas, 'references', check_rc_deltas),
-        (alloc_deltas, 'memory blocks', check_rc_deltas),
+        (alloc_deltas, 'memory blocks', lambda deltas: any(delta >= 1 for delta in deltas)),
         (fd_deltas, 'file descriptors', check_fd_deltas)
     ]:
         # ignore warmup runs

@vstinner
Copy link
Member Author

vstinner commented Jun 30, 2024

Here is a reproducer of test_functools leak in less than 120 lines without unittest or anything: leak.py.

Run the script in a loop until it fails with the shell command:

while true; do ./python leak.py || break; done

Example of output:

(...)
.
.
.
.
.
.
.
test_name leaked [1, 0, 0] memory blocks, sum=1
REFLEAK!

@vstinner
Copy link
Member Author

vstinner commented Jun 30, 2024

IMO the code to clear Abstract Base Classes is not reliable:

def runtest_refleak(test_name, test_func):
    # (...)
    abcs = {}
    for abc in [getattr(collections.abc, a) for a in collections.abc.__all__]:
        if not isabstract(abc):
            continue
        for obj in abc.__subclasses__() + [abc]:
            abcs[obj] = _get_dump(obj)[0]
    # (...)

def dash_R_cleanup(abcs):
    abs_classes = [getattr(collections.abc, a) for a in collections.abc.__all__]
    abs_classes = filter(isabstract, abs_classes)
    for abc in abs_classes:
        for obj in abc.__subclasses__() + [abc]:
            for ref in abcs.get(obj, set()):
                if ref() is not None:
                    obj.register(ref())
            obj._abc_caches_clear()

@vstinner
Copy link
Member Author

According to tracemalloc, the only additional memory allocation is:

/home/vstinner/python/main/leak.py:124: size=256 B (+256 B), count=1 (+1), average=256 B
  File "/home/vstinner/python/main/leak.py", line 124
    main()
  File "/home/vstinner/python/main/leak.py", line 117
    refleak = runtest_refleak("test_name", test_mro_conflicts)
  File "/home/vstinner/python/main/leak.py", line 76
    test_func()
  File "/home/vstinner/python/main/leak.py", line 40
    c.Iterable.register(Q)
  File "/home/vstinner/python/main/Lib/abc.py", line 115
    return _abc_register(cls, subclass)

vstinner added a commit to vstinner/cpython that referenced this issue Jun 30, 2024
dash_R_cleanup() now calls _abc_registry_clear() before calling again
register().
@vstinner
Copy link
Member Author

I wrote PR gh-121191 which seems to fix test_functools.

vstinner added a commit that referenced this issue Jul 1, 2024
dash_R_cleanup() now calls _abc_registry_clear() before calling again
register().
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jul 1, 2024
…ythonGH-121191)

dash_R_cleanup() now calls _abc_registry_clear() before calling again
register().
(cherry picked from commit c766ad2)

Co-authored-by: Victor Stinner <[email protected]>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jul 1, 2024
…ythonGH-121191)

dash_R_cleanup() now calls _abc_registry_clear() before calling again
register().
(cherry picked from commit c766ad2)

Co-authored-by: Victor Stinner <[email protected]>
vstinner added a commit that referenced this issue Jul 1, 2024
…H-121191) (#121209)

gh-121084: Call _abc_registry_clear() when checking refleaks (GH-121191)

dash_R_cleanup() now calls _abc_registry_clear() before calling again
register().
(cherry picked from commit c766ad2)

Co-authored-by: Victor Stinner <[email protected]>
@vstinner
Copy link
Member Author

vstinner commented Jul 1, 2024

Example of leak:

$ ./python -m test test_typing -R 3:3 -m test.test_typing.ProtocolTests.test_interaction_with_isinstance_checks_on_superclasses_with_ABCMeta_2 -m test.test_typing.ProtocolTests.test_supports_bytes
(...)
test_typing leaked [0, 1, 0] memory blocks, sum=1 (this is fine)

The problem is that the ABC cache is not cleared explicitly. Example of local fix to show the problem, but a more generic one should be written:

diff --git a/Lib/test/test_typing.py b/Lib/test/test_typing.py
index a931da55908..b2b22668e07 100644
--- a/Lib/test/test_typing.py
+++ b/Lib/test/test_typing.py
@@ -4137,6 +4137,8 @@ def __bytes__(self):
         self.assertIsSubclass(B, typing.SupportsBytes)
         self.assertNotIsSubclass(str, typing.SupportsBytes)
 
+        typing.SupportsBytes._abc_caches_clear()
+
     def test_supports_abs(self):
         self.assertIsSubclass(float, typing.SupportsAbs)
         self.assertIsSubclass(int, typing.SupportsAbs)

vstinner added a commit that referenced this issue Jul 1, 2024
…H-121191) (#121208)

gh-121084: Call _abc_registry_clear() when checking refleaks (GH-121191)

dash_R_cleanup() now calls _abc_registry_clear() before calling again
register().
(cherry picked from commit c766ad2)

Co-authored-by: Victor Stinner <[email protected]>
@vstinner
Copy link
Member Author

vstinner commented Jul 1, 2024

More generic fix:

diff --git a/Lib/test/libregrtest/utils.py b/Lib/test/libregrtest/utils.py
index 8253d330b95..056fc9ba919 100644
--- a/Lib/test/libregrtest/utils.py
+++ b/Lib/test/libregrtest/utils.py
@@ -263,6 +263,13 @@ def clear_caches():
         for f in typing._cleanups:
             f()
 
+        import inspect
+        abs_classes = [getattr(typing, attr) for attr in typing.__all__]
+        abs_classes = filter(inspect.isabstract, abs_classes)
+        for abc in abs_classes:
+            for obj in abc.__subclasses__() + [abc]:
+                obj._abc_caches_clear()
+
     try:
         fractions = sys.modules['fractions']
     except KeyError:

Akasurde pushed a commit to Akasurde/cpython that referenced this issue Jul 3, 2024
…ython#121191)

dash_R_cleanup() now calls _abc_registry_clear() before calling again
register().
vstinner added a commit to vstinner/cpython that referenced this issue Jul 4, 2024
Call _abc_caches_clear() on typing abstract classes and their
subclasses when running tests to check for refleaks (-R option).
vstinner added a commit to vstinner/cpython that referenced this issue Jul 4, 2024
Clear typing ABC caches when running tests for refleaks (-R option):
call _abc_caches_clear() on typing abstract classes and their
subclasses.
vstinner added a commit that referenced this issue Jul 4, 2024
Clear typing ABC caches when running tests for refleaks (-R option):
call _abc_caches_clear() on typing abstract classes and their
subclasses.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jul 4, 2024
Clear typing ABC caches when running tests for refleaks (-R option):
call _abc_caches_clear() on typing abstract classes and their
subclasses.
(cherry picked from commit 5f660e8)

Co-authored-by: Victor Stinner <[email protected]>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jul 4, 2024
Clear typing ABC caches when running tests for refleaks (-R option):
call _abc_caches_clear() on typing abstract classes and their
subclasses.
(cherry picked from commit 5f660e8)

Co-authored-by: Victor Stinner <[email protected]>
@vstinner
Copy link
Member Author

vstinner commented Jul 4, 2024

test_typing was fixed by 5f660e8.

@vstinner vstinner closed this as completed Jul 4, 2024
vstinner added a commit that referenced this issue Jul 4, 2024
gh-121084: Fix test_typing random leaks (GH-121360)

Clear typing ABC caches when running tests for refleaks (-R option):
call _abc_caches_clear() on typing abstract classes and their
subclasses.
(cherry picked from commit 5f660e8)

Co-authored-by: Victor Stinner <[email protected]>
vstinner added a commit that referenced this issue Jul 4, 2024
gh-121084: Fix test_typing random leaks (GH-121360)

Clear typing ABC caches when running tests for refleaks (-R option):
call _abc_caches_clear() on typing abstract classes and their
subclasses.
(cherry picked from commit 5f660e8)

Co-authored-by: Victor Stinner <[email protected]>
@Eclips4
Copy link
Member

Eclips4 commented Jul 4, 2024

Thanks Victor for solving this. I was a little late to the "party", but I can confirm that it solves the leak at least in every one of my setups (Linux, Windows, MacOS).

noahbkim pushed a commit to hudson-trading/cpython that referenced this issue Jul 11, 2024
…ython#121191)

dash_R_cleanup() now calls _abc_registry_clear() before calling again
register().
noahbkim pushed a commit to hudson-trading/cpython that referenced this issue Jul 11, 2024
Clear typing ABC caches when running tests for refleaks (-R option):
call _abc_caches_clear() on typing abstract classes and their
subclasses.
estyxx pushed a commit to estyxx/cpython that referenced this issue Jul 17, 2024
…ython#121191)

dash_R_cleanup() now calls _abc_registry_clear() before calling again
register().
estyxx pushed a commit to estyxx/cpython that referenced this issue Jul 17, 2024
Clear typing ABC caches when running tests for refleaks (-R option):
call _abc_caches_clear() on typing abstract classes and their
subclasses.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tests Tests in the Lib/test dir topic-typing
Projects
None yet
Development

No branches or pull requests

7 participants