
TypeError: descriptor 'some_method' for 'A' objects doesn't apply to a 'B' object #121368

Closed
ambv opened this issue Jul 4, 2024 · 10 comments
Labels
topic-free-threading type-bug An unexpected behavior, bug, or error

Comments

@ambv
Contributor

ambv commented Jul 4, 2024

Bug report

I found a bug that seems to be code corruption.

While working on an example project with ~70 threads, I occasionally (once every hour or so) get the following exception from various locks:

Exception in thread Sequence 2:
Traceback (most recent call last):
  File "/Volumes/RAMDisk/installed-nogil-main/Library/Frameworks/Python.framework/Versions/3.14/lib/python3.14/threading.py", line 1039, in _bootstrap_inner
    self.run()
    ~~~~~~~~^^
  ...
  File "/Volumes/RAMDisk/installed-nogil-main/Library/Frameworks/Python.framework/Versions/3.14/lib/python3.14/threading.py", line 656, in wait
    with self._cond:
         ^^^^^^^^^^
  File "/Volumes/RAMDisk/installed-nogil-main/Library/Frameworks/Python.framework/Versions/3.14/lib/python3.14/threading.py", line 304, in __enter__
    return self._lock.__enter__()
           ~~~~~~~~~~~~~~~~~~~~^^
TypeError: descriptor '__exit__' for '_thread.RLock' objects doesn't apply to a '_thread.lock' object

or

Exception in thread Clock:
Traceback (most recent call last):
  File "/Volumes/RAMDisk/installed-nogil-main/Library/Frameworks/Python.framework/Versions/3.14/lib/python3.14/threading.py", line 1039, in _bootstrap_inner
    self.run()
    ~~~~~~~~^^
  File "europython.py", line 113, in run
    for message in input:
  File "/Volumes/RAMDisk/nogil-ep-temp/lib/python3.14/site-packages/mido/ports.py", line 243, in __iter__
    yield self.receive()
          ~~~~~~~~~~~~^^
  File "/Volumes/RAMDisk/nogil-ep-temp/lib/python3.14/site-packages/mido/ports.py", line 215, in receive
    with self._lock:
         ^^^^^^^^^^
TypeError: descriptor '__exit__' for '_thread.lock' objects doesn't apply to a '_thread.RLock' object

This looks like a bug related to locks, but it isn't. It isn't even related to descriptors; the descriptor machinery just happens to refuse to run invalid code, which is what surfaces the corruption.
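For context, the TypeErrors above come from the generic receiver type check that method descriptors perform before any code runs. A minimal, thread-free illustration of the same style of error:

```python
# Calling an unbound method descriptor with a receiver of the wrong
# type is rejected up front with the same kind of TypeError as above.
try:
    str.upper(42)  # 'upper' expects a str instance, not an int
except TypeError as exc:
    print(exc)
```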


This issue is also externally reported in PyWavelets/pywt#758 with the same Lock descriptor error message I've seen, and I can reproduce the failure locally, albeit with a different exception:

TypeError: descriptor 'sort' for 'numpy.ndarray' objects doesn't apply to a 'ThreadPoolExecutor' object

To reproduce this with cpython main, do the following:

  • make a venv with a free-threaded build of Python
  • install Cython from main with pip install -e .
  • install Numpy from main with pip install . --no-build-isolation (important: no -e in this case)
  • install pywt from main with pip install -e . --no-build-isolation (important: you DO need -e in this case)
  • run pytest in a loop (or with autoclave) like this: PYTHON_GIL=0 pytest pywt/tests/test_concurrent.py

You will need to run this for a longer while to get to a failure.

By doing this, I managed to find this particular failure case:

self = <concurrent.futures.thread.ThreadPoolExecutor object at 0x225be124750>, fn = functools.partial(<function dwtn at 0x225be6d3b40>, wavelet='haar')
args = (array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0., ...., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]]),)
kwargs = {}, f = <Future at 0x225bea44310 state=finished returned dict>, w = <concurrent.futures.thread._WorkItem object at 0x225bc875bf0>

    def submit(self, fn, /, *args, **kwargs):
        with self._shutdown_lock, _global_shutdown_lock:
            if self._broken:
                raise BrokenThreadPool(self._broken)

            if self._shutdown:
                raise RuntimeError('cannot schedule new futures after shutdown')
            if _shutdown:
                raise RuntimeError('cannot schedule new futures after '
                                   'interpreter shutdown')

            f = _base.Future()
            w = _WorkItem(f, fn, args, kwargs)

            self._work_queue.put(w)
>           self._adjust_thread_count()
E           TypeError: Future.set_result() missing 1 required positional argument: 'result'

args       = (array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0., ...., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]]),)
f          = <Future at 0x225bea44310 state=finished returned dict>
fn         = functools.partial(<function dwtn at 0x225be6d3b40>, wavelet='haar')
kwargs     = {}
self       = <concurrent.futures.thread.ThreadPoolExecutor object at 0x225be124750>
w          = <concurrent.futures.thread._WorkItem object at 0x225bc875bf0>

../installed-nogil-main/Library/Frameworks/Python.framework/Versions/3.14/lib/python3.14/concurrent/futures/thread.py:179: TypeError
======================================================================================================== short test summary info ========================================================================================================
FAILED pywt/tests/test_concurrent.py::test_concurrent_dwt - TypeError: Future.set_result() missing 1 required positional argument: 'result'
====================================================================================================== 1 failed, 3 passed in 0.44s ======================================================================================================
-- 863 runs, 862 passes, 1 failure, 734486 msec

Observe how Python intends to call self._adjust_thread_count() (which takes no arguments) but ends up executing f.set_result() instead, which raises because no result argument is passed.
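For reference, the reported TypeError is exactly what a plain, uncorrupted call of set_result() with no argument produces; here it is triggered directly:

```python
from concurrent.futures import Future

# Calling set_result() without an argument raises the same TypeError
# that appears in the test failure above.
f = Future()
try:
    f.set_result()
except TypeError as exc:
    print(exc)
```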


Tested on macOS Sonoma on M1 Max with Python 3.14.0a0 experimental free-threading build (heads/main:7a807c3efaa, Jul 2 2024, 11:58:38).

AFAICT the problem only occurs with the GIL actually disabled.

Linked PRs

@ngoldbaum
Contributor

By the way, I hit this last week with 3.13.0b2, so this is likely a problem on the 3.13 branch as well and not something new in 3.14.

@ambv
Contributor Author

ambv commented Jul 4, 2024

Another interesting failure from running the pywt tests that shows this is beyond descriptors (despite the exception):

../installed-nogil-main/Library/Frameworks/Python.framework/Versions/3.14/lib/python3.14/concurrent/futures/_base.py:322: in __init__
    self._condition = threading.Condition()
        self       = <[AttributeError("'Future' object has no attribute '_condition'") raised in repr()] Future object at 0x5ddd8796110>
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <[AttributeError("'Condition' object has no attribute '_waiters'") raised in repr()] Condition object at 0x5ddd8795790>
lock = <unlocked _thread.RLock object owner=0 count=0 at 0x5ddd881a300>

    def __init__(self, lock=None):
        if lock is None:
            lock = RLock()
        self._lock = lock
        # Export the lock's acquire() and release() methods
        self.acquire = lock.acquire
>       self.release = lock.release
E       TypeError: descriptor 'result_type' for 'numpy._core._multiarray_umath._array_converter' objects doesn't apply to a '_thread.RLock' object

Look at the failing self reprs: in both cases the attributes that appear to be missing are set in __init__ and never unset or deleted.

@colesbury
Contributor

I have tried to reproduce the above issue with pywt, and I'm routinely seeing errors like:

TypeError: fft() got an unexpected keyword argument 'axes'

Has the NumPy argument parsing been made thread-safe yet?

@ngoldbaum
Contributor

No, that PR is still in-flight: numpy/numpy#26780

@ambv
Contributor Author

ambv commented Jul 4, 2024

I should probably mention, I'm testing all this with CFLAGS="-g0 -O3" and ./configure --enable-optimizations --with-lto --disable-gil.

@ambv
Contributor Author

ambv commented Jul 4, 2024

Minimal repro so far

This is somewhat frustrating to reproduce without any larger dependencies or codebases. The smallest thing I managed to get to crash is this:

# repro4.py
#
# Repeatedly spins up thread pools that map a small function over lists,
# exercising method lookups and the Future machinery from many threads.

from concurrent import futures
import os
import random

def calculate(arr):
    return sum(arr) / len(arr)

# Print the hash seed so a failing run can be replayed.
print(os.environ["PYTHONHASHSEED"], end=" ", flush=True)

for _ in range(100):
    with futures.ThreadPoolExecutor(max_workers=os.cpu_count()) as ex:
        arrs = [[random.random() for _ in range(100)] for _ in range(50)]
        results = list(ex.map(calculate, arrs))

Run it with

for seed in (seq -f%1.0f 1000000 1100000)
    PYTHONHASHSEED=$seed python repro4.py
end

with the fish shell. On the bash shell (seq -f%1.0f 1000000 1099999 | while read s; do PYTHONHASHSEED=$s python3.14 repro4.py; done) it reproduces much less often.

The hash seed is a red herring but, believe me, if I remove that bit, I can no longer reproduce it.

The result of running this for a longer while is a similar unexpected code execution, one example being:

  File "/Users/ambv/Documents/Python/europython/repro4.py", line 13, in <module>
    results = list(ex.map(calculate, arrs))
  File "/Volumes/RAMDisk/installed-nogil-main/Library/Frameworks/Python.framework/Versions/3.14/lib/python3.14/concurrent/futures/_base.py", line 611, in result_iterator
    yield _result_or_cancel(fs.pop())
          ~~~~~~~~~~~~~~~~~^^^^^^^^^^
  File "/Volumes/RAMDisk/installed-nogil-main/Library/Frameworks/Python.framework/Versions/3.14/lib/python3.14/concurrent/futures/_base.py", line 309, in _result_or_cancel
    return fut.result(timeout)
           ~~~~~~~~~~^^^^^^^^^
  File "/Volumes/RAMDisk/installed-nogil-main/Library/Frameworks/Python.framework/Versions/3.14/lib/python3.14/threading.py", line 526, in release
    if n < 1:
       ^^^^^
TypeError: '<' not supported between instances of 'NoneType' and 'int'

where a call to concurrent.futures.Future.result(timeout) (with timeout=None) unexpectedly ends up executing threading.Semaphore.release(n=1), which doesn't expect n to ever be None. It's very weird. I managed to attach pdb when this happens, and calling fut.result(None) there returns the expected outcome. So this must be some momentary corruption.
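The release() failure can be provoked directly by passing n=None (a deliberate mis-call here, standing in for the corrupted argument), which reaches the same internal `if n < 1:` comparison:

```python
import threading

# Passing None as n reaches Semaphore.release()'s internal `if n < 1:`
# check, producing the same TypeError as in the corrupted run above.
sem = threading.Semaphore()
try:
    sem.release(None)
except TypeError as exc:
    print(exc)
```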

@colesbury
Contributor

Thanks, I'm able to reproduce it intermittently now. So far only on my macOS arm64 laptop, not on x86-64 Linux. Reducing MCACHE_SIZE_EXP in pycore_typeobject.h seems to make it happen more quickly -- I changed it to 6 locally.

@colesbury
Contributor

The bug is in our seq lock implementation:

cpython/Python/lock.c

Lines 550 to 560 in cb688ba

uint32_t _PySeqLock_EndRead(_PySeqLock *seqlock, uint32_t previous)
{
    // Synchronize again and validate that the entry hasn't been updated
    // while we were readying the values.
    if (_Py_atomic_load_uint32_acquire(&seqlock->sequence) == previous) {
        return 1;
    }
    _Py_yield();
    return 0;
}

cpython/Objects/typeobject.c

Lines 5390 to 5402 in cb688ba

    int sequence = _PySeqLock_BeginRead(&entry->sequence);
    uint32_t entry_version = _Py_atomic_load_uint32_relaxed(&entry->version);
    uint32_t type_version = _Py_atomic_load_uint32_acquire(&type->tp_version_tag);
    if (entry_version == type_version &&
        _Py_atomic_load_ptr_relaxed(&entry->name) == name) {
        OBJECT_STAT_INC_COND(type_cache_hits, !is_dunder_name(name));
        OBJECT_STAT_INC_COND(type_cache_dunder_hits, is_dunder_name(name));
        PyObject *value = _Py_atomic_load_ptr_relaxed(&entry->value);
        // If the sequence is still valid then we're done
        if (value == NULL || _Py_TryIncref(value)) {
            if (_PySeqLock_EndRead(&entry->sequence, sequence)) {
                return value;
            }
The memory ordering on _PySeqLock_EndRead isn't sufficient. We need an explicit fence (i.e., atomic_thread_fence). The "acquire" on the final sequence load prevents subsequent loads from being reordered before it, but in _PySeqLock_EndRead we want to prevent the earlier loads from being reordered after the sequence re-check.

The bug occurs on arm64, but not x86-64, because x86-64 enforces ordering of loads relative to each other.

colesbury added a commit to colesbury/cpython that referenced this issue Jul 4, 2024
colesbury added a commit to colesbury/cpython that referenced this issue Jul 4, 2024
colesbury added a commit to colesbury/cpython that referenced this issue Jul 4, 2024
colesbury added a commit to colesbury/cpython that referenced this issue Jul 8, 2024
The `_PySeqLock_EndRead` function needs an acquire fence to ensure that
the load of the sequence happens after any loads within the read side
critical section.

The acquire on the final sequence load isn't sufficient (or necessary).
colesbury added a commit to colesbury/cpython that referenced this issue Jul 8, 2024
colesbury added a commit that referenced this issue Jul 8, 2024
The `_PySeqLock_EndRead` function needs an acquire fence to ensure that
the load of the sequence happens after any loads within the read side
critical section. The missing fence can trigger bugs on macOS arm64.

Additionally, we need a release fence in `_PySeqLock_LockWrite` to
ensure that the sequence update is visible before any modifications to
the cache entry.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jul 8, 2024
colesbury added a commit that referenced this issue Jul 8, 2024 (#121505)
@itamaro
Contributor

itamaro commented Jul 10, 2024

With the PRs merged, is there anything left to do here?

@vstinner
Member

I close the issue.

noahbkim pushed a commit to hudson-trading/cpython that referenced this issue Jul 11, 2024
estyxx pushed a commit to estyxx/cpython that referenced this issue Jul 17, 2024
5 participants