Possible race condition on multiprocessing.Manager().dict() on macOS #87934

Closed
jerryc05 mannequin opened this issue Apr 7, 2021 · 5 comments
Labels
3.9 only security fixes OS-mac topic-multiprocessing type-bug An unexpected behavior, bug, or error

Comments

@jerryc05
Mannequin

jerryc05 mannequin commented Apr 7, 2021

BPO 43768
Nosy @ronaldoussoren, @pitrou, @ned-deily, @applio

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2021-04-07.19:12:33.180>
labels = ['OS-mac', 'type-bug', '3.9']
title = 'Possible race condition on multiprocessing.Manager().dict() on macOS'
updated_at = <Date 2021-04-17.00:45:40.905>
user = 'https://bugs.python.org/jerryc05'

bugs.python.org fields:

activity = <Date 2021-04-17.00:45:40.905>
actor = 'ned.deily'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['macOS']
creation = <Date 2021-04-07.19:12:33.180>
creator = 'jerryc05'
dependencies = []
files = []
hgrepos = []
issue_num = 43768
keywords = []
message_count = 1.0
messages = ['390468']
nosy_count = 5.0
nosy_names = ['ronaldoussoren', 'pitrou', 'ned.deily', 'davin', 'jerryc05']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue43768'
versions = ['Python 3.9']

@jerryc05
Mannequin Author

jerryc05 mannequin commented Apr 7, 2021

I am not sure whether this is a bug or expected behavior.

Long story short, I tried to print the contents of a multiprocessing.Manager().dict() in the main thread, but I got a strange error.

I encountered this error only when the pool is rather large (more than 20 workers) and only on macOS (it works perfectly on Linux).

Specs:

CPU: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
macOS: 11.2.3

The minimal code that reproduces the error:

#!/usr/bin/env python3

from contextlib import suppress
import multiprocessing as mp
import time


def run():
    D[mp.current_process().name] = 'some val'
    time.sleep(0.5)


if __name__ == '__main__':
    mp.set_start_method('fork')
    D, rets = mp.Manager().dict(), []
    with mp.Pool(25) as p:
        for _ in range(33):
            rets.append(p.apply_async(run))
        while rets:
            for ret in rets[:]:
                with suppress(mp.TimeoutError):
                    ret.get(timeout=0)
                    rets.remove(ret)
                    print(len(D))

Error:

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.9/3.9.2_4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/managers.py", line 801, in _callmethod
    conn = self._tls.connection
AttributeError: 'ForkAwareLocal' object has no attribute 'connection'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.9/3.9.2_4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/Users/???", line 9, in run
    D[mp.current_process().name] = 'some val'
  File "<string>", line 2, in __setitem__
  File "/usr/local/Cellar/python@3.9/3.9.2_4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/managers.py", line 805, in _callmethod
    self._connect()
  File "/usr/local/Cellar/python@3.9/3.9.2_4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/managers.py", line 792, in _connect
    conn = self._Client(self._token.address, authkey=self._authkey)
  File "/usr/local/Cellar/python@3.9/3.9.2_4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/connection.py", line 507, in Client
    c = SocketClient(address)
  File "/usr/local/Cellar/python@3.9/3.9.2_4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/connection.py", line 635, in SocketClient
    s.connect(address)
ConnectionRefusedError: [Errno 61] Connection refused
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/???", line 22, in <module>
    ret.get(timeout=0)
  File "/usr/local/Cellar/python@3.9/3.9.2_4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
ConnectionRefusedError: [Errno 61] Connection refused

@jerryc05 jerryc05 mannequin added 3.9 only security fixes OS-mac type-bug An unexpected behavior, bug, or error labels Apr 7, 2021
@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@ronaldoussoren
Contributor

I tested the script on my machine (macOS 13.0.1; Python 3.9, 3.10 and 3.11, all installed using the python.org installer), and the error occurs intermittently. It also occurs with a fresh build of 3.12. Disabling the local firewall does not avoid the problem.

This appears to be a timing problem: the main process is not yet listening on the socket when the child tries to connect.

Below is a crude hack that implements a retry loop and appears to fix the issue for me. I've added it as an inline patch instead of a PR because I'm far from convinced that this is a correct fix. I've barely used multiprocessing myself and know too little about its design to know what the correct place would be to implement a retry loop.

diff --git a/Lib/multiprocessing/connection.py b/Lib/multiprocessing/connection.py
index b08144f7a1..7954fefd62 100644
--- a/Lib/multiprocessing/connection.py
+++ b/Lib/multiprocessing/connection.py
@@ -625,9 +625,13 @@ def SocketClient(address):
     '''
     family = address_type(address)
     with socket.socket( getattr(socket, family) ) as s:
-        s.setblocking(True)
-        s.connect(address)
-        return Connection(s.detach())
+        for _ in range(3):
+            try:
+                s.setblocking(True)
+                s.connect(address)
+                return Connection(s.detach())
+            except socket.error:
+                time.sleep(0.5)
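
For readers who want to try the retry idea outside the standard library, here is a minimal standalone sketch of the same technique in plain Python. It is not the patch above and not part of multiprocessing: the helper name `connect_with_retry` and its `attempts` and `delay` parameters are hypothetical, chosen to mirror the three tries and the 0.5 second pause in the diff. Unlike the diff, it opens a fresh socket for each attempt (reusing a socket after a failed connect is not portable) and re-raises the last error instead of silently returning None when every attempt fails.

```python
import socket
import time


def connect_with_retry(address, attempts=3, delay=0.5):
    """Connect to a TCP address, retrying when the connection is refused.

    Hypothetical helper for illustration only; `attempts` and `delay`
    mirror the values used in the inline patch.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            # create_connection opens a new socket on every call, so a
            # failed attempt never leaves a half-used socket behind.
            return socket.create_connection(address)
        except ConnectionRefusedError as exc:
            last_error = exc
            # Sleep only if another attempt will follow.
            if attempt + 1 < attempts:
                time.sleep(delay)
    # Every attempt was refused: surface the error to the caller.
    raise last_error
```

A caller would use it as `sock = connect_with_retry(("127.0.0.1", port))` and close the returned socket when done. Whether retrying is the right fix at all is exactly the open question in this thread.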

@ronaldoussoren
Contributor

@applio and/or @gpshead, what would be the correct place to implement a retry loop as sketched in my previous message? Or is retrying not the right solution here?

@ronaldoussoren
Contributor

This is the same problem as #101225, but with a different limit in the backlog.

@ronaldoussoren
Contributor

The race condition doesn't happen for me with the fix for #101225. Technically that only reduces the size of the window in which the race can happen, but it should be fine given that I've increased the backlog far beyond what's needed to avoid hitting the race (famous last words...)
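
To illustrate what the backlog does, here is a small sketch using the public `multiprocessing.connection.Listener` and `Client` classes. It is only a demonstration under assumptions: the function name `queue_then_accept`, the loopback address, and the numbers used are hypothetical, and it does not reproduce the manager's internal setup or the actual change made for #101225. Several clients connect before the listener accepts any of them, which works as long as the pending connections fit in the backlog.

```python
from multiprocessing.connection import Client, Listener


def queue_then_accept(n_clients, backlog):
    """Connect n_clients before accepting any, then drain them.

    Hypothetical demo: pending connections wait in the listener's
    backlog queue until accept() is called.
    """
    # Port 0 asks the OS for a free port; listener.address reports it.
    with Listener(("127.0.0.1", 0), backlog=backlog) as listener:
        clients = [Client(listener.address) for _ in range(n_clients)]
        try:
            # Each client sends a small message that the kernel buffers
            # until the server side accepts and reads it.
            for index, client in enumerate(clients):
                client.send(index)
            received = []
            for _ in clients:
                with listener.accept() as conn:
                    received.append(conn.recv())
            return sorted(received)
        finally:
            for client in clients:
                client.close()
```

Calling `queue_then_accept(5, backlog=16)` returns the five messages once the queued connections are accepted. With a backlog that is too small for the burst of connecting workers, the extra connection attempts may be refused or delayed depending on the platform, which is consistent with the behavior described in this issue.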
