
I wrote a simple script to fetch the HTML from multiple websites. I didn't have any issue with the script until yesterday, when it suddenly started throwing the exception below.

Traceback (most recent call last):
  File "crowling.py", line 45, in <module>
    result = requests.get(url)
  File "/Users/gen/.pyenv/versions/3.7.1/lib/python3.7/site-packages/requests/api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "/Users/gen/.pyenv/versions/3.7.1/lib/python3.7/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Users/gen/.pyenv/versions/3.7.1/lib/python3.7/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/gen/.pyenv/versions/3.7.1/lib/python3.7/site-packages/requests/sessions.py", line 685, in send
    r.content
  File "/Users/gen/.pyenv/versions/3.7.1/lib/python3.7/site-packages/requests/models.py", line 829, in content
    self._content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE)) or b''
  File "/Users/gen/.pyenv/versions/3.7.1/lib/python3.7/site-packages/requests/models.py", line 754, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ("Connection broken: ConnectionResetError(54, 'Connection reset by peer')", ConnectionResetError(54, 'Connection reset by peer'))

The main part of the script is this:

import requests

c = 0
# urls is the list of URLs as strings
for url in urls:
    result = requests.get(url)
    c += 1
    with open('htmls/p{}.html'.format(c), 'w', encoding='UTF-8') as f:
        f.write(result.text)

The list urls is generated by other code of mine, and I have checked that the URLs are correct. Also, the timing of the exception is not constant: sometimes it stops at the 20th HTML, and sometimes it goes until the 80th and then stops. Since the exception appeared suddenly without any change to the code, I am guessing it is caused by the Internet connection. Still, I want the script to run stably. What are the possible causes of this error?

  • Judging from the exception stack trace, those URLs probably have Unicode characters in them
    – bigbounty
    Commented Aug 5, 2020 at 11:44
  • Can you post some sample URLs you are calling?
    – isopach
    Commented Aug 5, 2020 at 12:13

1 Answer


If you're sure the URLs are correct and it's an intermittent connection problem, you can just retry the connection after failure:

import time
import requests
from requests.exceptions import ChunkedEncodingError

c = 0
# urls is the list of URLs as strings
for url in urls:
    trycnt = 3  # max try count
    while trycnt > 0:
        try:
            result = requests.get(url)
            c += 1
            with open('htmls/p{}.html'.format(c), 'w', encoding='UTF-8') as f:
                f.write(result.text)
            break  # success; go to next URL
        except ChunkedEncodingError as ex:
            trycnt -= 1
            if trycnt <= 0:  # done retrying
                print("Failed to retrieve: " + url + "\n" + str(ex))
            else:
                time.sleep(0.5)  # wait half a second, then retry
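
As a complementary sketch (not part of the original answer), requests can also retry at the transport level through urllib3's Retry class and an HTTPAdapter mounted on a Session. The retry parameters and the 10-second timeout below are illustrative assumptions, not values from the question:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retry = Retry(
    total=3,                                # up to 3 retries per request
    backoff_factor=0.5,                     # exponential backoff between attempts
    status_forcelist=[500, 502, 503, 504],  # also retry on these HTTP status codes
)
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)

c = 0
for url in urls:  # urls as in the question
    result = session.get(url, timeout=10)   # illustrative timeout
    c += 1
    with open('htmls/p{}.html'.format(c), 'w', encoding='UTF-8') as f:
        f.write(result.text)

Note that this mainly covers failures while establishing the connection or waiting for a response; an error raised midway through reading the response body, such as the ChunkedEncodingError above, can still slip through, so the explicit try/except loop remains useful.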
  • Somehow my script began to work properly again, also on another Linux server, but the idea of retrying with an except clause is incredible. Thank you so much for your idea.
    – TFC
    Commented Aug 5, 2020 at 17:33
  • Please accept an answer so this post is removed from the "No Answer" list. Thanks.
    – Mike67
    Commented Sep 1, 2020 at 19:11
