How should I scrape an idx file on EDGAR?

Question

I have an idx file: https://www.sec.gov/Archives/edgar/daily-index/2020/QTR4/master.20201231.idx

I could open the idx file with the following code a year ago, but it no longer works. Why is that, and how should I modify the code?

import requests
import urllib
from bs4 import BeautifulSoup

master_data = []
file_url = r"https://www.sec.gov/Archives/edgar/daily-index/2020/QTR4/master.20201231.idx"
byte_data = requests.get(file_url).content
data_format = byte_data.decode('utf-8').split('------')
content = data_format[-1]
data_list = content.replace('\n','|').split('|')

for index, item in enumerate(data_list):

    if '.txt' in item:
        if data_list[index - 2] == '10-K':
            entry_list = data_list[index - 4: index + 1]
            entry_list[4] = "https://www.sec.gov/Archives/" + entry_list[4]
            master_data.append(entry_list)

print(master_data)

S P Sharan · Accepted Answer · 2022-01-12 05:24:12Z

If you inspect the contents of the byte_data variable, you will find that it does not contain the actual content of the idx file. What the server returns instead is there to deter scraping bots like yours. You can find more information in this answer: Problem HTTP error 403 in Python 3 Web Scraping
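One way to do that inspection is sketched below. The helper name summarize_response and its preview_chars parameter are made up for illustration; it only assumes the response object has the status_code and content attributes that a requests response provides.

```python
def summarize_response(response, preview_chars=200):
    """Return (status_code, start_of_body) for a requests-style response.

    `response` is assumed to expose `.status_code` and `.content` (bytes),
    as the object returned by requests.get() does.
    """
    # Decode leniently so an unexpected encoding cannot raise here.
    body = response.content.decode("utf-8", errors="replace")
    return response.status_code, body[:preview_chars]


# Hypothetical usage with the question's URL:
#   import requests
#   status, preview = summarize_response(requests.get(file_url))
#   print(status)
#   print(preview)
```

If the preview shows an HTML page or an error message rather than pipe-delimited rows, the request was rejected before any index data was returned.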

In this case, the fix is to send a User-Agent header with the request.

import requests

master_data = []
file_url = r"https://www.sec.gov/Archives/edgar/daily-index/2020/QTR4/master.20201231.idx"
byte_data = requests.get(file_url, allow_redirects=True, headers={"User-Agent": "XYZ/3.0"}).content

# Your further processing here

On a side note, your processing code does not collect any entries because the if condition is never met for any of the lines, so an empty result does not mean this fix failed.
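For the further processing, a line-by-line parse may be easier to debug than flattening the whole file into one list. The following is only a sketch: the function name parse_master_index is hypothetical, and it assumes each data row has exactly five pipe-delimited fields (CIK, company name, form type, date filed, file name) with the file name ending in .txt.

```python
def parse_master_index(text, form_type="10-K",
                       base_url="https://www.sec.gov/Archives/"):
    """Return [cik, company, form, date_filed, url] for each matching row."""
    entries = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        # Assumption: data rows have five fields and end in a .txt path.
        # Header, separator, and blank lines fail this check and are skipped.
        if len(fields) != 5 or not fields[4].endswith(".txt"):
            continue
        if fields[2] == form_type:
            # Turn the relative file name into a full URL.
            fields[4] = base_url + fields[4]
            entries.append(fields)
    return entries


# Hypothetical usage with the bytes fetched above:
#   master_data = parse_master_index(byte_data.decode("utf-8"))
#   print(master_data)
```

Because each row is checked on its own, an empty result means that no row matched the requested form type exactly, rather than pointing to an off-by-one index into a flattened list.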
