
I'm new to web scraping in Python and I'm trying to scrape all .htm document links from an SEC EDGAR full-text search. I can see the link in the modal footer, but BeautifulSoup won't parse the href element containing the link.

Is there an easy solution to parse the links of the documents?

[Screenshot: the link as it appears in the page's HTML]

import requests
from bs4 import BeautifulSoup

url = 'https://www.sec.gov/edgar/search/#/q=ex10&category=custom&forms=10-K%252C10-Q%252C8-K'
source_code = requests.get(url)
plain_text = source_code.text
soup = BeautifulSoup(plain_text, 'html.parser')

for a in soup.find_all(id="open-file"):
    print(a)

1 Answer


That data is loaded dynamically using JavaScript. There is a lot of information about scraping this kind of page (see one of many examples here); in this case, the following should get you there:

import requests
import json
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:93.0) Gecko/20100101 Firefox/93.0',
    'Accept': 'application/json, text/javascript, */*; q=0.01',   
}

data = '{"q":"ex10","category":"custom","forms":["10-K","10-Q","8-K"],"startdt":"2020-10-08","enddt":"2021-10-08"}'
# obviously, you need to change "startdt" and "enddt" as necessary
response = requests.post('https://efts.sec.gov/LATEST/search-index', headers=headers, data=data)

The response is in JSON format. Your URLs are hidden in there:

data = json.loads(response.text)
hits = data['hits']['hits']
for hit in hits:
    cik = hit['_source']['ciks'][0]
    # each hit's _id has the form "accession-number:file_name"
    file_data = hit['_id'].split(":")
    filing = file_data[0].replace('-','')  # strip dashes from the accession number
    file_name = file_data[1]
    url = f'https://www.sec.gov/Archives/edgar/data/{cik}/{filing}/{file_name}'
    print(url)

Output:

https://www.sec.gov/Archives/edgar/data/0001372183/000158069520000415/ex10-5.htm
https://www.sec.gov/Archives/edgar/data/0001372183/000138713120009670/ex10-5.htm
https://www.sec.gov/Archives/edgar/data/0001540615/000154061520000006/ex10.htm
https://www.sec.gov/Archives/edgar/data/0001552189/000165495421004948/ex10-1.htm

etc.
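If you also want to download the documents rather than just print the URLs, a minimal sketch along the same lines could look like the following. The list name urls, the 0.5-second pause, and saving into the current working directory are all my own assumptions, not anything required by the API; the headers are reused from the request above.

# collect the URLs instead of (or in addition to) printing them
urls = []
for hit in hits:
    cik = hit['_source']['ciks'][0]
    filing, file_name = hit['_id'].split(":")
    urls.append(f'https://www.sec.gov/Archives/edgar/data/{cik}/{filing.replace("-","")}/{file_name}')

# download each document, reusing the same headers
import time
for url in urls:
    doc = requests.get(url, headers=headers)
    with open(url.rsplit('/', 1)[-1], 'wb') as f:  # save under the file name from the URL
        f.write(doc.content)
    time.sleep(0.5)  # keep the request rate low; the SEC throttles aggressive clients

If you need more than the first batch of hits, the POST body appears to also accept a "from" offset for paging; you can confirm the exact parameter by watching the network tab while paging through results in the browser.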
