All Questions

1 vote
3 answers
3k views

Extract entire textual data from Edgar 10-K using python

I am trying to extract the entire textual data from the URL given below, as an example. I have many URLs, so I am automating this. I tried every piece of code posted here, but they all give errors, e.g. AttributeError: 'NoneType' ...
silicon23
0 votes
2 answers
1k views

Extract business description (Item 1) of multiple firms from their 10-K reports

I am trying to extract business descriptions of multiple firms from their 10-K reports using the R package edgar. I am using the getBusinDescr function to do so. However, I am only able to extract Item 1 ...
Felix
  • 1
3 votes
1 answer
2k views

Parse XML with Python lxml

I am trying to parse an XML file using the Python library lxml, and would like the resulting output to be in a dataframe. I am relatively new to Python and parsing, so please bear with me as I outline the ...
stump
  • 85
2 votes
1 answer
2k views

Extracting table of holdings from (Edgar 13-F filings) TXT (pre-2013) with python

I am working on extracting a table of holdings from the 13-F form on EDGAR. Before 2013, holdings were given in a txt file (see example). The output I am aiming for is a pd.DataFrame with the same shape as the ...
NoobFin
  • 23
0 votes
1 answer
720 views

Saving SEC 10-K annual report text to files (trouble with decoding)

I am trying to bulk-download the text visible to the "end-user" from 10-K SEC Edgar reports (I don't care about tables) and save it in a text file. I found the code below on YouTube, however I am ...
dernuco
  • 15
1 vote
0 answers
679 views

Count keywords in SEC Edgar 10-K filings text-body with Python

I am trying to parse the text section of the SEC Edgar texts in Python 3, e.g. https://www.sec.gov/Archives/edgar/data/796343/0000796343-14-000004.txt. My goal is to collect the number of occurrences ...
dernuco
  • 15
5 votes
1 answer
441 views

SEC company filings: Is the <SEC-HEADER> tag valid SGML? If so, how to parse it?

I tried to parse SEC company filings from sec.gov. Starting from fb 10-Q index.htm, let's look at a complete text submission filing like complete submission text filing. It has a structure like: <...
Michael S
  • 476
3 votes
0 answers
734 views

How would I approach a lot of structured-but-inconsistent data? [closed]

I'm attempting to parse EDGAR documents - they're SEC filings. Specifically, I'm attempting to parse both SEC Schedule 13D and Schedule 13G filings. There appear to be lots of failed attempts at ...
Mr_Spock
  • 3,835