I have a script that reads an HTML tree as an element tree. The problem is that when it builds the list of records, it fills in a missing value with the last known value.
If I call record_dict.clear() after the first layer ('records in journal'), the script gives 'NaN' for a missing value instead.
When I try the same thing in the full script, it doesn't work. That makes sense, because the full script combines values from different layers of the data into the same dict.
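To illustrate what I think is happening, here is a minimal sketch with made-up data. The names `journals_demo` and `collect` are hypothetical and not part of my real script; the plain dicts only stand in for the leaf fields of each journal.

```python
# Hypothetical data: each inner dict stands in for one journal's leaf fields.
journals_demo = [
    {"jrnID": "1"},
    {"jrnID": "2", "bankAccNr": "12"},
    {"jrnID": "3"},  # no bankAccNr in this journal
]


def collect(rows, clear_each_time):
    """Copy fields into one shared dict and snapshot it once per row."""
    record_dict = {}
    out = []
    for row in rows:
        for key, value in row.items():
            record_dict[key] = value
        out.append(record_dict.copy())
        if clear_each_time:
            # Without this, keys from earlier rows stay in the dict.
            record_dict.clear()
    return out


carried = collect(journals_demo, clear_each_time=False)
cleared = collect(journals_demo, clear_each_time=True)
print(carried[2])  # {'jrnID': '3', 'bankAccNr': '12'}
print(cleared[2])  # {'jrnID': '3'}
```

Without the clear, the third row inherits bankAccNr from the second row because the key is never overwritten or removed.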
To make this clear, here are my scripts and example outputs.
The full script is:
import pandas as pd

# `journals` (the journal elements) and `ns` (the namespace prefix) are defined earlier.
transactions_df = pd.DataFrame()
total_recordstotal = list()
record_dict = dict()
for journal in journals:
    #record_dict = pd.DataFrame([dict()])
    for records in journal:
        if len(records) == 0:
            # Leaf on journal level: strip the namespace, rename 'desc'
            columnnames = records.tag.replace(ns + 'desc', 'journal desc').replace(ns + 'jrnTp', 'jrnTp').replace(ns + 'jrnID', 'jrnID').replace(ns + 'offsetAccID', 'offsetAccID').replace(ns + 'bankAccNr', 'bankAccNr')
            #columnvalues = [x for x in records.text if x is not None]
            columnvalues = records.text
            record_dict[columnnames] = columnvalues
        else:
            for record in records:
                if len(record) == 0:
                    # Leaf on the second level: rename 'nr', strip the namespace
                    columnnames = record.tag.replace('nr', 'boeknr')
                    columnnames = columnnames.replace(ns, '')
                    columnvalues = record.text
                    record_dict[columnnames] = columnvalues
                else:
                    for subfields in record:
                        if len(subfields) == 0:
                            columnnames = subfields.tag.replace(ns, '')
                            columnvalues = subfields.text
                            record_dict[columnnames] = columnvalues
                        else:
                            for subfields_1 in subfields:
                                if len(subfields_1) == 0:
                                    columnnames = subfields_1.tag.replace(ns, '')
                                    columnvalues = subfields_1.text
                                    record_dict[columnnames] = columnvalues
                                else:
                                    print('another sublayer!')
                    # Snapshot of the shared dict: one row per nested record
                    total_recordstotal.append(record_dict.copy())
If I run the script on only the first layer of data, I get the output below. Every jrnID should have its own bankAccNr, but a journal without one gets the value of the previous journal:
for journal in journals:
    for records in journal:
        if len(records) == 0:
            columnnames = records.tag.replace(ns + 'desc', 'journal desc').replace(ns + 'jrnTp', 'jrnTp').replace(ns + 'jrnID', 'jrnID').replace(ns + 'offsetAccID', 'offsetAccID').replace(ns + 'bankAccNr', 'bankAccNr')
            columnvalues = records.text
            record_dict[columnnames] = columnvalues
    # One row per journal
    total_records1.append(record_dict.copy())
jrnID | bankAccNr |
---|---|
1 | NaN |
2 | 12 |
3 | 12 |
4 | 12 |
5 | 22 |
6 | 22 |
7 | 33 |
8 | 33 |
9 | 33 |
10 | 33 |
The output looks right when I change the code to:
for journal in journals:
    for records in journal:
        if len(records) == 0:
            columnnames = records.tag.replace(ns + 'desc', 'journal desc').replace(ns + 'jrnTp', 'jrnTp').replace(ns + 'jrnID', 'jrnID').replace(ns + 'offsetAccID', 'offsetAccID').replace(ns + 'bankAccNr', 'bankAccNr')
            columnvalues = records.text
            record_dict[columnnames] = columnvalues
    # One row per journal, then start the next journal with an empty dict
    total_records1.append(record_dict.copy())
    record_dict.clear()
jrnID | bankAccNr |
---|---|
1 | NaN |
2 | 12 |
3 | NaN |
4 | NaN |
5 | 22 |
6 | NaN |
7 | 33 |
8 | NaN |
9 | NaN |
10 | NaN |
However, this is only one layer of the tree, and I need to read more layers and combine them into one row. My question is therefore: how can I apply the fix above (clearing record_dict) to the full script shown at the top?
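One direction I am considering, shown here only as an untested sketch on made-up data, is to keep a separate dict per layer and merge them when a row is appended, instead of clearing one shared dict. The XML sample, the tag names `transaction`, `line` and `amnt`, and the helpers `leaf_fields` and `branches` are all hypothetical; my real file has a namespace and more layers.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free sample standing in for the real file.
SAMPLE = """
<journals>
  <journal>
    <jrnID>1</jrnID>
    <transaction><nr>100</nr><line><amnt>5</amnt></line></transaction>
  </journal>
  <journal>
    <jrnID>2</jrnID><bankAccNr>12</bankAccNr>
    <transaction><nr>200</nr>
      <line><amnt>7</amnt></line>
      <line><amnt>8</amnt></line>
    </transaction>
  </journal>
  <journal>
    <jrnID>3</jrnID>
    <transaction><nr>300</nr><line><amnt>9</amnt></line></transaction>
  </journal>
</journals>
"""


def leaf_fields(element):
    """Return {tag: text} for direct children that have no children."""
    return {child.tag: child.text for child in element if len(child) == 0}


def branches(element):
    """Return the direct children that have children of their own."""
    return [child for child in element if len(child) > 0]


rows = []
for journal in ET.fromstring(SAMPLE.strip()):
    journal_fields = leaf_fields(journal)  # fresh dict for every journal
    for transaction in branches(journal):
        transaction_fields = leaf_fields(transaction)  # fresh per transaction
        for line in branches(transaction):
            # Merge the layers; nothing is carried over from earlier journals.
            rows.append({**journal_fields, **transaction_fields, **leaf_fields(line)})
```

Because each layer gets a new dict, a journal without bankAccNr simply has no such key in its rows, and building a DataFrame from `rows` with pd.DataFrame(rows) should then show NaN in that column.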