I have a script that reads an HTML element tree. The problem is that when it builds the list of records, it fills in a missing value with the last known value instead of leaving it empty.

However, if I call record_dict.clear() after the first layer ('records in journal'), the script gives 'NaN' for a missing value, which is what I want.
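
A minimal reproduction of the carry-over, as a sketch on a made-up tree. The XML, the `collect` helper and its `clear_between` flag are hypothetical and only there for illustration; my real data has a namespace and more layers.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the second journal has no bankAccNr.
xml_text = """
<journals>
  <journal><jrnID>1</jrnID><bankAccNr>12</bankAccNr></journal>
  <journal><jrnID>2</jrnID></journal>
</journals>
"""
journals = ET.fromstring(xml_text)


def collect(journals, clear_between):
    """Build one dict per journal, reusing a single record_dict."""
    rows = []
    record_dict = {}
    for journal in journals:
        for field in journal:
            if len(field) == 0:  # leaf element without children
                record_dict[field.tag] = field.text
        rows.append(record_dict.copy())
        if clear_between:
            # Without this, keys from the previous journal stay in the dict.
            record_dict.clear()
    return rows


shared = collect(journals, clear_between=False)
cleared = collect(journals, clear_between=True)
print(shared)   # second row still carries bankAccNr from the first journal
print(cleared)  # second row has no bankAccNr key at all
```

When such a list of dicts is passed to pd.DataFrame, a key that is absent from a row shows up as NaN in that column.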

When I try to do the same in the full script, it doesn't work. That makes sense, because the full script combines values from different layers of the data.

To make everything clear, I have the following scripts and examples:

The total script is:

import pandas as pd

transactions_df = pd.DataFrame()
total_recordstotal = list()
record_dict = dict()

for journal in journals:
    #record_dict = pd.DataFrame([dict()])    
    for records in journal:
        
        if len(records) == 0:
            columnnames = records.tag.replace(ns,'')
            columnnames = records.tag.replace(ns + 'desc', 'journal desc').replace(ns + 'jrnTp', 'jrnTp').replace(ns + 'jrnID', 'jrnID').replace(ns + 'offsetAccID', 'offsetAccID').replace(ns + 'bankAccNr', 'bankAccNr')
            
            #columnvalues = [x for x in records.text if x is not None]
            columnvalues = records.text
      
            record_dict[columnnames] = columnvalues
            
            
        
        else :
            for record in records: 
                if len(record) == 0:
                    columnnames = record.tag.replace(ns,'')
                    columnnames = record.tag.replace('nr','boeknr')
                    columnnames = columnnames.replace(ns,'')
                    columnvalues = record.text
                    record_dict[columnnames] = columnvalues 
  
                else:

                    for subfields in record: 
                        if len(subfields) == 0:
                            columnnames = subfields.tag.replace(ns,'')
                            columnvalues = subfields.text
                            record_dict[columnnames] = columnvalues

                        else: 

                            for subfields_1 in subfields:
                                if len(subfields_1) == 0:
                                    columnnames = subfields_1.tag.replace(ns,'')
                                    columnvalues = subfields_1.text
                                    record_dict[columnnames] = columnvalues
                                else : print('nog een sublaag!')

                    total_recordstotal.append(record_dict.copy()) 

If I run the script on only the first layer of data, I get the following. Each jrnID should have its own bankAccNr, but missing values are filled with the previous one:

for journal in journals:  
    for records in journal:    
        if len(records) == 0:
            columnnames = records.tag.replace(ns,'')
            columnnames = records.tag.replace(ns + 'desc', 'journal desc').replace(ns + 'jrnTp', 'jrnTp').replace(ns + 'jrnID', 'jrnID').replace(ns + 'offsetAccID', 'offsetAccID').replace(ns + 'bankAccNr', 'bankAccNr')
            columnvalues = records.text
            record_dict[columnnames] = columnvalues
            
    
    total_records1.append((record_dict.copy())) 

Output:

jrnID  bankAccNr
1      NaN
2      12
3      12
4      12
5      22
6      22
7      33
8      33
9      33
10     33

The output looks right when I change the code to clear the dict after each journal:

for journal in journals:  
    for records in journal:    
        if len(records) == 0:
            columnnames = records.tag.replace(ns,'')
            columnnames = records.tag.replace(ns + 'desc', 'journal desc').replace(ns + 'jrnTp', 'jrnTp').replace(ns + 'jrnID', 'jrnID').replace(ns + 'offsetAccID', 'offsetAccID').replace(ns + 'bankAccNr', 'bankAccNr')
            columnvalues = records.text
            record_dict[columnnames] = columnvalues
            
    
    total_records1.append((record_dict.copy())) 
    record_dict.clear()

Output:

jrnID  bankAccNr
1      NaN
2      12
3      NaN
4      NaN
5      22
6      NaN
7      33
8      NaN
9      NaN
10     NaN

However, this is only one layer of the tree, and I need to read more layers and combine them. My question is therefore: how can I apply the fix above to the whole script shown at the top?
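
One direction I have considered, as an untested idea on a made-up two-layer tree: keep the journal-level fields in their own dict that is rebuilt for every journal, and start each row from a fresh copy of it. The XML, the `transaction` and `amount` tags and the `leaf_fields` helper are all hypothetical, and this sketch does not handle the deeper sublayers of my real data.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data with one nested layer per journal.
xml_text = """
<journals>
  <journal>
    <jrnID>1</jrnID><bankAccNr>12</bankAccNr>
    <transaction><nr>100</nr><amount>5.00</amount></transaction>
    <transaction><nr>101</nr></transaction>
  </journal>
  <journal>
    <jrnID>2</jrnID>
    <transaction><nr>200</nr><amount>7.50</amount></transaction>
  </journal>
</journals>
"""


def leaf_fields(element):
    """Return {tag: text} for the direct children that have no children."""
    return {child.tag: child.text for child in element if len(child) == 0}


rows = []
for journal in ET.fromstring(xml_text):
    # Rebuilt per journal, so nothing leaks in from the previous journal.
    journal_fields = leaf_fields(journal)
    for record in journal:
        if len(record) > 0:  # nested record, not a journal-level leaf
            row = dict(journal_fields)       # fresh copy for every row
            row.update(leaf_fields(record))  # add the record's own fields
            rows.append(row)

for row in rows:
    print(row)
```

Here the journal-level values are repeated on every row of the same journal, while a field that is missing in a record or a journal is simply absent from that row.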
