I am looping through a JSON Lines file, filtering for the sender ID and status, and printing these to the terminal. Some records have multiple sender IDs, which are stored in a list, whilst others have a single sender ID stored as a plain string. I want to write the output to one CSV file where the first column is STATUS and the second is SENDER_ID. I have attempted this at the top of my script, but I am not sure whether this is the right way of doing it.

My script is as follows. At which point would I need to write to the CSV file? I have read through the documentation but am still a little unsure.

import json_lines

text_file = open("senderv1.csv", "a")
with open('specifications.jsonl', 'rb') as f:
    for item in json_lines.reader(f):
  • Please add some values from specifications.json for more clarification.
    – Shan Ali
    Commented Aug 28, 2019 at 12:50
  • Wouldn't you want to append all values to a dict/list and convert it to a dataframe, to then export it to a .csv file? Commented Aug 28, 2019 at 12:56
  • Could you point to an example, @CeliusStingher? Commented Aug 28, 2019 at 13:19
  • Sai kumar got ahead of me and posted the answer; there he is not opening the existing file but creating a new one. Do you want to open the existing file and be able to edit it, or do you want to create a new .csv file? Commented Aug 28, 2019 at 13:21
  • What's your question exactly? Is your code working? If yes and you're just looking for improvements, SO is not the right place; you want Code Review instead. Otherwise, please explain clearly the issue you're having (cf. stackoverflow.com/help/how-to-ask). Commented Aug 28, 2019 at 13:32

2 Answers

Using pandas, you can build a DataFrame from the collected values and save it as a CSV file. Hope this solves your problem.

import json_lines
import pandas as pd

statuses = []
sender_ids = []
with open('specifications.jsonl', 'rb') as f:
    for item in json_lines.reader(f):
        statuses.append(item['status'])
        if 'sender_id' in item:
            # Single sender: the ID is stored as a plain string.
            sender_ids.append(item['sender_id'])
        else:
            # Multiple senders: collect the IDs into a list (one cell).
            sender_ids.append([sender['id'] for sender in item['senders']])

df = pd.DataFrame({'STATUS': statuses, 'SENDER_ID': sender_ids})

# index=False keeps the DataFrame's row index out of the file.
df.to_csv('senderv1.csv', index=False)
  • You don't need such a huge and complex dependency as pandas just to write a basic, simple CSV file - there's already a csv module in the stdlib and it's just as easy to use. Commented Aug 28, 2019 at 13:30
Here is code that writes the file with the csv module from the standard library. The first column contains the status and the following columns contain the sender IDs:

#!/usr/bin/env python3
import csv

import json_lines


def main():
    with json_lines.open("specifications.jsonl") as reader:
        with open("senderv1.csv", "w", newline="", encoding="utf8") as csv_file:
            writer = csv.writer(csv_file, delimiter="\t")
            for item in reader:
                row = [item["status"]]
                if "sender_id" in item:
                    row.append(item["sender_id"])
                elif "senders" in item:
                    row.extend(sender["id"] for sender in item["senders"])
                else:
                    raise ValueError("item with no sender information")
                writer.writerow(row)


if __name__ == "__main__":
    main()

Spreading the same kind of information across a varying number of columns isn't really good, but putting more than one value into a single cell isn't good either. CSV is best suited to two-dimensional tabular data. Maybe you want JSON (Lines) for the result too‽
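
If the result has to stay CSV, one way to sidestep both problems is to write one row per (status, sender ID) pair, which also gives exactly the two columns STATUS and SENDER_ID asked for in the question. The following is only a sketch under some assumptions: it parses each line with the standard library's json module instead of json_lines so that it runs without extra packages, it uses a comma rather than a tab as the delimiter, the function names sender_rows and write_long_csv are made up for illustration, and the sample records are invented because the real file contents are not shown.

```python
import csv
import io
import json


def sender_rows(item):
    """Yield one (status, sender_id) pair per sender in a parsed record."""
    if "sender_id" in item:
        yield item["status"], item["sender_id"]
    elif "senders" in item:
        for sender in item["senders"]:
            yield item["status"], sender["id"]
    else:
        raise ValueError("item with no sender information")


def write_long_csv(lines, csv_file):
    """Write a STATUS/SENDER_ID header, then one row per sender."""
    writer = csv.writer(csv_file)
    writer.writerow(["STATUS", "SENDER_ID"])
    for line in lines:
        if line.strip():  # tolerate blank lines in the input
            writer.writerows(sender_rows(json.loads(line)))


# Hypothetical sample records; the real file's contents are not shown.
sample = [
    '{"status": "active", "sender_id": "a1"}',
    '{"status": "paused", "senders": [{"id": "b1"}, {"id": "b2"}]}',
]
buffer = io.StringIO()
write_long_csv(sample, buffer)
print(buffer.getvalue())
```

For the real files, the same function could be called with two open file objects, for example the input opened with open("specifications.jsonl", encoding="utf8") and the output opened with open("senderv1.csv", "w", newline="", encoding="utf8"). A status is repeated once per sender, which costs some redundancy but keeps every cell to a single value.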

  • Your script only outputs one row of data and is not looping through the whole jsonl file? Commented Aug 28, 2019 at 14:27
  • Sorry, the writerow() call wasn't in the loop. It's fixed now.
    – BlackJack
    Commented Aug 28, 2019 at 14:34
