I am calling an API and it returns JSON string. I want to convert it to CSV-format so I can save it later to database. However, JSON objects keys cause problems because there is keys missing or keys are changing. I wrote this python script but because of keys I cannot get it to work:
import json
import csv
with open('custom.json') as json_file:
data = json.load(json_file)
custom_data = data['CustomJSON']
data_file = open('data_file.csv', 'w')
csv_writer = csv.writer(data_file)
count = 0
for i in custom_data:
if count == 0:
# Writing headers of CSV file
header = i.keys()
csv_writer.writerow(header)
count = 1
# Writing data of CSV file
csv_writer.writerow(i.values())
data_file.close()
How I can convert this type of JSON to CSV? Example JSON message:
{
"CustomJSON" : [
{
"id" : "1,
"name" : "Jack",
"surname" : "Bauer"
},
{
"id" : "2",
"name" : "John",
"surname" : "Smith"
"age" : "31",
"city" : "New York"
},
{
"id" : "3",
"name" : "Matt",
"surname" : "Secret"
"exception_1" : "Exception_1",
"exception_2" : "Exception_2"
"date" : "2022-02-08"
}
]
}
Should I try to loop all key-values first somehow and then later try to add data? Can anyone provide an example?
CodePudding user response:
As you are reading a single JSON string, you will have everything in memory. So IMHO the simplest way is to first build the field names list, and then write everything to a csv file.
# compute the fieldnamelist
# this uses a dict because it is easy to update it while maintaining key order
keys = dict()
for d in data['CustomJSON']:
keys.update(d)
# write to the csv file
# this uses a DictWriter because the individual rows are already dicts
with open('data_file.csv', 'w', newline='') as data_file
csv_writer = csv.DictWriter(data_file, fieldnames = keys.keys())
_ = csv_writer.writeheader()
_ = csv_writer.writerows(data['CustomJSON'])
With your data it gives as expected:
id,name,surname,age,city,exception_1,exception_2,date
1,Jack,Bauer,,,,,
2,John,Smith,31,New York,,,
3,Matt,Secret,,,Exception_1,Exception_2,2022-02-08
CodePudding user response:
I am a pandas fan(atic) so I'd do something like
import pandas as pd
# df is a pandas dataframe)
df = pd.read_json('http://data.com/foo')
df.to_csv('foo.csv')
Pandas has options for the CSV dialect, if need be. You should be able to do what you describe with those two function calls, though.
