I have the input.csv file below and I'm having trouble converting it to a .json file. The Text field is in the Sinhala language.
Date,Text,Category
2021-07-28,"['ලංකාව', 'ලංකාව']",Sports
2021-07-28,"['ඊයේ', 'ඊයේ']",Sports
2021-07-29,"['ලංකාව', 'ලංකාව', 'ලංකාව', 'ලංකාව']",Sports
2021-07-29,"['ඊයේ', 'ඊයේ', 'ඊයේ', 'ඊයේ']",Sports
2021-08-01,"['ලංකාව', 'ලංකාව', 'ලංකාව', 'ලංකාව']",Sports
The .json format that I want is as follows:
[
{
"category":"Sports",
"date":"2021-07-28",
"data": ['ලංකාව', 'ලංකාව']
},
{
"category":"Sports",
"date":"2021-07-28",
"data": ['ඊයේ', 'ඊයේ']
},
{
"category":"Sports",
"date":"2021-07-29",
"data": ['ලංකාව', 'ලංකාව', 'ලංකාව', 'ලංකාව']
},
{
"category":"Sports",
"date":"2021-07-29",
"data": ['ඊයේ', 'ඊයේ', 'ඊයේ', 'ඊයේ']
},
{
"category":"Sports",
"date":"2021-08-01",
"data": ['ලංකාව', 'ලංකාව', 'ලංකාව', 'ලංකාව']
}
]
Below is what I tried. Because the text is in Sinhala, the values are shown in this format: \u0d8a\u0dba\u0dda, which is another thing I'm struggling to sort out. The JSON structure is also not what I expect.
import csv
import json

def toJson():
    csvfile = open('outputS.csv', 'r', encoding='utf-8')
    jsonfile = open('file.json', 'w')
    fieldnames = ("date", "text", "category")
    reader = csv.DictReader(csvfile, fieldnames)
    out = json.dumps([row for row in reader])
    jsonfile.write(out)

if __name__ == '__main__':
    toJson()
CodePudding user response:
Use ensure_ascii=False when doing json.dumps:
out = json.dumps([row for row in reader], ensure_ascii=False)
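To see the difference in isolation, here is a minimal sketch using one of the words from the question (the variable names are just for illustration):

```python
import json

word = "ඊයේ"

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps([word])

# ensure_ascii=False keeps the original characters in the output.
readable = json.dumps([word], ensure_ascii=False)

print(escaped)
print(readable)

# Both strings decode back to the same data, so the escaped form is
# still valid JSON; it is just harder to read.
assert json.loads(escaped) == json.loads(readable) == [word]
```

So the escaped output is not corrupted, only less readable. Any JSON parser will turn it back into the Sinhala text.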
Other notes:
- Since the first row of the csv contains the column names, you should either skip this first row, or let csv.DictReader use the first row as the column names automatically by not passing explicit values to fieldnames.
- It's very bad practice to use open and then not close it. To make things easier you can use a with statement.
- The second column of the csv file will be treated as a string and not as a list of strings unless you specifically parse it as such (you can use literal_eval from the ast module for this).
- You can use json.dump instead of json.dumps to write directly to the file.
With this, you can rewrite your function to:
import csv
import json

def toJson():
    # Parenthesized multi-item with statements are officially supported from Python 3.10
    with (open('delete.csv', 'r', encoding='utf-8') as csvfile,
          open('file.json', 'w', encoding='utf-8') as jsonfile):
        fieldnames = ("date", "text", "category")
        reader = csv.DictReader(csvfile, fieldnames)
        next(reader)  # skip header row
        json.dump([row for row in reader], jsonfile, ensure_ascii=False)
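If you also want the exact structure from the question (keys category, date, data, with data as a real list), something along these lines should work. This is only a sketch: the helper names csv_to_records and to_json are made up for illustration, and it assumes every Text cell is a valid Python list literal that ast.literal_eval can parse.

```python
import ast
import csv
import io
import json


def csv_to_records(csvfile):
    """Turn rows of Date,Text,Category into the list of dicts asked for."""
    reader = csv.DictReader(csvfile)  # header row supplies the column names
    return [
        {
            "category": row["Category"],
            "date": row["Date"],
            # The Text cell holds a Python-style list literal (assumption);
            # literal_eval parses it into a real list of strings.
            "data": ast.literal_eval(row["Text"]),
        }
        for row in reader
    ]


def to_json(csv_path, json_path):
    # utf-8 on both files keeps the Sinhala text intact on any platform.
    with open(csv_path, "r", encoding="utf-8", newline="") as csvfile:
        records = csv_to_records(csvfile)
    with open(json_path, "w", encoding="utf-8") as jsonfile:
        json.dump(records, jsonfile, ensure_ascii=False, indent=2)


# Small in-memory check using the first row from the question.
sample = io.StringIO(
    "Date,Text,Category\n"
    "2021-07-28,\"['ලංකාව', 'ලංකාව']\",Sports\n"
)
records = csv_to_records(sample)
print(json.dumps(records, ensure_ascii=False))
```

Note that the file will contain double-quoted strings in the data list rather than the single quotes shown in the question, because JSON only allows double quotes.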
CodePudding user response:
Read your CSV using pandas:
import pandas as pd

# Read the file with pd.read_csv(), then call to_dict() with the orient option set to 'records'
df = pd.read_csv('your_csv_file_name.csv')
records = df.to_dict(orient='records')
