python jdon to csv ，Works with all types of codes，-CodePudding

I often need to convert JSON files to csv files. There are many types of JSON files, regular or nested, so is there any code that can convert all types to csv files? I often need to convert JSON files to csv files. There are many types of JSON files, regular or nested, so is there any code that can convert all types to csv files?

Here's my code, but it can only convert small files, and for files in the tens or tens of GBs, it can get a "memory error". Is there a more efficient code, or modify my code to read line by line or in chunks to reduce memory usage.

import json
import pandas as pd

file_path = "F:\\1.json"
with open(file_path, 'r',encoding='utf-8') as fh:
    file_data = fh.readlines()

all_data = []
for data in file_data:
    data = data.strip()
    if data:
        all_data.append(json.loads(data))
df = pd.json_normalize(all_data)

df.to_csv('F:\\1.csv',encoding='utf-8',index=False)

CodePudding user response：

No, there is no generic way to convert an arbitrarily nested JSON file to CSV – at least not into a CSV that would make sense or be generically usable.

CodePudding user response：

To avoid the memory error you need to avoid reading the whole file into memory at once. This can be done by processing your JSON entries one line at a time. This assumes it is the same format as your other question: python json to csv， How to read a file in chunks or line by line - i.e. a complete JSON record per line.

Speed can probably be improved by reading and processing a group of lines before writing the output. Here it keeps processing until a threshold is reached:

import pandas as pd

add_header = True
df = pd.DataFrame()

with open('1.json') as f_json:
    for line in f_json:
        line = line.strip()
        
        if line:
            df_line = pd.read_json(line)
            df = pd.concat([df, df_line])
            
            if df.size > 10000:
                df.to_csv('1.csv', index=None, mode='a', header=add_header)
                add_header = False
                df = df.DataFrame()

if df.size:
    df.to_csv('1.csv', index=None, mode='a', header=add_header)

The bigger the threshold the better but the more memory will be required.