How can I convert JSON format text to dataframe?

Time:01-13

I am trying to convert the JSON-formatted text below to a pandas or Spark DataFrame, but I get the following error:

ERROR: JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
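The error says the parser reached character 1, right after the opening brace, and did not find a double quote there. JSON only allows double-quoted keys, so a common cause is text written with Python-style single quotes. Since the screenshot of the file is not available, that cause is an assumption; a minimal reproduction with a hypothetical record:

```python
import ast
import json

# Hypothetical one-record sample with single-quoted keys (the way Python prints a dict)
text = "{'col1': 'a', 'col2': 1}"

error_message = None
error_pos = None
try:
    json.loads(text)
except json.JSONDecodeError as err:
    # JSON only allows double-quoted keys, so parsing fails right after "{"
    error_message = str(err)
    error_pos = err.pos

print(error_message)

# Python literal syntax accepts single quotes, so the same text parses here
parsed = ast.literal_eval(text)
print(parsed)  # {'col1': 'a', 'col2': 1}
```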

Python code:

import json

# Read the file as text and strip characters that break json.loads
path = "sample.json"
with open(path, 'r') as myfile:
    data = myfile.read()
data = data.replace('\t', '')
data = data.replace('\n', '')
data = data.replace(',}', '}')   # drop a trailing comma right before }
data = data.replace(',]', ']')   # drop a trailing comma right before ]
obj = json.loads(data)
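The `str.replace` calls above only drop a trailing comma when it sits immediately before `}` or `]`; after tabs and newlines are removed, a leftover space (`, }`) still slips through. A regular expression can allow whitespace in between. This is a sketch that assumes the keys are already double-quoted (if they are single-quoted, `json.loads` keeps raising the same error), and `strip_trailing_commas` is a hypothetical helper name:

```python
import json
import re


def strip_trailing_commas(text: str) -> str:
    """Remove a comma that precedes a closing } or ], allowing whitespace
    in between (plain str.replace only catches ',}' and ',]')."""
    # Caveat: this also rewrites ', }' or ', ]' occurring inside string values
    return re.sub(r",\s*([}\]])", r"\1", text)


# Hypothetical input: double-quoted keys, trailing commas followed by spaces
raw = '{"a": [1, 2, ], "b": "x", }'
cleaned = strip_trailing_commas(raw)
obj = json.loads(cleaned)
print(obj)  # {'a': [1, 2], 'b': 'x'}
```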

[Image: JSON file format]

[Image: contents of data after reading the .json file with open()]

How can I convert the text above into a DataFrame?
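Because the screenshots are missing, the exact file contents are unknown. Assuming the text is a Python-style literal (single-quoted keys, possibly trailing commas, JSON `null`) holding a list of flat records, one option is to parse the whole text with `ast.literal_eval` instead of `json.loads` and pass the resulting list of dicts to `pd.DataFrame`. The function `text_to_dataframe` and the sample text are hypothetical:

```python
import ast
import re

import pandas as pd


def text_to_dataframe(text: str) -> pd.DataFrame:
    """Parse Python-literal-style "JSON" (single quotes, trailing commas)
    into a DataFrame. Assumes a list of flat records, or a single record."""
    # Naive mapping of JSON keywords to Python literals; it would also
    # rewrite these words if they appear as whole words inside string values.
    text = re.sub(r"\bnull\b", "None", text)
    text = re.sub(r"\btrue\b", "True", text)
    text = re.sub(r"\bfalse\b", "False", text)
    # literal_eval accepts single quotes and trailing commas, unlike json.loads,
    # and never executes arbitrary code
    records = ast.literal_eval(text.strip())
    if isinstance(records, dict):
        records = [records]
    return pd.DataFrame(records)


sample = """[
    {'id': 1, 'name': 'a', 'score': null,},
    {'id': 2, 'name': 'b', 'score': 3.5,},
]"""
df = text_to_dataframe(sample)
print(df)
```

With the file from the question, this would be called as `text_to_dataframe(data)` on the raw text read with `open(path).read()`, before any of the `replace` calls.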

Answer:

I got it working by adding a few lines of code:

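import ast

import pandas as pd

path = "sample.json"
with open(path, 'r') as myfile:
    data = myfile.read()
data = data.replace('\t', '')
data = data.replace('\n', '')
data = data.replace(',}', '}')
data = data.replace(',]', ']')
data = data.replace("null", "''")   # replace JSON null with an empty string

# Strip the outer brackets and split into one string per record.
# Assumes the records are flat (no nested objects).
liss = []
data1 = data[1:-1]
data2 = data1.split("},")
for i in data2:
    i = i.strip()
    if i[-1] != "}":
        # split() removed the closing brace; put it back
        liss.append(i + "}")
    else:
        liss.append(i)
sample_df = pd.DataFrame({"Col1": liss})

# ast.literal_eval parses each dict literal without executing code (unlike eval)
sample_df["Col1"] = sample_df["Col1"].apply(ast.literal_eval)
df3 = sample_df["Col1"].apply(pd.Series)   # expand each dict into columns
df3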
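Splitting on `"},"` assumes every record is flat; a nested object inside a record would be cut in half. If the text can be parsed as a whole (for example with `ast.literal_eval`, as long as it is a valid Python literal), `pd.json_normalize` turns a list of possibly nested dicts into columns directly, naming nested fields with dots. The records below are hypothetical:

```python
import ast

import pandas as pd

# Hypothetical Python-literal text with one nested object per record
text = "[{'id': 1, 'info': {'name': 'a'}}, {'id': 2, 'info': {'name': 'b'}}]"

records = ast.literal_eval(text)   # list of dicts
df = pd.json_normalize(records)    # nested key 'name' becomes column 'info.name'
print(df)
```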

Answer:

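I think you can parse the JSON into Python objects first, for example a list of dictionaries with one dictionary per record. Once you have that list, you can create a Spark DataFrame with the following code (`spark` is an active SparkSession):

from pyspark.sql import Row
df = spark.createDataFrame([Row(**r) for r in records])  # records: list of dicts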