Extract DataFrame from multiple JSONs in a file

I have a dataset in an example.txt file containing about 80k rows. Its format is like this:

{..., "text": ..., "class": ..., ...}
{..., "text": ..., "class": ..., ...}
{..., "text": ..., "class": ..., ...}

The JSON objects are not separated by commas. I want to extract the text and class columns into a pandas DataFrame that looks like this:

Text	Class
Text1	Class1
Text2	Class2

How can I do this?

Many thanks in advance!

CodePudding user response:

Use pd.read_json with lines=True; this one-object-per-line format is called JSON Lines. The file extension does not matter, so the path can point at your example.txt:

import pandas as pd
df = pd.read_json('path/to/your/file.json', lines=True)[['text', 'class']]
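
If you also want the column headers to read Text and Class as in the question, a rename afterwards should do it. Here is a minimal sketch you can run without the real file: the two sample records and the extra "id" field are made up for illustration, and io.StringIO stands in for example.txt.

```python
import io

import pandas as pd

# Hypothetical sample standing in for example.txt: one JSON object per line.
# The "id" field is invented here to represent the other keys in each record.
sample = io.StringIO(
    '{"id": 1, "text": "Text1", "class": "Class1"}\n'
    '{"id": 2, "text": "Text2", "class": "Class2"}\n'
)

# lines=True tells pandas to parse each line as a separate JSON object.
df = pd.read_json(sample, lines=True)[["text", "class"]]

# Rename the columns to match the headers shown in the question.
df = df.rename(columns={"text": "Text", "class": "Class"})

print(df)
```

With the real data, pass the file path (for example 'example.txt') to pd.read_json instead of the StringIO object.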