I have a dataset in an example.txt file containing about 80k rows. Its format is like this:
{..., "text": ..., "class": ..., ...}
{..., "text": ..., "class": ..., ...}
{..., "text": ..., "class": ..., ...}
The JSON objects do not have any commas between them. What I want to do is extract the text and class columns into a pandas DataFrame that looks like this:
| Text | Class |
|---|---|
| Text1 | Class1 |
| Text2 | Class2 |
How can I do this?
Many thanks in advance!
CodePudding user response:
Just use pd.read_json with lines=True (this format, with one JSON object per line, is called, not surprisingly, JSON Lines):
import pandas as pd

# Read one JSON object per line, then keep only the two columns of interest
df = pd.read_json('path/to/your/file.json', lines=True)[['text', 'class']]
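
If you also want the capitalized Text and Class headers from the question, a rename afterwards should do it. Here is a minimal sketch of the whole thing. The two sample rows and the extra "id" field are made up for illustration and stand in for the real example.txt. The file extension should not matter to read_json, so passing the path to the .txt file instead of the in-memory buffer ought to work the same way.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for example.txt:
# one JSON object per line, no commas between objects.
# The "id" field is invented to represent the other, unwanted keys.
sample = io.StringIO(
    '{"id": 1, "text": "Text1", "class": "Class1"}\n'
    '{"id": 2, "text": "Text2", "class": "Class2"}\n'
)

# lines=True tells pandas to parse each line as a separate JSON object.
df = pd.read_json(sample, lines=True)

# Keep only the two columns of interest, then rename them to match
# the headers shown in the question.
df = df[["text", "class"]].rename(columns={"text": "Text", "class": "Class"})

print(df)
```

For the real file you would replace sample with the file path, e.g. pd.read_json('example.txt', lines=True).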
