"assignees": [{ "id": 1234, "username": "xyz", "name": "XYZ", "state": "active", "avatar_url": "aaaaaaaaaaaaaaa", "web_url": "bbbbbbbbbbb" }, { "id": 5678, "username": "abcd", "name": "ABCD", "state": "active", "avatar_url": "hhhhhhhhhhh", "web_url": "mmmmmmmmm" } ],
I have this column in my JSON file. The column repeats several times, each time with different values. I want to split it into separate columns, like: id username name state url
basically like Excel columns with their respective values. But whenever I try to use df.explode('assignees'), I get the error: ValueError: columns must be unique
Similarly, I have a column "labels": [ "Scanning", "Scanning at Scale", "Workflow" ], and the same error is raised for it. Labels also occurs many times in my JSON file with different values. I want the labels in row format, repeating, like: Scanning Scanning at Scale Workflow Scanning Scanning at Scale Workflow Priority Firewall. What should I do?
CodePudding user response:
If I understand you correctly you have a json/dict that looks like this:
json = {"assignees": [{ "id": 1234, "username": "xyz", "name": "XYZ", "state": "active", "avatar_url": "aaaaaaaaaaaaaaa", "web_url": "bbbbbbbbbbb" },
{ "id": 5678, "username": "abcd", "name": "ABCD", "state": "active", "avatar_url": "hhhhhhhhhhh", "web_url": "mmmmmmmmm" }]}
If that is the case you could just use pd.DataFrame(json.get('assignees')) to create a DataFrame of this format (which is what you want as far as I understand):
| | id | username | name | state | avatar_url | web_url |
|---|---|---|---|---|---|---|
| 0 | 1234 | xyz | XYZ | active | aaaaaaaaaaaaaaa | bbbbbbbbbbb |
| 1 | 5678 | abcd | ABCD | active | hhhhhhhhhhh | mmmmmmmmm |
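Since the question says the "assignees" key repeats across many records, a minimal sketch of the multi-record case may help. It assumes the file holds a list of records, each with its own "assignees" list; the "issue_id" field and the sample values are hypothetical and only there to show which record each row came from. pd.json_normalize flattens every nested list into rows and carries the parent field along:

```python
import pandas as pd

# Hypothetical input: a list of records, each with its own "assignees" list.
# "issue_id" is an invented field used only to identify the parent record.
records = [
    {"issue_id": 1, "assignees": [
        {"id": 1234, "username": "xyz", "name": "XYZ", "state": "active"},
        {"id": 5678, "username": "abcd", "name": "ABCD", "state": "active"},
    ]},
    {"issue_id": 2, "assignees": [
        {"id": 1234, "username": "xyz", "name": "XYZ", "state": "active"},
    ]},
]

# One row per assignee; the keys of each assignee dict become columns,
# and "issue_id" is repeated from the parent record.
assignees_df = pd.json_normalize(records, record_path="assignees", meta=["issue_id"])
print(assignees_df)
```

Records whose "assignees" list is empty contribute no rows in this sketch, so they would need separate handling if they matter.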
CodePudding user response:
The best way to separate and order the data is to use the pandas library. Below, the data is loaded into a DataFrame, which you can also rearrange and save as JSON again:
import pandas as pd
jtry={"assignees": [{ "id": 1234, "username": "xyz", "name": "XYZ", "state":
"active", "avatar_url": "aaaaaaaaaaaaaaa", "web_url": "bbbbbbbbbbb" }, { "id":
5678, "username": "abcd", "name": "ABCD", "state": "active", "avatar_url":
"hhhhhhhhhhh", "web_url": "mmmmmmmmm" }]}
print(jtry['assignees'][0])  # inspect the first assignee record
df_jtry = pd.DataFrame(jtry['assignees'])  # one row per assignee, one column per key
print(df_jtry)
Then convert it back to a dict() so it can be saved as JSON again:
new_dic = df_jtry.to_dict(orient='list')  # {'id': [...], 'username': [...], ...} with plain Python values, so json.dumps(new_dic) works
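For the "labels" column from the question, a rough sketch with DataFrame.explode follows. The records and the "issue_id" field are hypothetical. As far as I know, "ValueError: columns must be unique" is raised by explode when the DataFrame has duplicate column labels, so the sketch checks for that first; whether that is the cause in the original file is an assumption:

```python
import pandas as pd

# Hypothetical records; "issue_id" is invented to identify each record.
records = [
    {"issue_id": 1, "labels": ["Scanning", "Scanning at Scale", "Workflow"]},
    {"issue_id": 2, "labels": ["Priority", "Firewall"]},
]
df = pd.DataFrame(records)

# explode needs unique column labels, so drop repeated ones if there are any.
if not df.columns.is_unique:
    df = df.loc[:, ~df.columns.duplicated()]

# One row per label; the other columns are repeated for each label.
labels_df = df.explode("labels", ignore_index=True)
print(labels_df)
```

Dropping duplicated columns keeps only the first occurrence of each label, which may or may not be what you want; renaming the duplicates is another option.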
