I have a set of data like:
values site_name timezone variable_name
0 [{'value': SAN JOAQUIN PST degC
[{'value': '9.3',
'qualifiers': ['P'],
'date': '2022-01-05'
},
{'value': '9.4',
'qualifiers': ['P'],
'date': '2022-01-05'
}]
}]
1 [{'value': SAN JOAQUIN PST pH
[{'value': '7.5',
'qualifiers': ['P'],
'date': '2022-01-05'
},
{'value': '7.8',
'qualifiers': ['P'],
'date': '2022-01-05'
}]
}]
where the values are a long list and each row has these nested sets of values. How can I use pandas to convert each of the variable_name's to their own dataframe?
I'd like to have something like:
degC table
value date qualifier
0 9.3 2022-01-05 P
1 9.4 2022-01-05 P
pH table
value date qualifier
0 7.5 2022-01-05 P
1 7.8 2022-01-05 P
Here's what I have tried so far:
df = pd.json_normalize(file)
for i in range(len(df.index)):
pd.json_normalize(df.iloc[i])
Raw input which is shown above:
df = pd.DataFrame({'values':[[{'value': [{'value': '9.3', 'qualifiers': ['P'], 'date': '2022-01-05'},
{'value': '9.4', 'qualifiers': ['P'], 'date': '2022-01-05'}]
}],
[{'value': [{'value': '7.5', 'qualifiers': ['P'], 'date': '2022-01-05'},
{'value': '7.8', 'qualifiers': ['P'], 'date': '2022-01-05'}]
}]],
'variable_name':['degC','pH']})
CodePudding user response:
Your desired data resides as a value of the value key in each row of the 'values' column. You can use str.get method to access those lists of dictionaries.
Then using iterrows method, iterate over the rows of df and cast these lists to pd.DataFrame (also use explode method to take qualifiers values out of the lists) and collect all data in a dictionary.
df['values'] = df['values'].str[0].str.get('value')
out = {}
for _, d in df.iterrows():
out[d['variable_name']] = pd.DataFrame(d['values']).explode('qualifiers')
Then for example, print(out['degC']) spits
value qualifiers date
0 9.3 P 2022-01-05
1 9.4 P 2022-01-05
and print(out['pH']):
value qualifiers date
0 7.5 P 2022-01-05
1 7.8 P 2022-01-05
CodePudding user response:
With data frame:
>>> df
values variable_name
0 [{'value': [{'value': '9.3', 'qualifiers': ['P'] ... degC
1 [{'value': [{'value': '7.5', 'qualifiers': ['P'] ... pH
Using this:
new_df = df['values'].apply(lambda x: pd.DataFrame(x[0]['value']))
new_df.index = df['variable_name']
>>> new_df.loc['degC']
value qualifiers date
0 9.3 [P] 2022-01-05
1 9.4 [P] 2022-01-05
>>> new_df.loc['pH']
value qualifiers date
0 7.5 [P] 2022-01-05
1 7.8 [P] 2022-01-05
