I have my Dataframe in the below structure. I would like to break them based on the nested values within the details column
cust_id, name, details
101, Kevin, [{"id":1001,"country":"US","state":"OH"}, {"id":1002,"country":"US","state":"GA"}]
102, Scott, [{"id":2001,"country":"US","state":"OH"}, {"id":2002,"country":"US","state":"GA"}]
Expected output
cust_id, name, id, country, state
101, Kevin, 1001, US, OH
101, Kevin, 1002, US, GA
102, Scott, 2001, US, OH
102, Scott, 2002, US, GA
CodePudding user response:
df = df.explode('details').reset_index(drop=True)
df = df.merge(pd.json_normalize(df['details']), left_index=True, right_index=True).drop('details', axis=1)
df.explode("details")basically duplicates each row in thedetailsN times, where N is the number of items in the array (if any) ofdetailsof that row- Since
explodeduplicates the rows, the original rows' indices (0 and 1) are copied to the new rows, so their indices are 0, 0, 1, 1, which messes up later processing.reset_index()creates a fresh new column for the index, starting at0.drop=Trueis used because by default pandas will keep the old index column; this removes it. pd.json_normalize(df['details'])converts the column (where each row contains a JSON object) to a new dataframe where each key unique of all the JSON objects is new columndf.merge()merges the new dataframe into the original oneleft_index=Trueandright_index=Truetells pandas to merge the specified dataframe starting from it's first, row into this dataframe, starting at its first row.drop('details', axis=1)gets rid of the olddetailscolumn containing the old objects
