I've got the data that looks like that

The first column - event name, second - user id, third - columns names, fourth - columns values. For different event names, I've got a different number of columns to parse.
I need the data to look like that:
| event name | user id | column 1(dialog_id) | ... | column n (tutorial_id) | ... |
|---|---|---|---|---|---|
| Dialogs | 01f0548f-5bc6-460b-a702-cfe0aa4b90b8 | EcORCTz | ... | None | ... |
| Tutorial_step | 01f0548f-5bc6-460b-a702-cfe0aa4b90b8 | None | ... | LjzXlwB | ... |
Some of the columns could repeat in the different event_names.
Earlier I used something like this:
df['param_keys'] = df['param_keys'].apply(lambda x: ast.literal_eval(x))
df['param_values'] = df['param_values'].apply(lambda x: ast.literal_eval(x))
return(df)
And it worked perfectly, but now it returns just the same dataframe.
I've attached the example of data to try.
Thanks a lot!
CodePudding user response:
Just create a new dataframe from df['param_values'].tolist() and use pd.concat to join it and the original dataframe together:
# Convert the lists to *actual* lists, instead of just strings
df[['param_keys', 'param_values']] = df[['param_keys', 'param_values']].apply(lambda col: col.apply(ast.literal_eval))
# Process
df = pd.concat([df.drop(['param_values', 'param_keys'], axis=1), pd.DataFrame(df['param_values'].tolist(), columns=df.loc[0, 'param_keys'])], axis=1)
Output:
>>> df
event_name idfa user_event_time tutorial_id tutorial_name step step_name
0 Tutorial_step 01f0548f-5bc6-460b-a702-cfe0aa4b90b8 2021-12-10T00:00:44.000 0000 K5MDyLy C_001_Lighthouse_Start 1 ShowCutscene
1 Tutorial_step 01f0548f-5bc6-460b-a702-cfe0aa4b90b8 2021-12-10T00:00:44.000 0000 K5MDyLy C_001_Lighthouse_Start 0 ToggleInput
2 Tutorial_step 01f0548f-5bc6-460b-a702-cfe0aa4b90b8 2021-12-10T00:00:48.000 0000 K5MDyLy C_001_Lighthouse_Start 2 ToggleInput
3 Tutorial_step 01f0548f-5bc6-460b-a702-cfe0aa4b90b8 2021-12-10T00:00:52.000 0000 eIdJreF T_Map_001_Lighthouse_01 0 ToggleInput
4 Tutorial_step 01f0548f-5bc6-460b-a702-cfe0aa4b90b8 2021-12-10T00:00:52.000 0000 eIdJreF T_Map_001_Lighthouse_01 2 Delay
.. ... ... ... ... ... ... ...
995 Tutorial_step d6949452-3bf2-4599-ac48-7a3df334da98 2021-12-11T07:40:48.000 0000 uC954aC T_Map_003_Diner_01 2 TapBuilding
996 Tutorial_step d6949452-3bf2-4599-ac48-7a3df334da98 2021-12-11T07:40:53.000 0000 uC954aC T_Map_003_Diner_01 4 UnlockBuilding
997 Tutorial_step d6949452-3bf2-4599-ac48-7a3df334da98 2021-12-11T07:40:53.000 0000 uC954aC T_Map_003_Diner_01 3 ToggleInput
998 Tutorial_step d6949452-3bf2-4599-ac48-7a3df334da98 2021-12-11T07:41:08.000 0000 QBK4BoO T_Merge_010_Diner_01 3 ToggleInput
999 Tutorial_step d6949452-3bf2-4599-ac48-7a3df334da98 2021-12-11T07:41:08.000 0000 QBK4BoO T_Merge_010_Diner_01 2 MergeItems
CodePudding user response:
This can be achieved using pivot_table. Consider the code below:
df = pd.read_csv('Sample.csv')
df['idx1'] = df.index
df['param_keys'] = df['param_keys'].apply(lambda x: ast.literal_eval(x))
df['param_values'] = df['param_values'].apply(lambda x: ast.literal_eval(x))
#explodes param_keys and param_values list columns in to separate rows.
df = df.explode(['param_keys','param_values'])
#converting rows to columns
df = df.pivot_table(index=['idx1', 'event_name', 'idfa', 'user_event_time'],
columns=['param_keys'],values='param_values',
aggfunc=lambda x: ''.join(x)).fillna(0)
print(df)
What you get for the first ten rows of your csv file is:
param_keys step step_name tutorial_id tutorial_name
idx1 event_name idfa user_event_time
0 Tutorial_step 01f0548f-5bc6-460b-a702-cfe0aa4b90b8 2021-12-10T00:00:44.000 0000 1 ShowCutscene K5MDyLy C_001_Lighthouse_Start
1 Tutorial_step 01f0548f-5bc6-460b-a702-cfe0aa4b90b8 2021-12-10T00:00:48.000 0000 2 ToggleInput K5MDyLy C_001_Lighthouse_Start
2 Tutorial_step 01f0548f-5bc6-460b-a702-cfe0aa4b90b8 2021-12-10T00:00:52.000 0000 0 ToggleInput eIdJreF T_Map_001_Lighthouse_01
3 Tutorial_step 01f0548f-5bc6-460b-a702-cfe0aa4b90b8 2021-12-10T00:00:52.000 0000 2 Delay eIdJreF T_Map_001_Lighthouse_01
4 Tutorial_step 01f0548f-5bc6-460b-a702-cfe0aa4b90b8 2021-12-10T00:00:52.000 0000 1 SetBuildingBase eIdJreF T_Map_001_Lighthouse_01
5 Tutorial_step 01f0548f-5bc6-460b-a702-cfe0aa4b90b8 2021-12-10T00:00:56.000 0000 2 MergeItems LjzXIwB T_Merge_001_Lighthouse_01
6 Tutorial_step 01f0548f-5bc6-460b-a702-cfe0aa4b90b8 2021-12-10T00:00:56.000 0000 1 Delay LjzXIwB T_Merge_001_Lighthouse_01
7 Tutorial_step 01f0548f-5bc6-460b-a702-cfe0aa4b90b8 2021-12-10T00:00:56.000 0000 1 TapBuilding vQXKqlS T_Map_002_Lighthouse_02b
8 Tutorial_step 01f0548f-5bc6-460b-a702-cfe0aa4b90b8 2021-12-10T00:00:56.000 0000 0 ToggleInput vQXKqlS T_Map_002_Lighthouse_02b
9 Tutorial_step 01f0548f-5bc6-460b-a702-cfe0aa4b90b8 2021-12-10T00:00:56.000 0000 3 ToggleInput eIdJreF T_Map_001_Lighthouse_01
