Home > OS >  How to parse data to columns from 2 columns with lists
How to parse data to columns from 2 columns with lists

Time:01-31

I've got the data that looks like that enter image description here

The first column - event name, second - user id, third - columns names, fourth - columns values. For different event names, I've got a different number of columns to parse.

I need the data to look like that:

event name user id column 1(dialog_id) ... column n (tutorial_id) ...
Dialogs 01f0548f-5bc6-460b-a702-cfe0aa4b90b8 EcORCTz ... None ...
Tutorial_step 01f0548f-5bc6-460b-a702-cfe0aa4b90b8 None ... LjzXlwB ...

Some of the columns could repeat in the different event_names.

Earlier I used something like this:

    df['param_keys'] = df['param_keys'].apply(lambda x: ast.literal_eval(x))
    df['param_values'] = df['param_values'].apply(lambda x: ast.literal_eval(x))
    return(df)

And it worked perfectly, but now it returns just the same dataframe.

I've attached the example of data to try.

Thanks a lot!

CodePudding user response:

Just create a new dataframe from df['param_values'].tolist() and use pd.concat to join it and the original dataframe together:

# Convert the lists to *actual* lists, instead of just strings
df[['param_keys', 'param_values']] = df[['param_keys', 'param_values']].apply(lambda col: col.apply(ast.literal_eval))

# Process
df = pd.concat([df.drop(['param_values', 'param_keys'], axis=1), pd.DataFrame(df['param_values'].tolist(), columns=df.loc[0, 'param_keys'])], axis=1)

Output:

>>> df
        event_name                                  idfa               user_event_time tutorial_id            tutorial_name step       step_name
0    Tutorial_step  01f0548f-5bc6-460b-a702-cfe0aa4b90b8  2021-12-10T00:00:44.000 0000     K5MDyLy   C_001_Lighthouse_Start    1    ShowCutscene
1    Tutorial_step  01f0548f-5bc6-460b-a702-cfe0aa4b90b8  2021-12-10T00:00:44.000 0000     K5MDyLy   C_001_Lighthouse_Start    0     ToggleInput
2    Tutorial_step  01f0548f-5bc6-460b-a702-cfe0aa4b90b8  2021-12-10T00:00:48.000 0000     K5MDyLy   C_001_Lighthouse_Start    2     ToggleInput
3    Tutorial_step  01f0548f-5bc6-460b-a702-cfe0aa4b90b8  2021-12-10T00:00:52.000 0000     eIdJreF  T_Map_001_Lighthouse_01    0     ToggleInput
4    Tutorial_step  01f0548f-5bc6-460b-a702-cfe0aa4b90b8  2021-12-10T00:00:52.000 0000     eIdJreF  T_Map_001_Lighthouse_01    2           Delay
..             ...                                   ...                           ...         ...                      ...  ...             ...
995  Tutorial_step  d6949452-3bf2-4599-ac48-7a3df334da98  2021-12-11T07:40:48.000 0000     uC954aC       T_Map_003_Diner_01    2     TapBuilding
996  Tutorial_step  d6949452-3bf2-4599-ac48-7a3df334da98  2021-12-11T07:40:53.000 0000     uC954aC       T_Map_003_Diner_01    4  UnlockBuilding
997  Tutorial_step  d6949452-3bf2-4599-ac48-7a3df334da98  2021-12-11T07:40:53.000 0000     uC954aC       T_Map_003_Diner_01    3     ToggleInput
998  Tutorial_step  d6949452-3bf2-4599-ac48-7a3df334da98  2021-12-11T07:41:08.000 0000     QBK4BoO     T_Merge_010_Diner_01    3     ToggleInput
999  Tutorial_step  d6949452-3bf2-4599-ac48-7a3df334da98  2021-12-11T07:41:08.000 0000     QBK4BoO     T_Merge_010_Diner_01    2      MergeItems

CodePudding user response:

This can be achieved using pivot_table. Consider the code below:

df = pd.read_csv('Sample.csv')
df['idx1'] = df.index
df['param_keys'] = df['param_keys'].apply(lambda x: ast.literal_eval(x))
df['param_values'] = df['param_values'].apply(lambda x: ast.literal_eval(x))

#explodes param_keys and param_values list columns in to separate rows.
df = df.explode(['param_keys','param_values'])

#converting rows to columns
df = df.pivot_table(index=['idx1', 'event_name', 'idfa', 'user_event_time'],
   columns=['param_keys'],values='param_values',
   aggfunc=lambda x: ''.join(x)).fillna(0)
print(df)

What you get for the first ten rows of your csv file is:

param_keys                                                                           step        step_name tutorial_id              tutorial_name
idx1 event_name    idfa                                 user_event_time                                                                          
0    Tutorial_step 01f0548f-5bc6-460b-a702-cfe0aa4b90b8 2021-12-10T00:00:44.000 0000    1     ShowCutscene     K5MDyLy     C_001_Lighthouse_Start
1    Tutorial_step 01f0548f-5bc6-460b-a702-cfe0aa4b90b8 2021-12-10T00:00:48.000 0000    2      ToggleInput     K5MDyLy     C_001_Lighthouse_Start
2    Tutorial_step 01f0548f-5bc6-460b-a702-cfe0aa4b90b8 2021-12-10T00:00:52.000 0000    0      ToggleInput     eIdJreF    T_Map_001_Lighthouse_01
3    Tutorial_step 01f0548f-5bc6-460b-a702-cfe0aa4b90b8 2021-12-10T00:00:52.000 0000    2            Delay     eIdJreF    T_Map_001_Lighthouse_01
4    Tutorial_step 01f0548f-5bc6-460b-a702-cfe0aa4b90b8 2021-12-10T00:00:52.000 0000    1  SetBuildingBase     eIdJreF    T_Map_001_Lighthouse_01
5    Tutorial_step 01f0548f-5bc6-460b-a702-cfe0aa4b90b8 2021-12-10T00:00:56.000 0000    2       MergeItems     LjzXIwB  T_Merge_001_Lighthouse_01
6    Tutorial_step 01f0548f-5bc6-460b-a702-cfe0aa4b90b8 2021-12-10T00:00:56.000 0000    1            Delay     LjzXIwB  T_Merge_001_Lighthouse_01
7    Tutorial_step 01f0548f-5bc6-460b-a702-cfe0aa4b90b8 2021-12-10T00:00:56.000 0000    1      TapBuilding     vQXKqlS   T_Map_002_Lighthouse_02b
8    Tutorial_step 01f0548f-5bc6-460b-a702-cfe0aa4b90b8 2021-12-10T00:00:56.000 0000    0      ToggleInput     vQXKqlS   T_Map_002_Lighthouse_02b
9    Tutorial_step 01f0548f-5bc6-460b-a702-cfe0aa4b90b8 2021-12-10T00:00:56.000 0000    3      ToggleInput     eIdJreF    T_Map_001_Lighthouse_01

  •  Tags:  
  • Related