Complex Concatenation on a pandas dataframe

I have a dataframe like this:

user_id | visit_order | campaign
1       | 1           | C1
1       | 2           | C2
1       | 3           | C3
2       | 1           | C2
2       | 2           | C1

I want to create a column that, for each user_id, accumulates the campaigns seen so far in visit_order, giving an output like this:

user_id | visit_order | campaign | campaign_order
1       | 1           | C1       | [C1]
1       | 2           | C2       | [C1, C2]
1       | 3           | C3       | [C1, C2, C3]
2       | 1           | C2       | [C2]
2       | 2           | C1       | [C2, C1]

Is it possible to do this? Any help is much appreciated :)

CodePudding user response:

You need to use a custom function to simulate an expanding window.

NB. the data must be sorted by "visit_order" first.
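
A minimal sketch of that sorting step, assuming the rows arrive out of order. The shuffled sample data below is hypothetical and only reuses the values from the question; sorting by "user_id" as well as "visit_order" is a choice made here so that each user's rows end up contiguous.

```python
import pandas as pd

# Hypothetical, deliberately shuffled version of the question's data.
df = pd.DataFrame({
    "user_id": [1, 2, 1, 2, 1],
    "visit_order": [3, 1, 1, 2, 2],
    "campaign": ["C3", "C2", "C1", "C1", "C2"],
})

# Sort by user, then by visit order, so each user's rows are in visit sequence.
# reset_index(drop=True) just renumbers the rows after sorting.
df = df.sort_values(["user_id", "visit_order"]).reset_index(drop=True)
```

After this, iterating over a user's "campaign" values follows the visit sequence, which is what the cumulative function relies on.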

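import pandas as pd

def expand(ser):
    # Build the running list of values seen so far within one group.
    seen = []
    out = []
    for e in ser:
        seen.append(e)
        out.append(seen.copy())  # copy so each row keeps its own snapshot
    return pd.Series(out, index=ser.index)

# Apply per user; group_keys=False keeps the original row index for alignment.
df['campaign_order'] = df.groupby('user_id', group_keys=False)['campaign'].apply(expand)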

output:

   user_id  visit_order campaign campaign_order
0        1            1       C1           [C1]
1        1            2       C2       [C1, C2]
2        1            3       C3   [C1, C2, C3]
3        2            1       C2           [C2]
4        2            2       C1       [C2, C1]
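
For comparison, here is a sketch of the same idea written with itertools.accumulate instead of an explicit loop. This is a swapped-in variant, not the code from the answer above: the helper name running_lists is made up for illustration, and the data is assumed to be already sorted as in the question.

```python
from itertools import accumulate

import pandas as pd

# Sample data from the question, already sorted by visit_order within each user.
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "visit_order": [1, 2, 3, 1, 2],
    "campaign": ["C1", "C2", "C3", "C2", "C1"],
})

def running_lists(values):
    # Hypothetical helper: yields [v0], [v0, v1], ... as separate list objects.
    # acc + item builds a new list each step, so no explicit copy is needed.
    return list(accumulate(([v] for v in values), lambda acc, item: acc + item))

# Build one Series per user, keeping the original row index for alignment.
pieces = [
    pd.Series(running_lists(group), index=group.index)
    for _, group in df.groupby("user_id")["campaign"]
]
df["campaign_order"] = pd.concat(pieces)
```

Both versions create a new list for every row, so memory use grows with the square of the longest per-user history; for long histories a joined string or a tuple may be a lighter choice.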

CodePudding user response:

Actually, this worked perfectly:

df_test.groupby(['user_id','campaign','visit_order'])['campaign'].agg([('a', ';'.join), ('b', lambda x: x.tolist())])