I have a dataframe with the following data:
import pandas as pd
df = pd.DataFrame({'col1': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'col2': ['action1', 'action2', 'action1', 'action3', 'action1', 'action2']})
which looks like
col1, col2
A , action1
A , action2
B , action1
B , action3
C , action1
C , action2
Since A and C both have action1 and action2, they should be grouped together. B has a different set of actions, so it forms a separate group. I want to generate the data frame below:
col1, col2
A, C, action1, action2
B , action1, action3
How can I achieve this?
CodePudding user response:
If the values in col2 appear in the same order within each group, you can join col2 per col1 and then join col1 per the joined col2 strings:
df = df.groupby('col1')['col2'].agg(', '.join).reset_index()
df = df.groupby('col2')['col1'].agg(', '.join).reset_index()[['col1','col2']]
print (df)
col1 col2
0 A, C action1, action2
1 B action1, action3
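To make the two steps easier to follow, here is a minimal self-contained sketch that keeps the intermediate result in its own variable. The names `per_key` and `grouped` are illustrative only and are not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'col2': ['action1', 'action2', 'action1', 'action3', 'action1', 'action2']})

# Step 1: one row per col1 value, with its actions joined into a single string
per_key = df.groupby('col1')['col2'].agg(', '.join).reset_index()

# Step 2: group on the joined string, so col1 values with identical
# action strings end up in the same row
grouped = per_key.groupby('col2')['col1'].agg(', '.join).reset_index()[['col1', 'col2']]

print(grouped)
```

Printing `per_key` before the second step shows one joined string per col1 value, which is what the second groupby uses as its key.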
Or, if the order of the values within a group can differ, aggregate with frozenset so that the order does not matter:
df = df.groupby('col1')['col2'].agg(frozenset).reset_index()
df = (df.groupby('col2')['col1']
        .agg(', '.join)
        .rename(lambda x: ', '.join(x)).reset_index()[['col1','col2']])
print (df)
col1 col2
0 A, C action2, action1
1 B action1, action3
For example, with the order of the rows for C changed:
print (df)
col1 col2
0 A action1
1 A action2
2 B action1
3 B action3
4 C action2 <-changed order
5 C action1 <-changed order
df = df.groupby('col1')['col2'].agg(frozenset).reset_index()
df = (df.groupby('col2')['col1']
        .agg(', '.join)
        .rename(lambda x: ', '.join(x)).reset_index()[['col1','col2']])
print (df)
col1 col2
0 A, C action2, action1
1 B action1, action3
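A frozenset has no defined order, so the joined text in col2 can come out as "action2, action1". If a stable, readable order is wanted, one possible variation (not the approach used above) is to sort the actions inside each group before joining them. This is a minimal sketch; `per_key` and `out` are illustrative names.

```python
import pandas as pd

# Same data as the question, but with the two rows for C swapped
df = pd.DataFrame({'col1': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'col2': ['action1', 'action2', 'action1', 'action3', 'action2', 'action1']})

# Sort the actions inside each col1 group before joining,
# so the original row order no longer affects the key
per_key = (df.groupby('col1')['col2']
             .agg(lambda s: ', '.join(sorted(s)))
             .reset_index())

# Group on the sorted, joined string to merge col1 values with the same actions
out = per_key.groupby('col2')['col1'].agg(', '.join).reset_index()[['col1', 'col2']]

print(out)
```

Unlike frozenset, this keeps repeated actions within a group; if duplicates should be ignored, `sorted(set(s))` could be used instead of `sorted(s)`.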
CodePudding user response:
Use groupby twice: once to join col2 by col1, and then again to join col1 by the joined col2.
srs = df.groupby('col1')['col2'].apply(','.join)
srs = pd.Series(srs.index, index=srs)
out = srs.groupby(level=0).apply(','.join)
out = pd.Series(out.index, index=out).to_frame().reset_index()
Output:
col1 col2
0 A,C action1,action2
1 B action1,action3
