Grouping dataframe into list-CodePudding

I have a pandas dataframe like:

**CATEGORY  USER**
A   Alfa
A   Bravo
B   Charlie
B   Delta
C   Delta
D   Echo
C   Foxtrot
E   Foxtrot
D   Golf
F   Hotel
F   India
G   Juliett
H   Kilo

I want to group by the first column and get second column as lists in rows:

CATEGORY    USER    RELATION Lv. 1      *RELATION Lv. 2***
A           Alfa    [Alfa, Bravo]       [Alfa, Bravo]
A           Bravo   [Alfa, Bravo]       [Alfa, Bravo]
B           Charlie [Charlie, Delta]    [Charlie, Delta, Foxtrot]
B           Delta   [Charlie, Delta]    [Charlie, Delta, Foxtrot]
C           Delta   [Delta, Foxtrot]    [Charlie, Delta, Foxtrot]
D           Echo    [Echo, Golf]        [Echo, Golf]
C           Foxtrot [Delta, Foxtrot]    [Charlie, Delta, Foxtrot]
E           Foxtrot [Foxtrot]           [Charlie, Delta, Foxtrot]
D           Golf    [Echo, Golf]        [Echo, Golf]
F           Hotel   [Hotel, India]      [Hotel, India]
F           India   [Hotel, India]      [Hotel, India]
G           Juliett [Juliett]           [Juliett]
H           Kilo    [Kilo]              [Kilo]

Is it possible to do something like this using pandas groupby?

CodePudding user response：

We can agg to the list then reindex assign it back

df['new'] = df.groupby('CATEGORY')['USER'].agg(list).reindex(df.CATEGORY).tolist()
df
   CATEGORY     USER               new
0         A     Alfa     [Alfa, Bravo]
1         A    Bravo     [Alfa, Bravo]
2         B  Charlie  [Charlie, Delta]
3         B    Delta  [Charlie, Delta]
4         C    Delta  [Delta, Foxtrot]
5         D     Echo      [Echo, Golf]
6         C  Foxtrot  [Delta, Foxtrot]
7         E  Foxtrot         [Foxtrot]
8         D     Golf      [Echo, Golf]
9         F    Hotel    [Hotel, India]
10        F    India    [Hotel, India]
11        G  Juliett         [Juliett]
12        H     Kilo            [Kilo]

CodePudding user response：

Here's one approach:

(i) groupby "CATEGORY" and create sets of "USER"s for each category

(ii) Using a double loop, create sets of second order relations using set.intersection and iteratively update dictionary d using dict.setdefault

(iii) Convert the sets to lists and map to "CATEGORY" column to match df rows.

sets = df.groupby('CATEGORY')['USER'].apply(set)

d = {}
for k, st1 in zip(sets.index.tolist(), sets):
    for st2 in sets:
        if st1.intersection(st2):
            d.setdefault(k, set()).update(st2)

df['RELATION Lv. 1'] = [list(x) for x in df['CATEGORY'].map(sets)]
df['RELATION Lv. 2'] = [list(x) for x in df['CATEGORY'].map(d)]

Output:

   CATEGORY     USER    RELATION Lv. 1             RELATION Lv. 2
0         A     Alfa     [Alfa, Bravo]              [Alfa, Bravo]
1         A    Bravo     [Alfa, Bravo]              [Alfa, Bravo]
2         B  Charlie  [Delta, Charlie]  [Foxtrot, Delta, Charlie]
3         B    Delta  [Delta, Charlie]  [Foxtrot, Delta, Charlie]
4         C    Delta  [Foxtrot, Delta]  [Foxtrot, Delta, Charlie]
5         D     Echo      [Echo, Golf]               [Echo, Golf]
6         C  Foxtrot  [Foxtrot, Delta]  [Foxtrot, Delta, Charlie]
7         E  Foxtrot         [Foxtrot]           [Foxtrot, Delta]
8         D     Golf      [Echo, Golf]               [Echo, Golf]
9         F    Hotel    [Hotel, India]             [Hotel, India]
10        F    India    [Hotel, India]             [Hotel, India]
11        G  Juliett         [Juliett]                  [Juliett]
12        H     Kilo            [Kilo]                     [Kilo]