Given a dataframe with one column of players and other column with a subset of teammates, form the e

Time:01-22

Suppose I have a dataframe like this:

    player  teammates
0   A       [C,F]
1   C       [A,F]
2   B       [B]
3   D       [H,J,K]
4   H       [J,K]
5   Q       [D]

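Rows 3, 4 and 5 are the challenging ones: each lists only part of its team, so the full team has to be pieced together across rows (D lists H, J and K, while Q lists only D). If the teammates column contained the entire team for each player, the problem would be trivial.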

The expected output is a list of all teams, for example:

[[A,C,F], [B], [D,H,J,K,Q]]

The first step could be to just consolidate both columns into one via

df.apply(lambda row: list(set([row['player']] + row['teammates'])), axis=1), like so

0  [A,C,F]
1  [A,C,F]
2  [B]
3  [D,H,J,K]
4  [H,J,K]
5  [Q,D]

but checking pairwise for common elements and further consolidating seems very inefficient. Is there an efficient way to get the desired output?
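
For comparison, the pairwise consolidation described above might look like the sketch below. It works on plain lists rather than the dataframe, and `merge_overlapping` is a hypothetical helper name, not a pandas function. Every new row is compared against every team found so far, which is where the quadratic cost comes from.

```python
def merge_overlapping(groups):
    """Merge groups that share at least one member (naive pairwise check)."""
    merged = []  # list of disjoint sets built so far
    for group in map(set, groups):
        # Teams found so far that share a member with the current row.
        overlapping = [team for team in merged if team & group]
        # Keep the untouched teams, fold the overlapping ones into this row.
        merged = [team for team in merged if not (team & group)]
        for team in overlapping:
            group |= team
        merged.append(group)
    return merged


# The consolidated rows from the question, written as plain lists.
groups = [
    ["A", "C", "F"],
    ["A", "C", "F"],
    ["B"],
    ["D", "H", "J", "K"],
    ["H", "J", "K"],
    ["Q", "D"],
]

teams = sorted(sorted(team) for team in merge_overlapping(groups))
print(teams)
```

With the dataframe from the question, `groups` would be the result of the `df.apply(...)` call above converted with `.tolist()`.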

CodePudding user response:

Flatten the teammates column with DataFrame.explode, add each (player, teammate) pair as an edge of a graph, and take the connected components with networkx:

import networkx as nx

# Create the graph from the dataframe
g = nx.Graph()

g.add_edges_from(df[['player','teammates']].explode('teammates').itertuples(index=False))

new = list(nx.connected_components(g))
print(new)
# [{'F', 'A', 'C'}, {'B'}, {'Q', 'K', 'H', 'J', 'D'}]  (element order within each set may vary)

If you need lists:

L = [list(x) for x in new]
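
If adding networkx as a dependency is not an option, the same grouping can be sketched with a small union-find (disjoint-set) structure in plain Python. This is an alternative technique, not what the answer above uses; `find` and `teams_from_rows` are hypothetical names, and `rows` stands in for `zip(df['player'], df['teammates'])`.

```python
def find(parent, x):
    """Return the root of x, shortening the path on the way up."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x


def teams_from_rows(rows):
    """Group players into teams from (player, teammates) pairs."""
    parent = {}
    for player, teammates in rows:
        parent.setdefault(player, player)
        for mate in teammates:
            parent.setdefault(mate, mate)
            root_a, root_b = find(parent, player), find(parent, mate)
            if root_a != root_b:
                parent[root_b] = root_a  # union the two groups
    # Collect every name under the root of its group.
    teams = {}
    for name in parent:
        teams.setdefault(find(parent, name), []).append(name)
    return sorted(sorted(team) for team in teams.values())


# Same data as the dataframe in the question.
rows = [
    ("A", ["C", "F"]),
    ("C", ["A", "F"]),
    ("B", ["B"]),
    ("D", ["H", "J", "K"]),
    ("H", ["J", "K"]),
    ("Q", ["D"]),
]

teams = teams_from_rows(rows)
print(teams)
```

Each row is visited once and each union is close to constant time, so this avoids the pairwise comparison from the question. The teams are sorted only to make the output deterministic.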