I have a dataframe like this:
VAL1 VAL2
A A
B B
E E
F F
G G
H H
I I
J J
A B
A C
B A
B C
C A
C B
D E
E D
F E
E F
G H
H G
I J
J I
I H
H I
K K
I would like to cluster the VAL1 and VAL2 values into groups.
For instance:
A is in the same row as B and C, so I group A, B and C within the same group.
D is in the same row as E, and E is in the same row as F, so I group D, E and F within the same group.
G is in the same row as H, H is in the same row as I, and I is in the same row as J, so I group G, H, I and J within the same group.
K has no shared row, so I group it alone.
I should then get:
Groups VALs
G1 A
G1 B
G1 C
G2 D
G2 E
G2 F
G3 G
G3 H
G3 I
G3 J
G4 K
Here is the dataframe as a dict, in case it helps:
{'VAL1': {0: 'A', 1: 'B', 2: 'E', 3: 'F', 4: 'G', 5: 'H', 6: 'I', 7: 'J', 8: 'A', 9: 'A', 10: 'B', 11: 'B', 12: 'C', 13: 'C', 14: 'D', 15: 'E', 16: 'F', 17: 'E', 18: 'G', 19: 'H', 20: 'I', 21: 'J', 22: 'I', 23: 'H', 24: 'K'}, 'VAL2': {0: 'A', 1: 'B', 2: 'E', 3: 'F', 4: 'G', 5: 'H', 6: 'I', 7: 'J', 8: 'B', 9: 'C', 10: 'A', 11: 'C', 12: 'A', 13: 'B', 14: 'E', 15: 'D', 16: 'E', 17: 'F', 18: 'H', 19: 'G', 20: 'J', 21: 'I', 22: 'H', 23: 'I', 24: 'K'}}
CodePudding user response:
Treat each row as an edge between VAL1 and VAL2, get the groups with connected_components, flatten them into a list L and then convert it to a DataFrame:
import networkx as nx
import pandas as pd

# Create the graph from the dataframe: each row is an edge VAL1 - VAL2
g = nx.Graph()
g.add_edges_from(df[['VAL1','VAL2']].itertuples(index=False))
# Each connected component is one group
new = list(nx.connected_components(g))
# Label the components G1, G2, ... and pair every node with its label
L = [(f'G{cid + 1}', node) for cid, component in enumerate(new) for node in component]
df = pd.DataFrame(L, columns=['Groups','VALs'])
print (df)
Groups VALs
0 G1 A
1 G1 B
2 G1 C
3 G2 D
4 G2 F
5 G2 E
6 G3 G
7 G3 I
8 G3 J
9 G3 H
10 G4 K
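If networkx is not available, the same grouping can be sketched with a small union-find (disjoint-set) structure in plain Python. This is only an illustration of the idea and not the method used in the answer above: the helper name group_values is made up for this example, the pairs list is copied by hand from the rows in the question, and stripping whitespace and sorting the members are assumptions added so that the labels come out in a reproducible order.

```python
# Row pairs (VAL1, VAL2) copied by hand from the question's dataframe.
pairs = [
    ("A", "A"), ("B", "B"), ("E", "E"), ("F", "F"), ("G", "G"),
    ("H", "H"), ("I", "I"), ("J", "J"), ("A", "B"), ("A", "C"),
    ("B", "A"), ("B", "C"), ("C", "A"), ("C", "B"), ("D", "E"),
    ("E", "D"), ("F", "E"), ("E", "F"), ("G", "H"), ("H", "G"),
    ("I", "J"), ("J", "I"), ("I", "H"), ("H", "I"), ("K", "K"),
]


def group_values(pairs):
    """Hypothetical helper: label every value with its connected group."""
    parent = {}  # maps each value to its parent in the union-find forest

    def find(x):
        # Register unseen values as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        # Assumption: strip stray whitespace so 'H ' and 'H' count as one value.
        root_a, root_b = find(a.strip()), find(b.strip())
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    # Collect the members of each group under its root.
    components = {}
    for value in list(parent):
        components.setdefault(find(value), []).append(value)

    # Sort members and groups so that G1, G2, ... are assigned reproducibly.
    ordered = sorted(sorted(members) for members in components.values())
    return [
        (f"G{number}", value)
        for number, members in enumerate(ordered, start=1)
        for value in members
    ]


rows = group_values(pairs)
for group, value in rows:
    print(group, value)
```

With a dataframe at hand, the pairs could instead come from df[['VAL1','VAL2']].itertuples(index=False), and the resulting rows can be passed to pd.DataFrame(rows, columns=['Groups','VALs']) to get the table asked for in the question.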
