In Pandas I have a dataframe like below
data= [['A','B',3],['A','C',4],['A','D',5],['B','A',4],['B','C',4],['C','D',1]]
df = pd.DataFrame(data,columns =['Col1','Col2','Value'])
df
| Col1 | Col2 | Value |
|---|---|---|
| A | B | 3 |
| A | C | 4 |
| A | D | 5 |
| B | C | 4 |
| C | D | 1 |
| B | A | 4 |
I want to convert it as below
| A:B | A:C | A:D | B:C | C:D |
|---|---|---|---|---|
| 7 | 4 | `5 | 4 | 1 |
Note: first column A:B value is 7 because there exists combination (A:B) = 4 (B:A) = 3.
Please suggest a quick method
CodePudding user response:
Use sorted with join for both columns and aggregate sum, last transpose:
df1 = (df.groupby(df[['Col1','Col2']]
.agg(lambda x: ':'.join(sorted(x)), axis=1))
.sum()
.T
.reset_index(drop=True))
print (df1)
A:B A:C A:D B:C C:D
0 7 4 5 4 1
CodePudding user response:
You can do:
df.groupby([':'.join(sorted(t)) for t in zip(df['Col1'], df['Col2'
])])['Value'].sum().to_frame().T
output:
A:B A:C A:D B:C C:D
Value 7 4 5 4 1
CodePudding user response:
you can use sets to make order irrelevant or sorted to make sure you have consistent ordering
result = df['Value'].\
groupby(df[['Col1','Col2']].\
apply(set,axis=1).apply(':'.join)).sum()
print(result['B:A'])
print(result)
but that gets you a series ...
B:A 7
C:A 4
C:B 4
D:A 5
D:C 1
Name: Value, dtype: int64
if you want it the other way you need to make 2 small changes
result = df[['Value']].\
groupby(df[['Col1','Col2']].\
apply(set,axis=1).apply(':'.join)).sum().T
print(result)
print(result['B:A']['Value'])
note that set does not care about order so it might be A:B or B:A
if you need it actually sorted to always be 'A:B'(you should have specified that as part of your requirements) you will need to use sort instead of set as shown below
result = df['Value'].\
groupby(df[['Col1','Col2']].\
apply(sorted,axis=1).apply(':'.join)).sum()
print(result['A:B']) # always sorted...
print(result)
