I have a python dataframe with columns, 'Expected' vs 'Actual' that shows a product (A,B,C or D) for each record
| ID | Expected | Actual |
|---|---|---|
| 1 | A | B |
| 2 | A | A |
| 3 | C | B |
| 4 | B | D |
| 5 | C | D |
| 6 | A | A |
| 7 | B | B |
| 8 | A | D |
I want to get a count from both columns for each unique value found in both columns (both columns dont share all the same products). So the result should look like this,
| Value | Expected | Actual |
|---|---|---|
| A | 4 | 2 |
| B | 2 | 3 |
| C | 2 | 0 |
| D | 0 | 3 |
Thank you for all your help
CodePudding user response:
I would do it following way
import pandas as pd
df = pd.DataFrame({'Expected':['A','A','C','B','C','A','B','A'],'Actual':['B','A','B','D','D','A','B','D']})
ecnt = df['Expected'].value_counts()
acnt = df['Actual'].value_counts()
known = sorted(set(df['Expected']).union(df['Actual']))
cntdf = pd.DataFrame({'Value':known,'Expected':[ecnt.get(k,0) for k in known],'Actual':[acnt.get(k,0) for k in known]})
print(cntdf)
output
Value Expected Actual
0 A 4 2
1 B 2 3
2 C 2 0
3 D 0 3
Explanation: main idea here is having separate value counts for Expected column and Actual column. If you wish to rather have Value as Index of your pandas.DataFrame you can do
...
cntdf = pd.DataFrame([acnt,ecnt]).T.fillna(0)
print(cntdf)
output
Actual Expected
D 3.0 0.0
B 3.0 2.0
A 2.0 4.0
C 0.0 2.0
CodePudding user response:
You can use apply and value_counts
df = pd.DataFrame({'Expected':['A','A','C','B','C','A','B','A'],'Actual':['B','A','B','D','D','A','B','D']})
df.apply(pd.Series.value_counts).fillna(0)
output:
Expected Actual
A 4.0 2.0
B 2.0 3.0
C 2.0 0.0
D 0.0 3.0
CodePudding user response:
first value_counts then create dataframe with two value_counts like below:
>>> df = pd.DataFrame({'Expected': ['B','B','C','D'],'Actual': ['A','A','C','D']})
>>> Expected = df['Expected'].value_counts()
>>> Actual = df['Actual'].value_counts()
>>> pd.DataFrame({'value':list(set(df['Expected'].to_list() df['Actual'].to_list())), 'Expected': Expected, 'Actual':Actual}).fillna(0).drop('value', axis=1)
Expected Actual
A 0.0 2.0
B 2.0 0.0
C 1.0 1.0
D 1.0 1.0
