I have a dataframe like this below,
A B C D
0 A1 Egypt 10 Yes
1 A1 Morocco 5 No
2 A2 Algeria 4 Yes
3 A3 Egypt 45 No
4 A3 Egypt 17 Yes
5 A3 Tunisia 4 Yes
6 A3 Algeria 32 No
7 A4 Tunisia 7 No
8 A5 Egypt 6 No
9 A5 Morocco 1 No
I want to get the count of yes and no from the column D wrt column B. The expected output needs to be in the lists like this below which can help to plot the multivariable chart.
Exected output:
yes = [1,2,0,1]
no = [1,2,2,1]
country = ['Algeria', 'Egypt', 'Morocco','Tunisia']
I am not sure how to achieve this from the above dataframe. Any help will be appreciated.
Here is the minimum reproducible dataframe sample:
import pandas as pd
df = pd.DataFrame({'A': {0: 'A1',
1: 'A1',
2: 'A2',
3: 'A3',
4: 'A3',
5: 'A3',
6: 'A3',
7: 'A4',
8: 'A5',
9: 'A5'},
'B': {0: 'Egypt',
1: 'Morocco',
2: 'Algeria',
3: 'Egypt',
4: 'Egypt',
5: 'Tunisia',
6: 'Algeria',
7: 'Tunisia',
8: 'Egypt',
9: 'Morocco'},
'C ': {0: 10, 1: 5, 2: 4, 3: 45, 4: 17, 5: 4, 6: 32, 7: 7, 8: 6, 9: 1},
'D': {0: 'Yes',
1: 'No',
2: 'Yes',
3: 'No',
4: 'Yes',
5: 'Yes',
6: 'No',
7: 'No',
8: 'No',
9: 'No'}}
)
CodePudding user response:
Create new columns by counting "yes", "no"; then groupby "B" and use sum on the newly created columns:
country, yes, no = df.assign(Yes=df['D']=='Yes', No=df['D']=='No').groupby('B')[['Yes','No']].sum().reset_index().T.to_numpy().tolist()
Output:
['Algeria', 'Egypt', 'Morocco', 'Tunisia']
[1, 2, 0, 1]
[1, 2, 2, 1]
CodePudding user response:
Use crosstab:
df1 = pd.crosstab(df.B, df.D)
print (df1)
D No Yes
B
Algeria 1 1
Egypt 2 2
Morocco 2 0
Tunisia 1 1
Then for plot use DataFrame.plot.bar
df1.plot.bar()
If need lists:
yes = df1['Yes'].tolist()
no = df1['No'].tolist()
country = df1.index.tolist()
