I have a dataframe like the one below. My goal is to count the rows for each id, but only where the country is one of usa, canada, or mexico; every other row should get 0.
| id | country | desired_output |
|---|---|---|
| a | usa | 1 |
| b | canada | 2 |
| b | canada | 2 |
| c | china | 0 |
| d | mexico | 3 |
| d | mexico | 3 |
| d | mexico | 3 |
CodePudding user response:
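You can do that with the following. Note that it counts every row in each id group and then zeroes out the rows whose country is not in the list, so it assumes each id maps to a single country: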
df["desired_output"] = df.groupby(["id"])["id"].transform("count")
df.loc[~df['country'].isin(["usa","canada","mexico"]), 'desired_output'] = 0
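To illustrate that assumption, here is a minimal sketch of what happens when an id mixes listed and unlisted countries. The extra row (id "b", country "china") and the column names `count_all` and `count_listed` are made up for this illustration and are not part of the question. The second column uses a different technique, summing a boolean mask per id, rather than the code above.

```python
import pandas as pd

# Example data from the question, plus one hypothetical extra row
# (id "b", country "china") to show the mixed-country case.
df = pd.DataFrame({
    "id": ["a", "b", "b", "c", "d", "d", "d", "b"],
    "country": ["usa", "canada", "canada", "china",
                "mexico", "mexico", "mexico", "china"],
})
allowed = ["usa", "canada", "mexico"]

# Approach from this answer: count all rows per id, then zero out unlisted rows.
df["count_all"] = df.groupby("id")["id"].transform("count")
df.loc[~df["country"].isin(allowed), "count_all"] = 0

# Alternative sketch: sum a 0/1 mask per id, so only listed rows are counted.
mask = df["country"].isin(allowed).astype(int)
df["count_listed"] = mask.groupby(df["id"]).transform("sum")

print(df)
```

With the extra row, the canada rows of id "b" get 3 in `count_all` but 2 in `count_listed`. The unlisted row of id "b" still receives the group total in `count_listed`; if it should be 0 instead, the same `df.loc[...] = 0` step can be applied afterwards.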
CodePudding user response:
Use Series.isin with GroupBy.transform and Series.fillna:
In [21]: df['desired_output'] = df[df.country.isin(["usa","canada","mexico"])].groupby('id')['country'].transform('size')
In [24]: df['desired_output'] = df['desired_output'].fillna(0).astype(int)
In [25]: df
Out[25]:
id country desired_output
0 a usa 1
1 b canada 2
2 b canada 2
3 c china 0
4 d mexico 3
5 d mexico 3
6 d mexico 3
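For anyone who wants to run this outside IPython, the session above can be sketched as a plain script. The names `allowed`, `sizes`, and `n_missing` are introduced here only for illustration. It also shows why the fillna step is needed: the group sizes are computed on the filtered rows only, so the assignment leaves NaN in the rows that were filtered out.

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    "id": ["a", "b", "b", "c", "d", "d", "d"],
    "country": ["usa", "canada", "canada", "china",
                "mexico", "mexico", "mexico"],
})
allowed = ["usa", "canada", "mexico"]

# Group sizes are computed on the filtered rows only.
sizes = df[df["country"].isin(allowed)].groupby("id")["country"].transform("size")

# Assignment aligns on the index, so the filtered-out row (index 3) becomes NaN.
df["desired_output"] = sizes
n_missing = int(df["desired_output"].isna().sum())

# Fill the gaps with 0 and restore an integer dtype.
df["desired_output"] = df["desired_output"].fillna(0).astype(int)

print(df)
```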
CodePudding user response:
Replace id (or any other column) with NaN in the rows that do not match, using Series.where with Series.isin, and then count the non-NaN rows per group using GroupBy.transform with GroupBy.count:
L = ["usa","canada","mexico"]
df['desired_output'] = df.id.where(df.country.isin(L)).groupby(df['id']).transform('count')
print(df)
id country desired_output
0 a usa 1
1 b canada 2
2 b canada 2
3 c china 0
4 d mexico 3
5 d mexico 3
6 d mexico 3
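As a rough, self-contained sketch of the same idea with the intermediate step made visible (the name `masked` is introduced here only for illustration):

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    "id": ["a", "b", "b", "c", "d", "d", "d"],
    "country": ["usa", "canada", "canada", "china",
                "mexico", "mexico", "mexico"],
})
L = ["usa", "canada", "mexico"]

# Keep id where the country is listed; all other rows become NaN.
masked = df["id"].where(df["country"].isin(L))

# count ignores NaN, so an id with no listed rows gets 0 directly
# and no fillna step is needed.
df["desired_output"] = masked.groupby(df["id"]).transform("count")

print(df)
```

Because the grouping key is the original `df["id"]` rather than the masked column, every row keeps its group, and the result already has an integer dtype.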
