Create a new column based on a conditional count of another column, grouped by id

I have a dataframe like the one below. My goal is to count the rows for each id, but only where country is one of usa, canada, or mexico. Rows with any other country should get 0, as in the desired_output column.

id	country	desired_output
a	usa	1
b	canada	2
b	canada	2
c	china	0
d	mexico	3
d	mexico	3
d	mexico	3

CodePudding user response：

You can do that by counting the rows per id and then setting the rows whose country is not in the list to 0:

df["desired_output"] = df.groupby(["id"])["id"].transform("count")
df.loc[~df['country'].isin(["usa","canada","mexico"]), 'desired_output'] = 0
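
For anyone who wants to run the two lines above, here is a minimal, self-contained sketch. The frame is rebuilt from the question's table. The extra id "e", which mixes a listed and an unlisted country, is a hypothetical addition that is not in the original post; it is only there to show what happens when an id is not tied to a single country. The name `allowed` is also just illustrative.

```python
import pandas as pd

# Sample data from the question, plus a hypothetical id "e" that mixes
# a listed country with an unlisted one (not part of the original post).
df = pd.DataFrame({
    "id": ["a", "b", "b", "c", "d", "d", "d", "e", "e"],
    "country": ["usa", "canada", "canada", "china",
                "mexico", "mexico", "mexico", "usa", "china"],
})
allowed = ["usa", "canada", "mexico"]

# Step 1: every row gets the total number of rows in its id group.
df["desired_output"] = df.groupby("id")["id"].transform("count")
# Step 2: rows whose country is not in the list are reset to 0.
df.loc[~df["country"].isin(allowed), "desired_output"] = 0

print(df)
```

On the question's rows this matches the desired output. For the hypothetical id "e", the usa row gets 2 (the size of the whole group, china row included) and the china row gets 0, so this approach is probably only what you want when each id maps to a single country.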

CodePudding user response：

Filter the rows with Series.isin, count per group with GroupBy.transform, then fill the unmatched rows with Series.fillna:

In [21]: df['desired_output'] = df[df.country.isin(["usa","canada","mexico"])].groupby('id')['country'].transform('size')

In [24]: df['desired_output'] = df['desired_output'].fillna(0).astype(int)

In [25]: df
Out[25]:
  id country  desired_output
0  a     usa               1
1  b  canada               2
2  b  canada               2
3  c   china               0
4  d  mexico               3
5  d  mexico               3
6  d  mexico               3
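
The reason the fillna step is needed may not be obvious, so here is a rough sketch of what happens in between. It assumes the default RangeIndex from the question's frame; the names `allowed`, `counts`, and `missing_before_fill` are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "id": ["a", "b", "b", "c", "d", "d", "d"],
    "country": ["usa", "canada", "canada", "china",
                "mexico", "mexico", "mexico"],
})
allowed = ["usa", "canada", "mexico"]

# The filtered frame drops row 3 (china), so the transformed Series
# has no entry for index label 3.
counts = (
    df[df["country"].isin(allowed)]
    .groupby("id")["country"]
    .transform("size")
)

# Column assignment aligns on the index; the missing label becomes NaN,
# which also makes the column float.
df["desired_output"] = counts
missing_before_fill = df["desired_output"].isna().tolist()

# fillna(0) supplies the zero for unmatched rows; astype(int) restores integers.
df["desired_output"] = df["desired_output"].fillna(0).astype(int)

print(df)
```

Because the alignment is by index label, this relies on the index of df being unique. With duplicate labels the assignment would likely fail or misalign, so a reset_index(drop=True) beforehand may be needed in that case.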

CodePudding user response：

Use Series.where with Series.isin to replace id (or any other column) with NaN in the rows whose country is not in the list, then count the non-NaN values per group with GroupBy.transform('count'):

L = ["usa","canada","mexico"]
df['desired_output'] = df.id.where(df.country.isin(L)).groupby(df['id']).transform('count')
print(df)
  id country  desired_output
0  a     usa               1
1  b  canada               2
2  b  canada               2
3  c   china               0
4  d  mexico               3
5  d  mexico               3
6  d  mexico               3
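
A closely related variant, not taken from any of the answers above, is to sum the boolean mask per id instead of counting non-NaN values. This is only a sketch under the same assumptions as before (frame rebuilt from the question, illustrative names `allowed` and `mask`). The mask is cast to int before grouping so the result does not depend on how a given pandas version treats boolean sums inside transform.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "id": ["a", "b", "b", "c", "d", "d", "d"],
    "country": ["usa", "canada", "canada", "china",
                "mexico", "mexico", "mexico"],
})
allowed = ["usa", "canada", "mexico"]

# 1 where the country is in the list, 0 otherwise.
mask = df["country"].isin(allowed).astype(int)

# Summing the 0/1 flags per id counts the matching rows; transform
# broadcasts each group's total back onto every row of that group.
df["desired_output"] = mask.groupby(df["id"]).transform("sum")

print(df)
```

Like the Series.where version, this gives every row of an id the number of matching rows in that id, so for an id that mixes listed and unlisted countries the unlisted rows would also carry the count rather than 0.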