I have a dataframe
df =
C1 C2
a. 2
d. 8
d. 5
d. 5
b. 3
b. 4
c. 5
a. 6
b. 7
I want to add a new column, type, that is "low" for every row whose C1 value occurs at most 2 times in the column, and that keeps the original value otherwise. The new dataframe should look like this:
df_new =
C1 C2 type
a. 2 low
d. 8 d
d. 5 d
d. 5 d
b. 3 b
b. 4 b
c. 5 low
a. 6 low
b. 7 b
How can I do this?
Thanks
CodePudding user response:
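You can group by 'C1' with pandas.DataFrame.groupby and count the rows in each group. Then pass a lambda to the group's transform method that returns 'low' for groups of at most 2 rows, and the group's original value (without the trailing dot) otherwise. Alternatively, compute each row's group size with transform('size') and apply numpy.where to the result.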
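import numpy as np
# Option 1: the lambda returns one value per group; transform broadcasts it to every row of that group
df['type'] = df.groupby('C1')['C1'].transform(lambda g: 'low' if len(g) <= 2 else g.iloc[0][:-1])
# Option 2: compute each row's group size, then use 'numpy.where'; .str[:-1] drops the trailing '.'
g = df.groupby('C1')['C1'].transform('size')
df['type'] = np.where(g <= 2, 'low', df['C1'].str[:-1])
print(df)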
Output:
C1 C2 type
0 a. 2 low
1 d. 8 d
2 d. 5 d
3 d. 5 d
4 b. 3 b
5 b. 4 b
6 c. 5 low
7 a. 6 low
8 b. 7 b
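For a version that runs on its own, here is a minimal sketch that rebuilds the sample data from the question and uses a slightly different technique: value_counts mapped back onto the column, followed by Series.where, instead of groupby. The variable name counts is only illustrative, and the trailing dot is assumed to be the only character that needs stripping.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "C1": ["a.", "d.", "d.", "d.", "b.", "b.", "c.", "a.", "b."],
    "C2": [2, 8, 5, 5, 3, 4, 5, 6, 7],
})

# For every row, look up how many times its C1 value occurs in the column.
counts = df["C1"].map(df["C1"].value_counts())

# Keep C1 (minus the trailing dot) where the value occurs more than twice;
# everywhere else, substitute 'low'.
df["type"] = df["C1"].str.rstrip(".").where(counts > 2, "low")

print(df)
```

Series.where keeps the values where the condition is True and replaces the rest, so the condition here is the inverse (counts > 2) of the one passed to numpy.where in the answer above.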
