I have a problem with selecting data. I have this type of DataFrame. I wish to leave only rows with dates which appear more than 5 times in "date' column. I used filea.date.value_counts().loc[lambda s: s > 5]
to find which date should stay but I stucked what to do next with this. Could you help me?
CodePudding user response:
You can use a mask, such as:
good_dates = filea.date.value_counts().loc[lambda s: s > 5].index.tolist()
filtered_filea = filea[filea.data.isin(good_dates)]
CodePudding user response:
You can use groupby_transform:
out = df.loc[df.groupby('Date')['Date'].transform('size').loc[lambda x: x > 5].index]
print(out)
# Output
Date
3 2022-1-2
4 2022-1-2
5 2022-1-2
6 2022-1-2
7 2022-1-2
8 2022-1-2
Setup:
df = pd.DataFrame({'Date': ['2022-1-1']*3 ['2022-1-2']*6 ['2022-1-3']*4})
print(df)
# Output
Date
0 2022-1-1
1 2022-1-1
2 2022-1-1
3 2022-1-2
4 2022-1-2
5 2022-1-2
6 2022-1-2
7 2022-1-2
8 2022-1-2
9 2022-1-3
10 2022-1-3
11 2022-1-3
12 2022-1-3
