How to select rows whose value appears more than x times in the table?-CodePudding

I have a problem with selecting data. I have this type of DataFrame. I wish to leave only rows with dates which appear more than 5 times in "date' column. I used filea.date.value_counts().loc[lambda s: s > 5] to find which date should stay but I stucked what to do next with this. Could you help me?

CodePudding user response：

You can use a mask, such as:

good_dates = filea.date.value_counts().loc[lambda s: s > 5].index.tolist()
filtered_filea = filea[filea.data.isin(good_dates)]

CodePudding user response：

You can use groupby_transform:

out = df.loc[df.groupby('Date')['Date'].transform('size').loc[lambda x: x > 5].index]
print(out)

# Output

       Date
3  2022-1-2
4  2022-1-2
5  2022-1-2
6  2022-1-2
7  2022-1-2
8  2022-1-2

Setup:

df = pd.DataFrame({'Date': ['2022-1-1']*3   ['2022-1-2']*6   ['2022-1-3']*4})
print(df)

# Output
        Date
0   2022-1-1
1   2022-1-1
2   2022-1-1
3   2022-1-2
4   2022-1-2
5   2022-1-2
6   2022-1-2
7   2022-1-2
8   2022-1-2
9   2022-1-3
10  2022-1-3
11  2022-1-3
12  2022-1-3