Goal: remove the biggest value (slowest time) from each subset (column Model) from Dataframe.
DataFrame:
Model Time
1 bert-base-uncased 6.570979
2 bert-base-uncased 11.570979
3 bert-base-uncased 6.788779
4 albert-large-v1 5.785576
5 albert-large-v1 5.603203
6 albert-large-v1 9.727373
Desired Dataframe:
Model Time
1 bert-base-uncased 6.570979
2 bert-base-uncased 6.788779
3 albert-large-v1 5.785576
4 albert-large-v1 5.603203
Code:
subsets = df['Model']
for s in subsets:
df[s]
Please let me know if there's anything else I can add to post.
CodePudding user response:
You can drop the max rows per group identified with idxmax:
df.drop(df.groupby('Model')['Time'].idxmax().values)
output:
Model Time
1 bert-base-uncased 6.570979
3 bert-base-uncased 6.788779
4 albert-large-v1 5.785576
5 albert-large-v1 5.603203
CodePudding user response:
Solution for remove all maximals values per groups by GroupBy.transform with max and compare for not equal by Series.ne, filter in boolean indexing:
print (df)
Model Time
1 bert-base-uncased 6.570979
2 bert-base-uncased 11.570979
3 bert-base-uncased 6.788779
2 bert-base-uncased 11.570979 <- added another maximum per bert-base-uncased
4 albert-large-v1 5.785576
5 albert-large-v1 5.603203
6 albert-large-v1 9.727373
df1 = df[df['Time'].ne(df.groupby('Model')['Time'].transform('max'))]
print (df1)
Model Time
1 bert-base-uncased 6.570979
3 bert-base-uncased 6.788779
4 albert-large-v1 5.785576
5 albert-large-v1 5.603203
If need remove only first maximum add condition:
df2 = (df[df['Time'].ne(df.groupby('Model')['Time'].transform('max')) |
df['Time'].duplicated()])
print (df2)
Model Time
1 bert-base-uncased 6.570979
3 bert-base-uncased 6.788779
2 bert-base-uncased 11.570979
4 albert-large-v1 5.785576
5 albert-large-v1 5.603203
CodePudding user response:
You can groupby "Model", and use idxmax method to find the index of the max time in each group. Then using Index.isin method, filter out the indices of these values.
max_time_indices = df.groupby('Model')['Time'].idxmax()
out = df[~df.index.isin(max_time_indices)]
Output:
Model Time
1 bert-base-uncased 6.570979
3 bert-base-uncased 6.788779
4 albert-large-v1 5.785576
5 albert-large-v1 5.603203
CodePudding user response:
Use a groupby and a slice:
df = df.groupby('Model').apply(lambda sub_df: sub_df.sort_values('Time').iloc[:-1]).reset_index(drop=True)
