Learning Experiments
In a series of learning experiments, I would like to count, for each experiment, the number of participants who improved their rank in subsequent experiments (Rank 1 is the highest). I would also like to count, for each experiment, the number of participants who subsequently reached the top rank.
Here is a short, sanitized version of the learning experiment CSV file that I have loaded into a pandas DataFrame (df_learning).
| Experiment | Subject | Rank |
|---|---|---|
| A | Alpha | 1 |
| A | Bravo | 2 |
| A | Charlie | 3 |
| A | Delta | 4 |
| A | Echo | 5 |
| B | Alpha | 1 |
| B | Charlie | 2 |
| B | Echo | 3 |
| B | Foxtrot | 4 |
| B | Golf | 5 |
| B | India | 6 |
| B | Juliet | 7 |
| C | Juliet | 1 |
| C | Bravo | 2 |
| C | Charlie | 3 |
How can I compute these counts with pandas?
CodePudding user response:
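You can use groupby.cummax followed by boolean indexing (here df is the df_learning DataFrame from the question, with rows in experiment order). The mask m is True on rows where a subject's rank is better (numerically lower) than the worst rank that subject has had so far: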
m = df['Rank'].sub(df.groupby('Subject')['Rank'].cummax()).lt(0)
improved_rank = df.loc[m, 'Subject'].unique()
output: ['Charlie', 'Echo', 'Juliet']
reached_top_rank = df.loc[m & df['Rank'].eq(1), 'Subject'].unique()
output: ['Juliet']
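Since the question asks for counts per experiment rather than a list of subjects, here is a minimal, self-contained sketch of how the same idea could be extended. It rebuilds the sample table and assumes the rows are already in experiment order. The helper `count_by_experiment` and the variable names are made up for illustration, and the script shows two possible readings of "per experiment": counting improvements in the experiment where they were observed (the cummax mask above), and counting the participants of each experiment who went on to do better later (a reversed cummin, which is a different technique from the one above).

```python
import pandas as pd

# Rebuild the sample table from the question (rows assumed to be in experiment order).
df_learning = pd.DataFrame({
    "Experiment": list("AAAAA") + list("BBBBBBB") + list("CCC"),
    "Subject": ["Alpha", "Bravo", "Charlie", "Delta", "Echo",
                "Alpha", "Charlie", "Echo", "Foxtrot", "Golf", "India", "Juliet",
                "Juliet", "Bravo", "Charlie"],
    "Rank": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3],
})
experiments = df_learning["Experiment"].unique()


def count_by_experiment(mask):
    """Hypothetical helper: distinct subjects per experiment among rows where mask is True."""
    return (
        df_learning.loc[mask]
        .groupby("Experiment")["Subject"].nunique()
        .reindex(experiments, fill_value=0)  # experiments with no match get 0
    )


# Reading 1: count an improvement in the experiment where it was observed.
# worst_so_far is the numerically highest rank each subject has had up to this row.
worst_so_far = df_learning.groupby("Subject")["Rank"].cummax()
improved = df_learning["Rank"].lt(worst_so_far)
improved_per_experiment = count_by_experiment(improved)
top_per_experiment = count_by_experiment(improved & df_learning["Rank"].eq(1))

# Reading 2: count the participants of each experiment who did better later on.
# Walk the rows backwards so cummin gives the best rank from this row onward,
# then shift within each subject to exclude the current row itself.
rev = df_learning.iloc[::-1]
best_from_here = rev.groupby("Subject")["Rank"].cummin()
future_best = (
    best_from_here.groupby(rev["Subject"]).shift().reindex(df_learning.index)
)
improved_later = df_learning["Rank"].gt(future_best)  # NaN (no later row) compares as False
improved_later_per_experiment = count_by_experiment(improved_later)
top_later_per_experiment = count_by_experiment(improved_later & future_best.eq(1))

print(improved_per_experiment.to_dict())        # {'A': 0, 'B': 2, 'C': 1}
print(top_per_experiment.to_dict())             # {'A': 0, 'B': 0, 'C': 1}
print(improved_later_per_experiment.to_dict())  # {'A': 2, 'B': 1, 'C': 0}
print(top_later_per_experiment.to_dict())       # {'A': 0, 'B': 1, 'C': 0}
```

Note that the cummax mask compares each rank with the subject's worst rank so far, not with the immediately preceding experiment, so a subject going 5, 3, 4 would still be flagged in the third experiment. If "improved" should mean "better than the previous appearance", a groupby followed by diff would be the comparison to use instead.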
