Pandas: Count Higher Ranks For Current Experiment Participants In Later Experiments

Learning Experiments

In a series of learning experiments, I would like to count, for each experiment, the number of participants who improved their rank in a subsequent experiment (Rank 1 is the highest). I would also like to count, for each experiment, the number of participants who subsequently reached the top rank.

Here is a short, sanitized version of the learning experiment CSV file that I have loaded into a pandas DataFrame (df_learning).

Experiment	Subject	Rank
A	Alpha	1
A	Bravo	2
A	Charlie	3
A	Delta	4
A	Echo	5
B	Alpha	1
B	Charlie	2
B	Echo	3
B	Foxtrot	4
B	Golf	5
B	India	6
B	Juliet	7
C	Juliet	1
C	Bravo	2
C	Charlie	3

How can I compute these counts?

CodePudding user response:

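You can use groupby.cummax, then boolean indexing. Here df is the question's df_learning, and the rows must be in experiment order, because cummax tracks each subject's worst (largest) rank seen so far: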

m = df['Rank'].sub(df.groupby('Subject')['Rank'].cummax()).lt(0)

improved_rank = df.loc[m, 'Subject'].unique()

output: ['Charlie', 'Echo', 'Juliet']
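To see why the mask m flags these rows, it may help to print the running maximum next to each rank. The following is a minimal sketch that rebuilds the sample table by hand; the column names WorstSoFar and Improved are illustrative and not part of the original data, and the rows are assumed to be in chronological order.

```python
import pandas as pd

# Sample data from the question, typed in by hand (rows in experiment order).
df = pd.DataFrame({
    "Experiment": list("AAAAA") + list("BBBBBBB") + list("CCC"),
    "Subject": ["Alpha", "Bravo", "Charlie", "Delta", "Echo",
                "Alpha", "Charlie", "Echo", "Foxtrot", "Golf", "India", "Juliet",
                "Juliet", "Bravo", "Charlie"],
    "Rank": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3],
})

# Worst (numerically largest) rank each subject has had up to and including this row.
worst_so_far = df.groupby("Subject")["Rank"].cummax()

# A row is flagged when its rank is strictly better (smaller) than that running maximum.
m = df["Rank"].sub(worst_so_far).lt(0)

# Hypothetical helper columns, added only for inspection.
print(df.assign(WorstSoFar=worst_so_far, Improved=m))
```

Note that the comparison is against the worst rank seen so far, not the first or the previous rank. A subject ranked 3, then 5, then 4 would be flagged at the third experiment even though they never beat their initial rank of 3; whether that counts as an improvement depends on the definition you need.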

reached_top_rank = df.loc[m & df['Rank'].eq(1), 'Subject'].unique()

output: ['Juliet']
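The lists above are per subject across all experiments, whereas the question asks for counts per experiment. One possible way to get those is a self-merge on Subject, sketched here under several assumptions: experiment labels sort chronologically (A before B before C), "improved" means the subject's best rank in any later experiment is strictly better than their rank in the current one, and "reached the top rank" means reaching Rank 1 later without already holding Rank 1 in the current experiment (mirroring the m & Rank == 1 logic above). The names pairs, best_later, summary, BestLater, Improved and ReachedTop are illustrative.

```python
import pandas as pd

# Sample data from the question, typed in by hand.
df_learning = pd.DataFrame({
    "Experiment": list("AAAAA") + list("BBBBBBB") + list("CCC"),
    "Subject": ["Alpha", "Bravo", "Charlie", "Delta", "Echo",
                "Alpha", "Charlie", "Echo", "Foxtrot", "Golf", "India", "Juliet",
                "Juliet", "Bravo", "Charlie"],
    "Rank": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3],
})

# Pair each appearance of a subject with every appearance of the same subject,
# then keep only pairs where the second appearance is in a later experiment.
# Assumption: experiment labels compare in chronological order.
pairs = df_learning.merge(df_learning, on="Subject", suffixes=("", "_later"))
pairs = pairs[pairs["Experiment_later"] > pairs["Experiment"]]

# Best (smallest) rank each subject achieves after a given experiment.
best_later = (
    pairs.groupby(["Experiment", "Subject"])["Rank_later"]
    .min()
    .rename("BestLater")
    .reset_index()
)

# Left merge keeps subjects with no later appearance (BestLater is NaN for them,
# and NaN comparisons evaluate to False).
summary = df_learning.merge(best_later, on=["Experiment", "Subject"], how="left")
summary["Improved"] = summary["BestLater"] < summary["Rank"]
summary["ReachedTop"] = summary["BestLater"].eq(1) & summary["Rank"].gt(1)

# Summing booleans counts the True values per experiment.
counts = summary.groupby("Experiment")[["Improved", "ReachedTop"]].sum()
print(counts)
```

On the sample data this gives 2 improved and 0 reaching the top for experiment A (Charlie and Echo improve), 1 and 1 for B (Juliet), and 0 and 0 for C, which has no later experiments. If already-top participants such as Alpha should also count as reaching the top rank, drop the Rank > 1 condition.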