Pandas row number for group results

I am trying to create a DataFrame column that assigns a sequential number each time the combination of values changes. Each combination of student and term (the year column) represents a group. My data looks like this:

student	year
A	20211
A	20222
A	20222
A	20225
B	20211
B	20211
B	20227
C	20211
C	20222
C	20229

I want a new column that numbers each unique student and year combination within each student. I've tried sort_values with groupby and cumcount, but that numbers every row instead of incrementing only when the year value changes. This is what I want:

student	year	enrollment
A	20211	1
A	20222	2
A	20222	2
A	20225	3
B	20211	1
B	20211	1
B	20227	2
C	20211	1
C	20222	2
C	20229	3
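For comparison, here is a minimal sketch of why cumcount doesn't fit. The exact code tried isn't shown, so this is an assumed reconstruction that also assumes the rows are already sorted by student and year, as in the sample. cumcount numbers every row within a group, so repeated years still get distinct numbers.

```python
import pandas as pd

# Sample data from the question (already sorted by student, then year)
df = pd.DataFrame({
    'student': list('AAAABBBCCC'),
    'year': [20211, 20222, 20222, 20225, 20211, 20211, 20227, 20211, 20222, 20229],
})

# cumcount yields 0, 1, 2, ... for every row within each student group,
# regardless of whether the year repeats; + 1 makes it 1-based
df['row_in_group'] = df.groupby('student').cumcount() + 1
print(df)
```

For student A this gives 1, 2, 3, 4 rather than the desired 1, 2, 2, 3, because the two 20222 rows are counted separately.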

Answer:

You can use pd.factorize per student group:

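import pandas as pd

df = pd.DataFrame({'student': list('AAAABBBCCC'),
                   'year': [20211, 20222, 20222, 20225, 20211, 20211, 20227, 20211, 20222, 20229]})

# factorize assigns 0, 1, 2, ... to each distinct year within a student,
# in order of first appearance; + 1 makes the numbering start at 1
df['enrollment'] = df.groupby('student')['year'] \
                     .transform(lambda x: pd.factorize(x)[0] + 1)
print(df)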

# Output:
  student   year  enrollment
0       A  20211           1
1       A  20222           2
2       A  20222           2
3       A  20225           3
4       B  20211           1
5       B  20211           1
6       B  20227           2
7       C  20211           1
8       C  20222           2
9       C  20229           3
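Two alternatives produce the same column on this sample; these are sketches, not part of the original answer. A dense rank numbers years in sorted order (equal years share a number, no gaps), whereas factorize numbers them in order of first appearance, so the two differ only if years within a student are out of order. The second variant starts a new number whenever the year differs from the previous row, which matches "each change" literally but assumes rows are already sorted by student and year.

```python
import pandas as pd

df = pd.DataFrame({
    'student': list('AAAABBBCCC'),
    'year': [20211, 20222, 20222, 20225, 20211, 20211, 20227, 20211, 20222, 20229],
})

# Alternative 1: dense rank of year within each student.
# rank() returns floats, so cast back to int.
df['enrollment_rank'] = (df.groupby('student')['year']
                           .rank(method='dense')
                           .astype(int))

# Alternative 2: flag rows where the year differs from the previous row
# in the same student group (the first row compares against NaN, so it is
# always flagged), then take a running count of those flags.
df['enrollment_change'] = (df.groupby('student')['year']
                             .transform(lambda s: s.ne(s.shift()).cumsum()))
print(df)
```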