How to identify one column with continuous number and same value of another column?-CodePudding

I have a DataFrame with two columns A and B.

I want to create a new column named C to identify the continuous A with the same B value.

Here's an example

import pandas as pd

df = pd.DataFrame({'A':[1,2,3,5,6,10,11,12,13,18], 'B':[1,1,2,2,3,3,3,3,4,4]})

I found a similar question, but that method only identifies the continuous A regardless of B.

df['C'] = df['A'].diff().ne(1).cumsum().sub(1)

I have tried to groupby B and apply the function like this:

df['C'] = df.groupby('B').apply(lambda x: x['A'].diff().ne(1).cumsum().sub(1))

However, it doesn't work: TypeError: incompatible index of inserted column with frame index.

The expected output is

CodePudding user response：

Let's create a sequential counter using groupby, diff and cumsum then factorize to reencode the counter

df['C'] = df.groupby('B')['A'].diff().ne(1).cumsum().factorize()[0]

Result

    A  B  C
0   1  1  0
1   2  1  0
2   3  2  1
3   5  2  2
4   6  3  3
5  10  3  4
6  11  3  4
7  12  3  4
8  13  4  5
9  18  4  6

CodePudding user response：

Use DataFrameGroupBy.diff with compare not equal 1 and Series.cumsum, last subtract 1:

df['C'] = df.groupby('B')['A'].diff().ne(1).cumsum().sub(1)
print (df)
    A  B  C
0   1  1  0
1   2  1  0
2   3  2  1
3   5  2  2
4   6  3  3
5  10  3  4
6  11  3  4
7  12  3  4
8  13  4  5
9  18  4  6