I have a data frame like this:
| Index | Time | Id |
|---|---|---|
| 0 | 10:10:00 | 11 |
| 1 | 10:10:01 | 12 |
| 2 | 10:10:02 | 12 |
| 3 | 10:10:04 | 12 |
| 4 | 10:10:06 | 13 |
| 5 | 10:10:07 | 13 |
| 6 | 10:10:08 | 11 |
| 7 | 10:10:10 | 11 |
| 8 | 10:10:12 | 11 |
| 9 | 10:10:14 | 13 |
I want to compare the Id column within each pair of rows: rows 0 and 1, rows 2 and 3, and so on.
In other words, I want to compare each even row with the odd row that follows it, and keep only the pairs whose Ids match.
My ideal output would be:
| Index | Time | Id |
|---|---|---|
| 2 | 10:10:02 | 12 |
| 3 | 10:10:04 | 12 |
| 4 | 10:10:06 | 13 |
| 5 | 10:10:07 | 13 |
| 6 | 10:10:08 | 11 |
| 7 | 10:10:10 | 11 |
I tried this, but it did not work:
df = df[
df[::2]["id"] ==df[1::2]["id"]
]
CodePudding user response:
You can use a GroupBy.transform approach:
import numpy as np
# for each pair, is there only one kind of Id?
out = df[df.groupby(np.arange(len(df))//2)['Id'].transform('nunique').eq(1)]
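To make the grouping key visible, here is a minimal self-contained sketch of the same idea. The way `df` is built and the intermediate names `pair_key` and `n_unique` are assumptions for illustration; only the `Time` and `Id` values come from the question.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; building the frame this way is an assumption.
df = pd.DataFrame({
    "Time": ["10:10:00", "10:10:01", "10:10:02", "10:10:04", "10:10:06",
             "10:10:07", "10:10:08", "10:10:10", "10:10:12", "10:10:14"],
    "Id": [11, 12, 12, 12, 13, 13, 11, 11, 11, 13],
})

# Integer-divide the row position by 2 so rows (0, 1) share key 0, rows (2, 3) key 1, ...
pair_key = np.arange(len(df)) // 2

# Count the distinct Ids in each pair and broadcast that count back to both rows.
n_unique = df.groupby(pair_key)["Id"].transform("nunique")

# Keep the rows whose pair contains a single distinct Id.
out = df[n_unique.eq(1)]
print(out)
```

Note that with an odd number of rows the last row forms a group on its own, has one distinct Id, and is therefore kept.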
Or, more efficiently, using the underlying NumPy array (this assumes an even number of rows, so that the two slices have the same length):
# convert to numpy
a = df['Id'].to_numpy()
# is each even-position value equal to the odd-position value after it?
out = df[np.repeat((a[::2]==a[1::2]), 2)]
Output:
Index Time Id
2 2 10:10:02 12
3 3 10:10:04 12
4 4 10:10:06 13
5 5 10:10:07 13
6 6 10:10:08 11
7 7 10:10:10 11
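The NumPy variant compares `a[::2]` with `a[1::2]` element by element, so the two slices must have the same length. Below is a hedged sketch of a small wrapper that also copes with an odd row count by ignoring the unpaired last row. The function name `keep_equal_pairs` and that dropping policy are assumptions, not part of the answer above.

```python
import numpy as np
import pandas as pd


def keep_equal_pairs(df: pd.DataFrame, col: str = "Id") -> pd.DataFrame:
    """Keep rows (0, 1), (2, 3), ... whose values in `col` are equal."""
    n_pairs = len(df) // 2
    # Assumed policy: a trailing row without a partner is ignored.
    paired = df.iloc[: 2 * n_pairs]
    a = paired[col].to_numpy()
    # One boolean per pair: does the even-position value equal the odd-position one?
    same = a[::2] == a[1::2]
    # Repeat each boolean twice so the mask covers both rows of the pair.
    return paired[np.repeat(same, 2)]


# Sample data copied from the question; building the frame this way is an assumption.
df = pd.DataFrame({
    "Time": ["10:10:00", "10:10:01", "10:10:02", "10:10:04", "10:10:06",
             "10:10:07", "10:10:08", "10:10:10", "10:10:12", "10:10:14"],
    "Id": [11, 12, 12, 12, 13, 13, 11, 11, 11, 13],
})

print(keep_equal_pairs(df))
```

Pairing is by row position, not by index label, so the result does not depend on the frame having a default `RangeIndex`.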
