I have the following problem: I want to detect whether 2 or more consecutive values in a column of a dataframe are greater than 0.5. My approach so far is to check for each cell whether the value is less than 0.5 and store the result in a column "condition" (see table). How can I now detect whether 2 consecutive cells in that column have the same value (rows 4-5)? Alternatively, is it possible to detect the problem directly in the data column? If 2 consecutive cells are False, the dataframe can be discarded.
I would be very grateful for any help!
| | data | condition |
|---|---|---|
| 0 | 0.1 | True |
| 1 | 0.1 | True |
| 2 | 0.25 | True |
| 3 | 0.3 | True |
| 4 | 0.6 | False |
| 5 | 0.7 | False |
| 6 | 0.3 | True |
| 7 | 0.1 | True |
| 6 | 0.9 | False |
| 7 | 0.1 | True |
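The setup described in the question can be reproduced roughly as follows. This is a minimal sketch that assumes the data values from the table and uses a default 0-9 index (the table's index repeats 6 and 7). "condition" is True where the value is below 0.5.

```python
import pandas as pd

# Data values copied from the table; a default 0..9 index is assumed here
df = pd.DataFrame({"data": [0.1, 0.1, 0.25, 0.3, 0.6, 0.7, 0.3, 0.1, 0.9, 0.1]})

# "condition" is True where the value is below 0.5, as described above
df["condition"] = df["data"].lt(0.5)

print(df)
```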
Answer:
You can compute a boolean series that is True where the value is greater than 0.5 (i.e., True when invalid). Then combine this series with a shifted copy of itself using a boolean AND (&): a row is True only if both it and the previous row are True, so any two consecutive True values yield True. Finally, check whether any such row exists to decide whether to discard the dataset:
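```python
import pandas as pd
s = df['data'].gt(0.5)  # True where the value is invalid
(s & s.shift(fill_value=False)).any()  # fill_value avoids a NaN in the first row
```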
Output: True, so the dataset is invalid (rows 4 and 5 are both greater than 0.5).
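Since the question asks about "2 or more" consecutive values, the same idea can be generalized to a run of length n with a rolling window. This is a sketch, not part of the original answer; the helper name has_run_above and its parameters are hypothetical. A window of n rows sums to n only if every row in it exceeds the threshold.

```python
import pandas as pd


def has_run_above(series: pd.Series, threshold: float = 0.5, n: int = 2) -> bool:
    """Hypothetical helper: True if at least n consecutive values exceed threshold."""
    above = series.gt(threshold).astype(int)
    # The rolling sum of n rows equals n only when all n rows are above the threshold
    return bool(above.rolling(n).sum().eq(n).any())


df = pd.DataFrame({"data": [0.1, 0.1, 0.25, 0.3, 0.6, 0.7, 0.3, 0.1, 0.9, 0.1]})
print(has_run_above(df["data"], n=2))  # True (rows 4 and 5)
print(has_run_above(df["data"], n=3))  # False
```

With n=2 this gives the same answer as the shift-based check. The rolling version only pays off when longer runs matter.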
Answer:
To detect whether a cell has the same value as the previous one, you can use the .diff method and check whether the result equals zero:
```python
df['eq_to_prev'] = df['data'].diff().eq(0)
```
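Note that applying diff to the data column flags equal neighbouring numbers (for example rows 0 and 1, both 0.1), which is not the same as "both above 0.5". The following sketch applies the diff idea to the "condition" column instead. It assumes the condition column is built with lt(0.5) as described in the question, and it casts to int so that diff is a plain subtraction. A repeat is then kept only where the condition is False.

```python
import pandas as pd

df = pd.DataFrame({"data": [0.1, 0.1, 0.25, 0.3, 0.6, 0.7, 0.3, 0.1, 0.9, 0.1]})
df["condition"] = df["data"].lt(0.5)  # assumed to match the question's column

# On the numeric column, diff().eq(0) only finds equal neighbouring values
equal_data = df["data"].diff().eq(0)

# On the boolean column (cast to int), a zero difference means the row
# repeats the previous row's condition
same_as_prev = df["condition"].astype(int).diff().eq(0)

# Keep only repeats of False, i.e. two consecutive values not below 0.5
two_false = same_as_prev & ~df["condition"]
print(two_false.any())  # True
```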
