For example, I have the following dataframe:
| ID | variable | order |
|---|---|---|
| 1 | a | 1 |
| 1 | b | 2 |
| 2 | b | 1 |
| 2 | a | 2 |
| 2 | b | 3 |
| 3 | b | 1 |
| 3 | a | 2 |
I would like to keep only the ID groups where "a" appears before "b" (i.e., the "order" of a is smaller than b). So the result would look something like this:
| ID | variable | order |
|---|---|---|
| 1 | a | 1 |
| 1 | b | 2 |
| 2 | b | 1 |
| 2 | a | 2 |
| 2 | b | 3 |
Where only ID 1 and 2 remain (with all of its original rows), and all rows in ID 3 are removed because the "order" of b is smaller than a. Would anyone have guidance on how this could be done in R?
CodePudding user response:
your_data %>%
group_by(ID) %>%
filter(any(variable == "a" & lead(variable, default = "not b") == "b"))
This ignores the order column and is based on the order of rows within each ID group. It check for the presence of an "a" on one row AND a "b" on the immediate next row.
In your comment you say '"a" right before "b"' - I went with that "right before" clarification. If a group has values "a", "c", "b" it would not be kept in my answer as the "a" is not "right before" the "b".
