How to select group based on the order of rows within group in R-CodePudding

For example, I have the following dataframe:

ID	variable	order
1	a	1
1	b	2
2	b	1
2	a	2
2	b	3
3	b	1
3	a	2

I would like to keep only the ID groups where "a" appears before "b" (i.e., the "order" of a is smaller than b). So the result would look something like this:

ID	variable	order
1	a	1
1	b	2
2	b	1
2	a	2
2	b	3

Where only ID 1 and 2 remain (with all of its original rows), and all rows in ID 3 are removed because the "order" of b is smaller than a. Would anyone have guidance on how this could be done in R?

CodePudding user response：

your_data %>%
  group_by(ID) %>%
  filter(any(variable == "a" & lead(variable, default = "not b") == "b"))

This ignores the order column and is based on the order of rows within each ID group. It check for the presence of an "a" on one row AND a "b" on the immediate next row.

In your comment you say '"a" right before "b"' - I went with that "right before" clarification. If a group has values "a", "c", "b" it would not be kept in my answer as the "a" is not "right before" the "b".