Dataset
Let's say I have the following dataframe:
library(dplyr)  # re-exports tibble()
df <- tibble(ID = c("A", "A", "A", "B", "C", "C", "D", "D", "D", "D", "E", "E", "E"),
             Encounter = c(10, 11, 12, 3, 5, 50, 8, 8, 15, 20, 2, 8, 10),
             Item = c("apple", "toy", "bowl", "apple", "mango", "mango", "toy", "brush", "toy", "brush", "brush", "key", "key"))
# A tibble: 13 x 3
ID Encounter Item
<chr> <dbl> <chr>
1 A 10 apple
2 A 11 toy
3 A 12 bowl
4 B 3 apple
5 C 5 mango
6 C 50 mango
7 D 8 toy
8 D 8 brush
9 D 15 toy
10 D 20 brush
11 E 2 brush
12 E 8 key
13 E 10 key
Criteria
For each ID, I wish to find out whether an Item from the first Encounter appears again in any subsequent Encounter.
For example:
In A, the Item in the first Encounter is apple, which does not appear in any subsequent Encounter, so the output should be FALSE.
In C, mango does appear in a subsequent Encounter, so the output should be TRUE.
In D, both toy and brush are in the first Encounter, and both appear in subsequent Encounters, so the output should be TRUE.
The rows of the first Encounter itself should always be FALSE.
Desired output
Here is my desired output:
# A tibble: 13 x 4
ID Encounter Item Output
<chr> <dbl> <chr> <lgl>
1 A 10 apple FALSE
2 A 11 toy FALSE
3 A 12 bowl FALSE
4 B 3 apple FALSE
5 C 5 mango FALSE
6 C 50 mango TRUE
7 D 8 toy FALSE
8 D 8 brush FALSE
9 D 15 toy TRUE
10 D 20 brush TRUE
11 E 2 brush FALSE
12 E 8 key FALSE
13 E 10 key FALSE
My attempt
I have used dplyr::case_when() to:
set the rows with the minimum Encounter to FALSE (successful)
set an Item that is NOT in the first Encounter to FALSE (successful)
set an Item that IS in the first Encounter to TRUE (FAILED if there are multiple Items in the first Encounter)
df %>%
  group_by(ID) %>%
  arrange(ID, Encounter) %>%
  mutate(Output = case_when(Encounter == min(Encounter) ~ FALSE,
                            Item %in% first(Item) ~ TRUE,
                            !(Item %in% first(Item)) ~ FALSE))
# A tibble: 13 x 4
# Groups: ID [5]
ID Encounter Item Output
<chr> <dbl> <chr> <lgl>
1 A 10 apple FALSE
2 A 11 toy FALSE
3 A 12 bowl FALSE
4 B 3 apple FALSE
5 C 5 mango FALSE
6 C 50 mango TRUE
7 D 8 toy FALSE
8 D 8 brush FALSE
9 D 15 toy TRUE
10 D 20 brush FALSE
11 E 2 brush FALSE
12 E 8 key FALSE
13 E 10 key FALSE
Ultimate question
Is there a function that acts like dplyr::first() but can return multiple values, so that the result can be used inside case_when() or ifelse()?
For example in D, I don't know how to output both toy and brush so that it can be compared using %in%.
Sorry for the long question, hope someone can help!
Also, I feel that my case_when() expression is not written in a smart way, so please feel free to leave a comment if you have suggestions. Thanks in advance!
Answer:
We may use duplicated(). The values in Encounter are already sorted within each ID here; if they are not, call arrange(ID, Encounter) before group_by().
library(dplyr)
df %>%
  group_by(ID) %>%
  # TRUE when the group's first Item reappears later and this row repeats an earlier Item
  mutate(Output = first(Item) %in% Item[-1] & duplicated(Item)) %>%
  ungroup()
Output:
# A tibble: 13 × 4
ID Encounter Item Output
<chr> <dbl> <chr> <lgl>
1 A 10 apple FALSE
2 A 11 toy FALSE
3 A 12 bowl FALSE
4 B 3 apple FALSE
5 C 5 mango FALSE
6 C 50 mango TRUE
7 D 8 toy FALSE
8 D 8 brush FALSE
9 D 15 toy TRUE
10 D 20 brush TRUE
11 E 2 brush FALSE
12 E 8 key FALSE
13 E 10 key FALSE
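
On the question of a first()-like function that returns several values: since first() returns a single value, one option is to subset Item by the rows of the earliest Encounter and compare against that set. The following is only a sketch of that idea, not the answerer's method; the names is_first and result are introduced here purely for illustration.

```r
library(dplyr)

df <- tibble(
  ID = c("A", "A", "A", "B", "C", "C", "D", "D", "D", "D", "E", "E", "E"),
  Encounter = c(10, 11, 12, 3, 5, 50, 8, 8, 15, 20, 2, 8, 10),
  Item = c("apple", "toy", "bowl", "apple", "mango", "mango", "toy",
           "brush", "toy", "brush", "brush", "key", "key")
)

result <- df %>%
  group_by(ID) %>%
  mutate(
    # is_first (hypothetical helper column): rows in the earliest Encounter of this ID
    is_first = Encounter == min(Encounter),
    # Item[is_first] holds every Item of the first Encounter, whether one or many;
    # rows of the first Encounter itself are forced to FALSE
    Output = !is_first & Item %in% Item[is_first]
  ) %>%
  ungroup() %>%
  select(-is_first)

print(result)
```

Because the comparison relies on min(Encounter) rather than on row position, this sketch should not need the rows to be sorted beforehand, and it treats every Item of a multi-item first Encounter (as in D) the same way.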
