R: create a variable with conditions based upon unequal number of rows per condition

Here is an example of my data frame along with a column showing the kind of output that I'm aiming for:

library(tibble)

eg.df <- tibble(
  subject = c(rep(1, 9), rep(2, 12), rep(3, 12), rep(4, 7), rep(5, 4)),
  events = c(
    c("event1", "event2", "event3", "event4", "event5", "event6", "event7", "event8", "event9"),
    c("event1", "event2", "eventx", "event3", "event4", "event5", "event6", "event7", "eventx", "event8", "eventx", "event9"),
    c("event1", "event2", "event3", "event4", "eventx", "eventx", "event5", "event6", "event7", "event8", "eventx", "event9"),
    c("event1", "event2", "event3", "event4", "event5", "event6", "event7"),
    c("event1", "event2", "event3", "event4")
  ),
  # the column I would like to be able to generate
  output_aimingfor = c(
    c(rep("A", 3), rep("B", 3), rep("C", 3)),
    c(rep("A", 4), rep("B", 3), rep("C", 5)),
    c(rep("A", 3), rep("B", 5), rep("C", 4)),
    c(rep("A", 3), rep("B", 3), rep("C", 1)),
    c(rep("A", 3), rep("B", 1))
  )
)

Basically, each subject has undertaken a series of events, and these events fall into three groups (A, B or C): events 1, 2 and 3 go into group A, events 4, 5 and 6 go into group B, and events 7, 8 and 9 go into group C. Along the way there are numerous unexpected events, "eventx", which also belong to the group they occur in, so the number of rows per group is uneven. How can I assign the appropriate group to each event within each subject? Furthermore, some subjects don't have complete groups: they may have the first event of a group but not the last (as with subjects 4 and 5 in this example).

If anyone can help with this that would be amazing. I'm really racking my brain over this and can't even come up with a reasonable attempt!

CodePudding user response：

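We may use parse_number to extract the numeric part of each event name and construct an index with %/% to look up the group letter. The "eventx" rows have no numeric part, so they come out as NA, and fill() then carries the previous group down within each subject.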

library(dplyr)
library(tidyr)
eg.df2 <-  eg.df %>% 
  group_by(subject) %>%
  mutate(new = c("A", "B", "C")[(readr::parse_number(events) - 1) %/% 3 + 1]) %>%
  fill(new) %>%
  ungroup

Checking against the expected output:

> with(eg.df2, all.equal(new, output_aimingfor))
[1] TRUE
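
To see why the index expression in the answer above lands on the right letter, here is a minimal base R sketch of the arithmetic on its own. The vector `event_num` is a stand-in (an assumption for illustration) for what parse_number would extract from "event1" to "event9".

```r
# Stand-in for the numbers extracted from "event1" .. "event9"
event_num <- 1:9

# Integer division maps 1-3 -> 0, 4-6 -> 1, 7-9 -> 2; adding 1 makes it a 1-based index
idx <- (event_num - 1) %/% 3 + 1
idx
# [1] 1 1 1 2 2 2 3 3 3

# An NA index (as for an "eventx" row) yields NA, which fill() can later replace
groups <- c("A", "B", "C")[c(idx, NA)]
groups
# [1] "A" "A" "A" "B" "B" "B" "C" "C" "C" NA
```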

CodePudding user response：

I would create a mapping table and then look up the group value from the event.

Then you can fill down for the unexpected events.

library(dplyr)
library(tidyr)

# create this however you need to for your real data
event_map <- tibble(events = paste0("event", 1:9),
                    event_group = rep(LETTERS[1:3], each = 3))

eg.df %>% 
  left_join(event_map, by = "events") %>% 
  group_by(subject) %>% 
  fill(event_group)
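
As a quick illustration of how the join and fill interact, here is a small sketch on a toy data frame. The `toy` tibble is hypothetical and much shorter than the data in the question; `event_map` is the same mapping table as above.

```r
library(dplyr)
library(tidyr)

# Toy data (hypothetical): two subjects, each with one unexpected event
toy <- tibble(subject = c(1, 1, 1, 2, 2),
              events  = c("event3", "eventx", "event4", "event7", "eventx"))

event_map <- tibble(events = paste0("event", 1:9),
                    event_group = rep(LETTERS[1:3], each = 3))

res <- toy %>%
  left_join(event_map, by = "events") %>%  # "eventx" has no match, so event_group is NA
  group_by(subject) %>%
  fill(event_group) %>%                    # carry the previous group down within each subject
  ungroup()

res$event_group
# [1] "A" "A" "B" "C" "C"
```

Because the fill is done per subject, a group from the end of one subject never leaks into the start of the next.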

CodePudding user response：

I'd use case_when with fill:

library(dplyr)
library(tidyr)  # for fill()
eg.df %>% 
  group_by(subject) %>% 
  mutate(new = case_when(events %in% c("event1", "event2", "event3") ~ "A",
                         events %in% c("event4", "event5", "event6") ~ "B",
                         events %in% c("event7", "event8", "event9") ~ "C")) %>% 
  fill(new)
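
One edge case that the example data does not contain: if a subject's first row were an "eventx", filling downward would leave it as NA because there is nothing above it to copy. A possible sketch, assuming you would want such a row to take the group of the next known event, uses the `.direction = "downup"` option of fill. The `toy` data here is hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical subject whose first recorded row is an unexpected event
toy <- tibble(subject = c(1, 1, 1),
              events  = c("eventx", "event1", "event4"))

res <- toy %>%
  mutate(new = case_when(events %in% c("event1", "event2", "event3") ~ "A",
                         events %in% c("event4", "event5", "event6") ~ "B",
                         events %in% c("event7", "event8", "event9") ~ "C")) %>%
  group_by(subject) %>%
  fill(new, .direction = "downup") %>%  # fill down first, then fill any leading NA upward
  ungroup()

res$new
# [1] "A" "A" "B"
```

Whether a leading unexpected event should really inherit the following group depends on your data, so treat this as one option rather than a rule.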