I have two dataframes like this:
| First | Second | Third | Fourth |
|---|---|---|---|
| 12345 | 23423 | 12342 | 43214 |
| 23453 | 22342 | 23421 | 43214 |
| 23453 | NA | NA | NA |
| Code | Type |
|---|---|
| 12345 | A |
| 23453 | B |
| 23423 | Other |
| 12342 | A |
I would like to add a fifth column to the first data frame based on the codes in each row. Using the second dataframe, if all the codes in a row are Type A, I would like to insert "A" into the fifth column.
If all the codes in a row are Type B, I would like to insert "B" in the fifth column.
If all the codes in a row are Type "Other", I would like to insert "Other".
If there are codes that are both Type A and B, I would like to insert "Both".
If there are codes that are both Type A, B, or A and B and Other, I would like to insert A, B, or Both, respectively.
I would also like to ignore all the NAs in a row.
Thank you so much!
CodePudding user response:
A bit (probably over-)complicated. First join all columns with their respective code, then unite columns and remove NA. Then check for unique values with the aid of map and finally add the rule for "both".
library(tidyverse)
# add row to show whether rule "Both" works
df <- df %>% add_row(First = 12345, Second = 23453)
df %>%
left_join(type_df, by = c("First" = "Code")) %>%
left_join(type_df, by = c("Second" = "Code")) %>%
left_join(type_df, by = c("Third" = "Code")) %>%
left_join(type_df, by = c("Fourth" = "Code")) %>%
tidyr::unite(contains("Type"), col = "type", na.rm = TRUE, sep = ",") %>%
mutate(type = stringr::str_split(type, ",") %>%
map(., ~ unique(.x)) %>%
map_chr(., ~ paste(.x, collapse = ", "))) %>%
mutate(type = case_when(
str_detect(type, "A") & str_detect(type, "B") ~ "Both",
TRUE ~ type
))
Output:
First Second Third Fourth type
1 12345 23423 12342 43214 A, Other
2 23453 22342 23421 43214 B
3 23453 NA NA NA B
4 12345 23453 NA NA Both
Data:
df <- read.table(text = "First Second Third Fourth
12345 23423 12342 43214
23453 22342 23421 43214
23453 NA NA NA", h=T)
type_df <- read.table(text = "Code Type
12345 A
23453 B
23423 Other
12342 A", h = T)
