Here is a simple reproducible example:
d1 = structure(list(et = c("s", "s"), gg = c("d", "d"), hj = c("f",
"f"), ggh = c("h", "h"), wer = c(23L, 45L)), class = "data.frame", row.names = c(NA,
-2L))
where et, gg, hj and ggh are categorical variables and wer is a metric variable. So, for this category
et gg hj ggh
s d f h
the median (by wer) is 34.
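As a quick check of that figure, here is a minimal base R sketch that computes the median of wer for every combination of the four categorical columns. The data frame is rebuilt with data.frame() for brevity, and the name med is only an illustrative choice.

```r
# d1 as given above, rebuilt with data.frame() for readability
d1 <- data.frame(et = c("s", "s"), gg = c("d", "d"), hj = c("f", "f"),
                 ggh = c("h", "h"), wer = c(23L, 45L))

# Median of wer for each combination of et, gg, hj and ggh
med <- aggregate(wer ~ et + gg + hj + ggh, data = d1, FUN = median)
med
```

With only one category present, med has a single row whose wer is the median of 23 and 45.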
There is a second dataset:
d2 <- structure(list(et = "s", gg = "d", hj = "f", ggh = "h", wer = 3L), class = "data.frame", row.names = c(NA,
-1L))
for this category
et gg hj ggh
s d f h
wer equals 3
How can I do the following: if the value of wer in d2 is less than or greater than the d1 median for the same category by 1 or more, replace the value in d2 with that median?
So in this simple example the desired output for d2 is
et gg hj ggh wer
s d f h 34
because 3 from the d2 dataset is less than 34 (the median for this category in d1) by 31.
Thank you for your help.
Answer:
You could calculate the median of wer per category in d1, right_join it onto d2, and keep the larger of the median and the d2 value:
library(dplyr)
d1 %>%
group_by(across(-wer)) %>%
summarise(wer = median(wer), .groups = "drop") %>%
right_join(d2, by = c("et", "gg", "hj", "ggh"), suffix = c("", ".y")) %>%
mutate(wer = ifelse(wer >= wer.y, wer, wer.y), .keep = "unused")
This returns
# A tibble: 1 x 5
et gg hj ggh wer
<chr> <chr> <chr> <chr> <dbl>
1 s d f h 34
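If the d2 value should also be replaced when it lies above the median, a base R variant along these lines might work. It is only a sketch under two assumptions that the question does not spell out: "by 1" is read as an absolute difference of at least 1, and categories of d2 that do not occur in d1 keep their original value. The names keys, med, out and replace_it are illustrative.

```r
d1 <- data.frame(et = c("s", "s"), gg = c("d", "d"), hj = c("f", "f"),
                 ggh = c("h", "h"), wer = c(23L, 45L))
d2 <- data.frame(et = "s", gg = "d", hj = "f", ggh = "h", wer = 3L)

keys <- c("et", "gg", "hj", "ggh")

# Per-category median of wer in d1, renamed to avoid clashing with d2$wer
med <- aggregate(wer ~ et + gg + hj + ggh, data = d1, FUN = median)
names(med)[names(med) == "wer"] <- "med"

# Keep every row of d2; categories missing from d1 get med = NA
out <- merge(d2, med, by = keys, all.x = TRUE)

# Assumption: replace when the value differs from the median by at least 1,
# in either direction; rows without a median are left untouched
replace_it <- !is.na(out$med) & abs(out$wer - out$med) >= 1
out$wer <- ifelse(replace_it, out$med, out$wer)
out$med <- NULL
out
```

For the example data this also gives wer = 34. Note that merge() sorts the result by the key columns, so the row order of d2 is not necessarily preserved.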
