I am trying to replace all values in nat_locx with the value from the first row in LOCX and replace all the distance values with 0 if my first condition is met one or more times for id (my group_by() variable), but NOT if my second condition is met one or more times for id.
Here is an example of my data:
id DATE nat_locx LOCX distance loc_age condition
<fct> <date> <dbl> <dbl> <dbl> <dbl> <lgl>
6553 2004-06-27 13.5 2 487.90 26 TRUE
6553 2004-07-14 13.5 13.5 0 43 FALSE
6553 2004-07-15 13.5 12.5 30 44 FALSE
10160 2005-07-01 4.5 12 229.45588 36 TRUE
10160 2005-07-05 4.5 11 200.12496 40 TRUE
10160 2005-07-06 4.5 11 200.12496 41 TRUE
The way I have tried to do this is like so:
df<-df %>%
group_by(id) %>%
mutate(condition = case_when(
loc_age >= 25 & loc_age < 40 & distance > 30 ~ TRUE,
loc_age>=40 & loc_age<50 & distance>60 ~ TRUE,
TRUE ~ FALSE)) %>%
mutate(nat_locx=if(condition=="TRUE") {
first(LOCX) & distance==0.00
} else {
nat_locx})
The first mutate() results in a new column with TRUE and FALSE values. If there is even one instance of FALSE, then the if else statement I write afterwards should not proceed.
In this example, this would mean that for id==6553 the loop should not change anything. But, because condition==TRUE for every row for id==10160 then the if else should proceed.
Ideally, I'd like this output:
id DATE nat_locx LOCX distance loc_age condition
<fct> <date> <dbl> <dbl> <dbl> <dbl> <lgl>
6553 2004-06-27 13.5 2 487.90 26 TRUE
6553 2004-07-14 13.5 13.5 0 43 FALSE
6553 2004-07-15 13.5 12.5 30 44 FALSE
10160 2005-07-01 12 12 0 36 TRUE
10160 2005-07-05 12 11 0 40 TRUE
10160 2005-07-06 12 11 0 41 TRUE
A dplyr solution is preferred.
CodePudding user response:
As @Ben mentioned, we can include all so that the changes are only applied to groups that have all TRUE. We can use this for both nat_locx and for the distance columns.
library(tidyverse)
df %>%
group_by(id) %>%
mutate(
condition = case_when(
loc_age >= 25 & loc_age < 40 & distance > 30 ~ TRUE,
loc_age >= 40 & loc_age < 50 & distance > 60 ~ TRUE,
TRUE ~ FALSE
)
) %>%
mutate(nat_locx = if (all(condition)) first(LOCX) else nat_locx,
distance = if (all(condition)) 0 else distance)
Output
id DATE nat_locx LOCX distance loc_age condition
<int> <chr> <dbl> <dbl> <dbl> <int> <lgl>
1 6553 2004-06-27 13.5 2 488. 26 TRUE
2 6553 2004-07-14 13.5 13.5 0 43 FALSE
3 6553 2004-07-15 13.5 12.5 30 44 FALSE
4 10160 2005-07-01 12 12 0 36 TRUE
5 10160 2005-07-05 12 11 0 40 TRUE
6 10160 2005-07-06 12 11 0 41 TRUE
