I have the following data frame:
library(tidyverse)
v1 = c(1 , NA, 3, 5, NA, NA, 2, 12, NA, 5, NA, 0, 1, 2, 6, 8)
alt = rnorm(length(v1), 0, 1)
tb = tibble(v1, alt)
print(tb)
A tibble: 16 × 2
v1 alt
<dbl> <dbl>
1 1 0.495
2 NA 0.726
3 3 0.667
4 5 0.955
5 NA -1.68
6 NA -1.21
7 2 -1.96
8 12 1.47
9 NA 0.372
10 5 1.07
11 NA 0.531
12 0 0.102
13 1 1.34
14 2 0.0872
15 6 -0.391
16 8 -0.250
I need to fill NAs in v1 using the mutate. The idea is that when there is one NA in v1, it will be filled by the multiplication between the variable alt and the value of v1 prior to the NA.
I solve this using loop for, but this may take time to depend on the data set.
for (i in 1:length(v1)) {
if( is.na(tb[i, 'v1']) ){
tb[i, 'v1'] = tb[i-1, 'v1']*tb[i, 'alt']
}
}
This yields:
A tibble: 16 × 2
v1 alt
<dbl> <dbl>
1 1 0.495
2 0.726 0.726
3 3 0.667
4 5 0.955
5 -8.38 -1.68
6 10.1 -1.21
7 2 -1.96
8 12 1.47
9 4.47 0.372
10 5 1.07
11 2.65 0.531
12 0 0.102
13 1 1.34
14 2 0.0872
15 6 -0.391
16 8 -0.250
My question is: How do I fill NAs using my condition and mutate or others dplyr verbs ?
CodePudding user response:
Something closer to what you are looking for is:
tb %>%
mutate(v1 = unlist(accumulate2(v1, alt[-1], ~if(is.na(..2))..3*..1 else ..2)))
# A tibble: 16 x 2
v1 alt
<dbl> <dbl>
1 1 0.495
2 0.726 0.726
3 3 0.667
4 5 0.955
5 -8.4 -1.68
6 10.2 -1.21
7 2 -1.96
8 12 1.47
9 4.46 0.372
10 5 1.07
11 2.66 0.531
12 0 0.102
13 1 1.34
14 2 0.0872
15 6 -0.391
16 8 -0.25
CodePudding user response:
You could make use of cumprod:
tb %>%
group_by(id = cumsum(!is.na(v1))) %>%
mutate(v1 = v1[1]*cumprod(alt^is.na(v1)))
# A tibble: 16 x 3
# Groups: id [11]
v1 alt id
<dbl> <dbl> <int>
1 1 0.495 1
2 0.726 0.726 1
3 3 0.667 2
4 5 0.955 3
5 -8.4 -1.68 3
6 10.2 -1.21 3
7 2 -1.96 4
8 12 1.47 5
9 4.46 0.372 5
10 5 1.07 6
11 2.66 0.531 6
12 0 0.102 7
13 1 1.34 8
14 2 0.0872 9
15 6 -0.391 10
16 8 -0.25 11
In Base R:
prd <- function(a, b){
ifelse(is.na(a), prd(b * c(1,head(a,-1)),b), a)
}
transform(tb, v1 = prd(v1, alt))
v1 alt
1 1.000 0.4950
2 0.726 0.7260
3 3.000 0.6670
4 5.000 0.9550
5 -8.400 -1.6800
6 10.164 -1.2100
7 2.000 -1.9600
8 12.000 1.4700
9 4.464 0.3720
10 5.000 1.0700
11 2.655 0.5310
12 0.000 0.1020
13 1.000 1.3400
14 2.000 0.0872
15 6.000 -0.3910
16 8.000 -0.2500
CodePudding user response:
tb %>%
mutate(
v1 = coalesce(v1, lag(v1) * alt)
)
# # A tibble: 16 × 2
# v1 alt
# <dbl> <dbl>
# 1 1 0.223
# 2 0.377 0.377
# 3 3 0.469
# 4 5 -0.320
# 5 0.311 0.0623
# 6 NA -0.460
# 7 2 0.737
# 8 12 -0.946
# 9 -31.6 -2.63
# 10 5 -0.632
# 11 -0.134 -0.0268
# 12 0 0.329
# 13 1 -2.01
# 14 2 1.41
# 15 6 -1.22
# 16 8 -0.125
Note that I have taken you literally with
when there is one
NAinv1, it will be filled by the multiplication between the variablealtand the value ofv1prior to theNA.
As you can see above, when there are 2 NAs (rows 5 and 6), only the first one is filled.
