I have a vector of 0s and 1s and want to identify the indices where a string of 0s is surrounded by 1s. If the number of 0s between the 1s is lower or equal than 5, I want to change these zeros to 1s.
Here is an example:
> x <- c(0,0,0,1,1,1,0,0,0,1,1,0,0,0,0,0,0,1,1,1,1)
In positions 7,8, and 9, I have only three zeros, and thus these need to be changed to 1. The other zeros are more than 5, and thus need not to be changed.
The resulting vector should look like this:
> x_converted <- c(0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,1,1,1,1)
I am doing this with a for loop and if else statement, but I am sure there must be a faster way to do this.
Thank you.
CodePudding user response:
You can use rle() to get the runs. Then just change it based on the length of the run, excluding first run by looking at cumprod().
x_rle <- rle(x)
x_0 <- cumprod(x_rle$values == 0)
x_rev_0 <- rev(cumprod(rev(x_rle$values) == 0))
x_rle$values <- ifelse(
x_rle$lengths > 5 | x_0 | x_rev_0,
x_rle$values,
1
)
rep(x_rle$values, x_rle$lengths)
#> [1] 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1
CodePudding user response:
A possible solution with rle which does not change shorts sequences of zero's at the beginning or end of x:
# create the run length encoding
r <- rle(x)
# create an index of which zero's should be changed
i <- r$values == 0 & r$lengths < 5 &
c(tail(r$values, -1) == 1, FALSE) &
c(FALSE, head(r$values, -1) == 1)
# set the appropriate values to 1
r$values[i] <- 1
# use the inverse of rle to recreate the vector
inverse.rle(r)
which gives:
[1] 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1
CodePudding user response:
The rle() (run-length-encoding) function makes this pretty easy.
x <- c(0,0,0,1,1,1,0,0,0,1,1,0,0,0,0,0,0,1,1,1,1)
r <- rle(x)
## modify values appropriately
r$values[r$values==0 & r$lengths<=5] <- 1
## convert back to full vector
x_new <- rep(r$values, r$lengths)
## [1] 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1
However, this still needs a little bit of adjustment for literal edge cases — this has converted the initial run of 3 zeros to 1. Perhaps
n <- length(r$values)
rv_int <- r$values[2:(n-1)]
rl_int <- r$lengths[2:(n-1)]
rv_int[rv_int == 0 &
rl_int <= 5] <- 1
x_new <- rep(c(r$values[1], rv_int, r$values[n]),
c(r$lengths[1], rl_int, r$lengths[n]))
CodePudding user response:
With data.table::rleid: rleid creates run-length type group id, which are used as grouping factor in ave. ave then performs a function over the groups defined by r.
r <- data.table::rleid(x)
# [1] 1 1 1 2 2 2 3 3 3 4 4 5 5 5 5 5 5 6 6 6 6
sub <- !r %in% c(1, max(r))
x[sub] <- ave(x[sub], r[sub], FUN = function(x) ifelse(length(x) <= 3 & x == 0, 1, x))
# [1] 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1
CodePudding user response:
A different approach, based on converting x to a string and then back to a numeric vector:
library(tidyverse)
x <- c(0,0,0,1,1,1,0,0,0,1,1,0,0,0,0,0,0,1,1,1,1)
x %>% str_c(collapse = "") %>%
str_replace_all("(?<=1)0{1,5}(?=1)", \(x) str_dup("1", nchar(x))) %>%
str_split("") %>% flatten %>% as.numeric
#> [1] 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1
