I have an unsorted vector of length N. Each element of the vector appears is present precisely twice (the vector length is an even number). I have a custom sorting algorithm, and the goal is to iterate until the vector achieves a state in which each element is adjacent to its copy.
Unsorted vector = {A,F,J,E,F,A,J,E}
A valid sorted state = {A,A,J,J,E,E,F,F}
Another valid sorted state = {J,J,A,A,F,F,E,E}
So my question lies in what is the fastest way to check if a sorted state is valid so that I can speed up my iterations? For long vectors, this will dictate most of my scaling ability.
CodePudding user response:
Something quick and dirty but I'm not sure it will always work:
all(duplicated(x) == c(FALSE,TRUE))
This is relying on the fact that the two same values will always be next to each other, one not-duplicated, the next duplicated. Seems to work with the test sets:
x <- c("A", "F", "J", "E", "F", "A", "J", "E")
s1 <- c("A", "A", "J", "J", "E", "E", "F", "F")
s2 <- c("J", "J", "A", "A", "F", "F", "E", "E")
all(duplicated(x) == c(FALSE,TRUE))
#[1] FALSE
all(duplicated(s1) == c(FALSE,TRUE))
#[1] TRUE
all(duplicated(s2) == c(FALSE,TRUE))
#[1] TRUE
And is pretty quick, looking through a million length vector in 5 hundredths of a second on my machine:
x <- rep(1:1e6, each=2)
system.time(all(duplicated(x) == c(FALSE,TRUE)))
# user system elapsed
# 0.04 0.00 0.05
CodePudding user response:
Another option is to use rle function:
v1 <- c("A", "F", "J", "E", "F", "A", "J", "E")
v2 <- c("A", "A", "J", "J", "E", "E", "F", "F")
v3 <- c("J", "J", "A", "A", "F", "F", "E", "E")
rle_obj <- rle(v3)
all(rle_obj$lengths > 1)
test:
> rle_obj <- rle(v1)
> all(rle_obj$lengths > 1)
[1] FALSE
> rle_obj <- rle(v2)
> all(rle_obj$lengths > 1)
[1] TRUE
> rle_obj <- rle(v3)
> all(rle_obj$lengths > 1)
[1] TRUE
>
CodePudding user response:
An option involves converting the vector (as the length is even and an element is present exactly twice) to a two-row matrix, get the uniqueand test whether the number of rows is 1. If values duplicated are adjacent, while adding the dim attributes with matrix, the second row will be exactly the same as the first
f1 <- function(x)
{
nrow(unique(matrix(x, nrow = 2))) == 1
}
-testing
> v1 <- c("A", "F", "J", "E", "F", "A", "J", "E")
> v2 <- c("A", "A", "J", "J", "E", "E", "F", "F")
> v3 <- c("J", "J", "A", "A", "F", "F", "E", "E")
> f1(v1)
[1] FALSE
> f1(v2)
[1] TRUE
> f1(v3)
[1] TRUE
Or slightly more faster
f2 <- function(x)
{
sum(duplicated(matrix(x, nrow = 2))) == 1
}
-testing
> f2(v1)
[1] FALSE
> f2(v2)
[1] TRUE
> f2(v3)
[1] TRUE
-benchmarks
#thelatemail
> f3 <- function(x) all(duplicated(x) == c(FALSE,TRUE))
#TarJae
> f4 <- function(x) {rle_obj <- rle(x); all(rle_obj$lengths > 1)}
> x1 <- rep(1:1e8, each = 2)
> system.time(f1(x1))
user system elapsed
2.649 0.456 3.111
> system.time(f2(x1))
user system elapsed
2.258 0.433 2.694
> system.time(f3(x1))
user system elapsed
9.972 1.272 11.233
> system.time(f4(x1))
user system elapsed
7.051 3.281 10.333
