How do I create a repeatable random shuffle of date-time data to calculate the mean time difference?

I have a dataset of thousands of date-times for two kinds of events, event A and event B, and I want to test whether there is some dependence between them. To do so, I want to randomly shuffle the times in A and B, calculate the time difference between each pair of observations (A to B), and then calculate the mean of all the differences. I want to repeat this test hundreds of times.

I'm therefore looking for a loop or function rather than copying and pasting the code.


# the data frame is structured like this with many more observations

set.seed(10)

A <- sample(seq(as.Date('1999/01/01'), as.Date('2000/01/01'), by="day"), 12)

B <- sample(seq(as.Date('2000/01/01'), as.Date('2010/01/01'), by="day"), 12)

df <- data.frame(A, B)

I have been able to generate the output I need as follows, but I need to repeat this many times, i.e. end up with hundreds of mean_shuffled results:


shuffled_A = sample(df$A)
shuffled_B = sample(df$B)

df_shuffled <- data.frame(shuffled_A, shuffled_B)

df_shuffled$diff <- difftime(df_shuffled$shuffled_B, df_shuffled$shuffled_A)

mean_shuffled <- mean(df_shuffled$diff)
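One point worth checking before repeating this step hundreds of times: the mean of the signed differences equals mean(B) - mean(A) whatever the pairing, so shuffling alone does not change it. The sketch below illustrates this on the example data; the helper name `shuffled_mean_diff` and the 200 repetitions are assumptions made for illustration, not part of the original code.

```r
set.seed(10)
A <- sample(seq(as.Date("1999/01/01"), as.Date("2000/01/01"), by = "day"), 12)
B <- sample(seq(as.Date("2000/01/01"), as.Date("2010/01/01"), by = "day"), 12)
df <- data.frame(A, B)

# Hypothetical helper: shuffle both columns once and return
# the mean signed difference in days
shuffled_mean_diff <- function(d) {
  mean(as.numeric(difftime(sample(d$B), sample(d$A), units = "days")))
}

# replicate() re-evaluates the expression and collects the results in a vector
signed_means <- replicate(200, shuffled_mean_diff(df))

# Observed mean signed difference for comparison
observed <- mean(as.numeric(difftime(df$B, df$A, units = "days")))

# The spread around the observed value is zero (up to floating-point error)
range(signed_means - observed)
```

A statistic that depends on the pairing, such as the mean absolute difference, is needed for the shuffled results to vary from run to run.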

Following @jblood94's comments, I have added the example below:


# the data frame is structured like this with many more observations

set.seed(100)

A <- sample(seq(as.Date('1999/01/01'), as.Date('2000/01/01'), by="day"), 120)

B <- A + 2 # I am testing whether B depends on A, so B always takes place 2 days after A

df <- data.frame(A, B)

df = transform(df, C = sample(A), D = sample(B), E = sample(A), G = sample(B) ) # to create two shuffled diff times

df$diff <- difftime(df$B, df$A) # observed data
df$diff_shuffle1 <- abs(difftime(df$D, df$C, units = "days")) # A and B are at random times but I have added abs() as the diff time can be positive or negative
df$diff_shuffle2 <- abs(difftime(df$G, df$E, units = "days")) # A and B are at random times 2

mean(df$diff) # observed mean
mean(df$diff_shuffle1) # shuffled time difference between A and B if they happen at random times
mean(df$diff_shuffle2) # same again for the second shuffle

CodePudding user response:

You can wrap what you've done in a for() loop that runs a given number of simulations, nsims. The counter sim tracks which simulation is running, and each pass appends its results to output. Note that the original data keeps the static name data, while df is rebuilt on every pass through the loop.

set.seed(100)

A <- sample(seq(as.Date('1999/01/01'), as.Date('2000/01/01'), by="day"), 120)
B <- A + 2 # I am testing whether B depends on A, so B always takes place 2 days after A
data <- data.frame(A, B)

nsims <- 100
sim <- 1
output <- data.frame()

for (i in 1:nsims) {
  # reshuffle A and B twice to create two sets of shuffled time differences
  df <- transform(data, C = sample(A), D = sample(B), E = sample(A), G = sample(B))
  df$diff <- difftime(df$B, df$A) # observed data
  # abs() is used because a shuffled difference can be positive or negative
  df$diff_shuffle1 <- abs(difftime(df$D, df$C, units = "days")) # first shuffle
  df$diff_shuffle2 <- abs(difftime(df$G, df$E, units = "days")) # second shuffle
  obsM <- mean(df$diff) # observed mean
  shuf1M <- mean(df$diff_shuffle1) # mean difference if A and B happen at random times
  shuf2M <- mean(df$diff_shuffle2) # same again for the second shuffle
  out <- data.frame(obsM, shuf1M, shuf2M, sim)
  output <- rbind(output, out) # append this simulation's results
  sim <- sim + 1
}

output
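If only the shuffled means are needed, the same idea can be sketched more compactly with replicate(), which avoids growing a data frame with rbind() on every pass. This is a minimal sketch rather than a definitive implementation: the helper name `shuffle_once` and the comparison at the end (the share of shuffles whose mean gap is no larger than the observed one) are assumptions added for illustration.

```r
set.seed(100)
A <- sample(seq(as.Date("1999/01/01"), as.Date("2000/01/01"), by = "day"), 120)
B <- A + 2
data <- data.frame(A, B)

# Hypothetical helper: one shuffle of both columns, returning the
# mean absolute difference in days
shuffle_once <- function(d) {
  mean(abs(as.numeric(difftime(sample(d$B), sample(d$A), units = "days"))))
}

nsims <- 100
# One shuffled mean per simulation, returned as a numeric vector
shuffled_means <- replicate(nsims, shuffle_once(data))

# Observed mean absolute difference on the unshuffled pairs
observed_mean <- mean(abs(as.numeric(difftime(data$B, data$A, units = "days"))))

# Share of shuffles whose mean gap is no larger than the observed one
p_small <- mean(shuffled_means <= observed_mean)

summary(shuffled_means)
```

Because set.seed() is called once at the top, rerunning the whole script reproduces the same set of shuffles.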