I have a dataset with thousands of date-times for two kinds of event, A and B, and I want to test whether there is some dependence between them. To do so, I want to randomly shuffle the times in A and B, calculate the time difference for each pair of observations (A to B), and then take the mean of all the differences. I want to repeat this test hundreds of times.
I'm therefore looking for a loop or function rather than copying and pasting the code.
# the data frame is structured like this with many more observations
set.seed(10)
A <- sample(seq(as.Date('1999/01/01'), as.Date('2000/01/01'), by="day"), 12)
B <- sample(seq(as.Date('2000/01/01'), as.Date('2010/01/01'), by="day"), 12)
df <- data.frame(A, B)
I have been able to generate the output I need as follows, but I need to repeat this many times, i.e. to end up with hundreds of mean_shuffled results:
shuffled_A = sample(df$A)
shuffled_B = sample(df$B)
df_shuffled <- data.frame(shuffled_A, shuffled_B)
df_shuffled$diff <- difftime(df_shuffled$shuffled_B, df_shuffled$shuffled_A)
mean_shuffled <- mean(df_shuffled$diff)
Following @jblood94's comments, the code below has been added:
# the data frame is structured like this with many more observations
set.seed(100)
A <- sample(seq(as.Date('1999/01/01'), as.Date('2000/01/01'), by="day"), 120)
B <- A + 2 # as I am testing that B is dependent on A, so B always takes place after A
df <- data.frame(A, B)
df = transform(df, C = sample(A), D = sample(B), E = sample(A), G = sample(B) ) # to create two shuffled diff times
df$diff <- difftime(df$B, df$A) # observed data
df$diff_shuffle1 <- abs(difftime(df$D, df$C, units = "days")) # A and B are at random times but I have added abs() as the diff time can be positive or negative
df$diff_shuffle2 <- abs(difftime(df$G, df$E, units = "days")) # A and B are at random times 2
mean(df$diff) # observed mean
mean(df$diff_shuffle1) # shuffled time difference between A and B if they happen at random times
mean(df$diff_shuffle2) # shuffled time difference between A and B if they happen at random times
CodePudding user response:
You can wrap what you've done in a for() loop that runs a given number of simulations, nsims, tracking each simulation with sim as it loops around and appending each result to output. Note the static data frame name, data, and the dynamic df inside the loop.
set.seed(100)
A <- sample(seq(as.Date('1999/01/01'), as.Date('2000/01/01'), by="day"), 120)
B <- A + 2 # as I am testing that B is dependent on A, so B always takes place after A
data <- data.frame(A, B)
nsims <- 100
sim <- 1
output <- data.frame()
for(i in 1:nsims){
  df = transform(data, C = sample(A), D = sample(B), E = sample(A), G = sample(B) ) # to create two shuffled diff times
  df$diff <- difftime(df$B, df$A) # observed data
  df$diff_shuffle1 <- abs(difftime(df$D, df$C, units = "days")) # A and B are at random times but I have added abs() as the diff time can be positive or negative
  df$diff_shuffle2 <- abs(difftime(df$G, df$E, units = "days")) # A and B are at random times 2
  obsM <- mean(df$diff) # observed mean
  shuf1M <- mean(df$diff_shuffle1) # shuffled time difference between A and B if they happen at random times
  shuf2M <- mean(df$diff_shuffle2) # shuffled time difference between A and B if they happen at random times
  out <- data.frame(obsM,shuf1M,shuf2M,sim) # one row of results for this simulation
  output <- rbind(output,out) # append the row to the running results
  sim <- sim + 1 # advance the simulation counter
}
output
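If you only need the vector of shuffled means rather than a data frame, the same loop can be written more compactly with replicate(). This is a minimal sketch, not the only way to do it: the helper name shuffled_mean and the result names shuffled_means and observed_mean are made up for illustration, and it assumes the same data frame as above with one shuffled comparison per simulation.

```r
set.seed(100)
A <- sample(seq(as.Date("1999/01/01"), as.Date("2000/01/01"), by = "day"), 120)
B <- A + 2
data <- data.frame(A, B)

# Hypothetical helper: shuffle both columns once and return the
# mean absolute time difference in days as a plain number
shuffled_mean <- function(d) {
  mean(abs(as.numeric(difftime(sample(d$B), sample(d$A), units = "days"))))
}

nsims <- 100
# replicate() re-evaluates the expression nsims times and collects the results
shuffled_means <- replicate(nsims, shuffled_mean(data))

# The observed mean does not change between simulations, so compute it once
observed_mean <- mean(as.numeric(difftime(data$B, data$A, units = "days")))

observed_mean
summary(shuffled_means)
```

Computing the observed mean outside the loop avoids recalculating the same value on every pass, and returning a numeric vector avoids growing a data frame with rbind() on each iteration.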
