My for loop operating on each timestamp in high-frequency data is inefficient

Time:02-04

I am using R to calculate whole-lake temperatures at every timestamp during the open-water season.

I have loggers at various depths logging temperature every 10 minutes.

Each data frame for each lake has over 100k entries with over 10k different timestamps.

Below is how I solved this with a for loop. However, the code is extremely inefficient: it takes a couple of hours per lake, depending on how deep the lake is (deeper lakes have more loggers).

The example below resembles my data. The script runs quickly on the example but takes hours on the real data.

There should be a more efficient way of doing this with some apply-family function, but I don't know how.

library(rLakeAnalyzer)
library(dplyr) #needed for bind_rows() below

date <- c("2000-01-01 00:00:00","2000-01-01 00:00:00","2000-01-01 00:00:00",
          "2000-01-01 00:10:00","2000-01-01 00:10:00","2000-01-01 00:10:00",
          "2000-01-01 00:20:00","2000-01-01 00:20:00","2000-01-01 00:20:00")
depth <- c(1,2,3,1,2,3,1,2,3)
temp <- c(20,12,9,14,12,11,10,7,4)

dt <- data.frame(temp,depth,date) #example data frame (data.frame() keeps temp and depth numeric; cbind() would turn them into strings)

dptd <- c(0,1,2,3) #example depth
dpta <- c(5000,2500,1250,625) #example area per depth

datelist <- levels(as.factor(dt$date)) #'for each date in the frame...'

ldf <- list() #list to store every row for the new data frame
for(i in 1:length(datelist)){
  print(i) #to check how fast it operates
  lek <- dt[grepl(datelist[i],dt$date),] #take every date in dt
  temp <- whole.lake.temperature(wtr=lek$temp,depths=lek$depth,bthA=dpta,bthD=dptd) #function 
  date <- datelist[i] 
  ldf[[i]] <- as.data.frame(cbind(temp,date)) #make a dataframe in list with 1 row and 2 col
}

ldf <- bind_rows(ldf) #convert list of data frames to a complete data frame
ldf$temp <- as.numeric(ldf$temp)
ldf$date <- as.POSIXct(ldf$date)

plot(ldf$date,ldf$temp) #voilà, I have a data frame with the whole-lake temp at every timestamp

CodePudding user response:

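How about using data.table, grouping by date, and then applying the whole.lake.temperature function to each group? That way the rows are grouped once, instead of running grepl over the whole data frame for every timestamp: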

library(rLakeAnalyzer)
library(data.table)
date <- c("2000-01-01 00:00:00","2000-01-01 00:00:00","2000-01-01 00:00:00",
      "2000-01-01 00:10:00","2000-01-01 00:10:00","2000-01-01 00:10:00",
      "2000-01-01 00:20:00","2000-01-01 00:20:00","2000-01-01 00:20:00")
depth <- c(1,2,3,1,2,3,1,2,3)
temp <- c(20,12,9,14,12,11,10,7,4)

dt <- data.frame(temp,depth,date) #example data frame (data.frame() keeps temp and depth numeric)

dptd <- c(0,1,2,3) #example depth
dpta <- c(5000,2500,1250,625) #example area per depth

results <- setDT(dt)[, .(temp = whole.lake.temperature(wtr = temp,
                                                       depths = depth,
                                                       bthA = dpta,
                                                       bthD = dptd)),
                     by = date]

It's hard to tell if it speeds things up without trying it out on your whole dataset. Let me know if it helps.
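
Since you asked about the apply family: here is a minimal base R sketch of the same grouping idea, in case you would rather not add data.table. The helper name `per_timestamp` and the wrapper `lake_temp` are my own, made up for illustration. The fallback branch (a plain mean) is only there so the snippet runs without rLakeAnalyzer installed; it is not a real whole-lake temperature. The idea is that `split()` builds the row groups in one pass, rather than scanning all rows once per timestamp.

```r
# Example data from the question; data.frame() keeps temp and depth numeric
dt <- data.frame(
  temp  = c(20, 12, 9, 14, 12, 11, 10, 7, 4),
  depth = rep(c(1, 2, 3), times = 3),
  date  = rep(c("2000-01-01 00:00:00",
                "2000-01-01 00:10:00",
                "2000-01-01 00:20:00"), each = 3)
)
dptd <- c(0, 1, 2, 3)             # example depths
dpta <- c(5000, 2500, 1250, 625)  # example area per depth

if (requireNamespace("rLakeAnalyzer", quietly = TRUE)) {
  # Wrapper (hypothetical name) around the function used in the question
  lake_temp <- function(wtr, depths) {
    rLakeAnalyzer::whole.lake.temperature(wtr = wtr, depths = depths,
                                          bthA = dpta, bthD = dptd)
  }
} else {
  # Stand-in so the sketch still runs: a plain mean, NOT volume-weighted
  lake_temp <- function(wtr, depths) mean(wtr)
}

# Hypothetical helper: group row indices by timestamp once, then apply `fun`
per_timestamp <- function(df, fun) {
  idx  <- split(seq_len(nrow(df)), df$date)  # one list element per timestamp
  vals <- vapply(idx,
                 function(i) as.numeric(fun(df$temp[i], df$depth[i])),
                 numeric(1))
  data.frame(date = as.POSIXct(names(vals), tz = "UTC"),
             temp = unname(vals))
}

# system.time() gives a rough idea of how long the grouped version takes
elapsed <- system.time(res <- per_timestamp(dt, lake_temp))
print(res)
print(elapsed[["elapsed"]])
```

I have not benchmarked this on a 100k-row data frame, so I would time it on one lake first and compare it against the data.table version before switching everything over.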
