I am using R to calculate whole-lake temperatures at every timestamp of the open-water season.
I have loggers at various depths recording temperature every 10 minutes.
The data frame for each lake has over 100k rows and over 10k distinct timestamps.
This is how I have solved it with a for loop. However, the code is extremely inefficient: it takes a couple of hours per lake, depending on how deep the lake is (deeper lakes have more loggers).
The example below resembles my data. The script runs quickly on the example but takes hours on the real data.
There should be a more efficient way of doing this, perhaps with an apply-family function, but I don't know how.
library(rLakeAnalyzer)
date <- c("2000-01-01 00:00:00","2000-01-01 00:00:00","2000-01-01 00:00:00",
"2000-01-01 00:10:00","2000-01-01 00:10:00","2000-01-01 00:10:00",
"2000-01-01 00:20:00","2000-01-01 00:20:00","2000-01-01 00:20:00")
depth <- c(1,2,3,1,2,3,1,2,3)
temp <- c(20,12,9,14,12,11,10,7,4)
dt <- data.frame(temp,depth,date) #example data frame; data.frame() keeps temp and depth numeric, whereas cbind() would coerce every column to character
dptd <- c(0,1,2,3) #example depth
dpta <- c(5000,2500,1250,625) #example area per depth
datelist <- levels(as.factor(dt$date)) #'for each date in the frame...'
ldf <- list() #list to store every row for the new data frame
for(i in 1:length(datelist)){
print(i) #to check how fast it operates
lek <- dt[grepl(datelist[i],dt$date),] #take every date in dt
temp <- whole.lake.temperature(wtr=lek$temp,depths=lek$depth,bthA=dpta,bthD=dptd) #function
date <- datelist[i]
ldf[[i]] <- as.data.frame(cbind(temp,date)) #make a dataframe in list with 1 row and 2 col
}
ldf <- dplyr::bind_rows(ldf) #convert list of data frames to a complete data frame (bind_rows comes from dplyr)
ldf$temp <- as.numeric(ldf$temp)
ldf$date <- as.POSIXct(ldf$date)
plot(ldf$date,ldf$temp) #voilà, a data frame with the whole-lake temperature at every timestamp
CodePudding user response:
How about using data.table, grouping by date, and then applying the whole.lake.temperature function:
library(rLakeAnalyzer)
library(data.table)
date <- c("2000-01-01 00:00:00","2000-01-01 00:00:00","2000-01-01 00:00:00",
"2000-01-01 00:10:00","2000-01-01 00:10:00","2000-01-01 00:10:00",
"2000-01-01 00:20:00","2000-01-01 00:20:00","2000-01-01 00:20:00")
depth <- c(1,2,3,1,2,3,1,2,3)
temp <- c(20,12,9,14,12,11,10,7,4)
dt <- data.frame(temp,depth,date) #example data frame; data.frame() keeps temp and depth numeric, whereas cbind() would coerce every column to character
dptd <- c(0,1,2,3) #example depth
dpta <- c(5000,2500,1250,625) #example area per depth
# convert to a data.table by reference, then compute one value per timestamp
results <- setDT(dt)[,
                     .(temp=whole.lake.temperature(wtr=temp,
                                                   depths=depth,
                                                   bthA=dpta,
                                                   bthD=dptd)),
                     by=date]
It's hard to tell if it speeds things up without trying it out on your whole dataset. Let me know if it helps.
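Since the question asks about the apply family, here is a minimal base R sketch of the same grouping idea. It splits the rows by timestamp once, rather than searching the whole data frame on every iteration, and then calls whole.lake.temperature on each group with vapply. The names groups, wlt and res are just illustrative, the "UTC" time zone is an assumption, and I have not timed it on a full-size dataset.

```r
library(rLakeAnalyzer)

# Same toy data as above, built with data.frame() so temp and depth stay numeric
dt <- data.frame(
  temp  = c(20, 12, 9, 14, 12, 11, 10, 7, 4),
  depth = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  date  = rep(c("2000-01-01 00:00:00",
                "2000-01-01 00:10:00",
                "2000-01-01 00:20:00"), each = 3)
)
dptd <- c(0, 1, 2, 3)             # example depths
dpta <- c(5000, 2500, 1250, 625)  # example area per depth

# Split the rows by timestamp in a single pass
groups <- split(dt, dt$date)

# One whole-lake temperature per timestamp; vapply checks that each
# call returns exactly one numeric value
wlt <- vapply(groups, function(g) {
  whole.lake.temperature(wtr = g$temp, depths = g$depth,
                         bthA = dpta, bthD = dptd)
}, numeric(1))

# The group names are the timestamps, so rebuild the result from them
res <- data.frame(
  date = as.POSIXct(names(wlt), tz = "UTC"),  # time zone is an assumption
  temp = unname(wlt)
)
```

Wrapping either version in system.time() on one lake's data should show whether it is actually faster than the loop.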
