I have a dataset (data.weather) with one weather variable (TMAX) for two locations (combination of LAT and LON) and two years. TMAX is available for ten days per year and location in this mock example. I need to calculate the mean TMAX (mean_TMAX) for each of the four rows in data.locs. This last dataset indicate the range of date for which I need to calculate the mean. That is between DATE_0 and DATE_1.
Here is the code of what I did:
library(dplyr)
library(lubridate)
data.weather <-read.csv(text = "
LAT,LON,YEAR,DATE,TMAX
36,-89,2010,1/1/2010,25
36,-89,2010,1/2/2010,25
36,-89,2010,1/3/2010,25
36,-89,2010,1/4/2010,28
36,-89,2010,1/5/2010,28
36,-89,2010,1/6/2010,29
36,-89,2010,1/7/2010,25
36,-89,2010,1/8/2010,25
36,-89,2010,1/9/2010,25
36,-89,2010,1/10/2010,28
36,-89,2011,1/1/2011,26
36,-89,2011,1/2/2011,25
36,-89,2011,1/3/2011,28
36,-89,2011,1/4/2011,26
36,-89,2011,1/5/2011,27
36,-89,2011,1/6/2011,27
36,-89,2011,1/7/2011,28
36,-89,2011,1/8/2011,29
36,-89,2011,1/9/2011,27
36,-89,2011,1/10/2011,26
40,-96,2010,1/1/2010,29
40,-96,2010,1/2/2010,28
40,-96,2010,1/3/2010,25
40,-96,2010,1/4/2010,25
40,-96,2010,1/5/2010,28
40,-96,2010,1/6/2010,29
40,-96,2010,1/7/2010,26
40,-96,2010,1/8/2010,28
40,-96,2010,1/9/2010,26
40,-96,2010,1/10/2010,25
40,-96,2011,1/1/2011,29
40,-96,2011,1/2/2011,27
40,-96,2011,1/3/2011,29
40,-96,2011,1/4/2011,25
40,-96,2011,1/5/2011,28
40,-96,2011,1/6/2011,29
40,-96,2011,1/7/2011,29
40,-96,2011,1/8/2011,25
40,-96,2011,1/9/2011,25
40,-96,2011,1/10/2011,26
") %>%
mutate(DATE = as.Date(DATE, format = "%m/%d/%Y"))
data.locs <-read.csv(text = "
LAT,LON,YEAR,DATE_0,DATE_1,GEN,PR
36,-89,2010,1/2/2010,1/9/2010,MN103,35
36,-89,2011,1/1/2011,1/10/2011,IA100,33
40,-96,2010,1/4/2010,1/8/2010,MN103,36
40,-96,2011,1/2/2011,1/6/2011,IA100,34
") %>%
mutate(DATE_0 = as.Date(DATE_0, format = "%m/%d/%Y"),
DATE_1 = as.Date(DATE_1, format = "%m/%d/%Y"))
tmax.calculation <- data.locs %>%
group_by(LAT,LON,YEAR, GEN) %>%
mutate(mean_TMAX = mean(data.weather$TMAX[data.weather$DATE %within% interval(DATE_0, DATE_1)]))
This is the expected result:
LAT LON YEAR DATE_0 DATE_1 GEN PR meam_tmax
36 -89 2010 1/2/2010 1/9/2010 MN103 35 26.25
36 -89 2011 1/1/2011 1/10/2011 IA100 33 26.90
40 -96 2010 1/4/2010 1/8/2010 MN103 36 27.20
40 -96 2011 1/2/2011 1/6/2011 IA100 34 27.60
However, this is what I am getting:
LAT LON YEAR DATE_0 DATE_1 GEN PR meam_tmax
36 -89 2010 1/2/2010 1/9/2010 MN103 35 26.5625
36 -89 2011 1/1/2011 1/10/2011 IA100 33 27.0500
40 -96 2010 1/4/2010 1/8/2010 MN103 36 27.1000
40 -96 2011 1/2/2011 1/6/2011 IA100 34 27.1000
The problem I have is that, when reading the data interval in data.weather, the calculation is being made over the correct interval BUT across the two locations (combination of LAT and LON). I couldn't find a way to indicate to calculate the mean only for each LAT and LON combination separately.
CodePudding user response:
This should do it:
library(dplyr)
library(lubridate)
data.weather <-read.csv(text = "
LAT,LON,YEAR,DATE,TMAX
36,-89,2010,1/1/2010,25
36,-89,2010,1/2/2010,25
36,-89,2010,1/3/2010,25
36,-89,2010,1/4/2010,28
36,-89,2010,1/5/2010,28
36,-89,2010,1/6/2010,29
36,-89,2010,1/7/2010,25
36,-89,2010,1/8/2010,25
36,-89,2010,1/9/2010,25
36,-89,2010,1/10/2010,28
36,-89,2011,1/1/2011,26
36,-89,2011,1/2/2011,25
36,-89,2011,1/3/2011,28
36,-89,2011,1/4/2011,26
36,-89,2011,1/5/2011,27
36,-89,2011,1/6/2011,27
36,-89,2011,1/7/2011,28
36,-89,2011,1/8/2011,29
36,-89,2011,1/9/2011,27
36,-89,2011,1/10/2011,26
40,-96,2010,1/1/2010,29
40,-96,2010,1/2/2010,28
40,-96,2010,1/3/2010,25
40,-96,2010,1/4/2010,25
40,-96,2010,1/5/2010,28
40,-96,2010,1/6/2010,29
40,-96,2010,1/7/2010,26
40,-96,2010,1/8/2010,28
40,-96,2010,1/9/2010,26
40,-96,2010,1/10/2010,25
40,-96,2011,1/1/2011,29
40,-96,2011,1/2/2011,27
40,-96,2011,1/3/2011,29
40,-96,2011,1/4/2011,25
40,-96,2011,1/5/2011,28
40,-96,2011,1/6/2011,29
40,-96,2011,1/7/2011,29
40,-96,2011,1/8/2011,25
40,-96,2011,1/9/2011,25
40,-96,2011,1/10/2011,26
") %>%
mutate(DATE = as.Date(DATE, format = "%m/%d/%Y"))
data.locs <-read.csv(text = "
LAT,LON,YEAR,DATE_0,DATE_1,GEN,PR
36,-89,2010,1/2/2010,1/9/2010,MN103,35
36,-89,2011,1/1/2011,1/10/2011,IA100,33
40,-96,2010,1/4/2010,1/8/2010,MN103,36
40,-96,2011,1/2/2011,1/6/2011,IA100,34
") %>%
mutate(DATE_0 = as.Date(DATE_0, format = "%m/%d/%Y"),
DATE_1 = as.Date(DATE_1, format = "%m/%d/%Y"))
tmax.calculation <- data.locs %>%
group_by(LAT,LON,YEAR,GEN) %>%
do(data.frame(LAT=.$LAT,
LON=.$LON,
YEAR=.$YEAR,
GEN=.$GEN,
DATE=seq(.$DATE_0, .$DATE_1, by="days"))) %>%
left_join(data.weather, by=c("LAT", "LON", "YEAR", "DATE")) %>%
summarise(mean_TMAX = mean(TMAX))
Result:

