Home > Mobile >  How to find the average from a fixed date range for multiple years in R
How to find the average from a fixed date range for multiple years in R

Time:01-25

My data set has flow rate measurements of a river for every day of the year from 1967 to 2022. I would like to find the average flow rate for yearly periods starting and ending in April (e.g. average flow rate from April 1967 - April 1969, April 1968 - April 1969, April 1969 - April 1970, etc).

Here is a sample of my data:

   A tibble: 20,100 x 7
   river year  season month   date       flow_rate quality
   <chr> <fct> <fct>  <fct>   <date>         <dbl> <chr>  
 1 wylye 1967  Winter January 1967-01-01      6.67 Good   
 2 wylye 1967  Winter January 1967-01-02      6.39 Good   
 3 wylye 1967  Winter January 1967-01-03      6.32 Good   
 4 wylye 1967  Winter January 1967-01-04      6.34 Good   
 5 wylye 1967  Winter January 1967-01-05      6.37 Good   
 6 wylye 1967  Winter January 1967-01-06      6.45 Good   
 7 wylye 1967  Winter January 1967-01-07      6.65 Good   
 8 wylye 1967  Winter January 1967-01-08      6.54 Good   
 9 wylye 1967  Winter January 1967-01-09      6.53 Good   
10 wylye 1967  Winter January 1967-01-10      6.62 Good   
# ... with 20,090 more rows

I have seen codes where people have found the average for certain months in the same year (July-October)

e.g.

df %>%
  mutate(date = as.Date(date), 
         day = day(date), 
         month = month(date), 
         year = year(date)) %>%
  filter(between(month, 7, 10) | 
         day >= 7  & month == 6 | 
         day <= 9 & month == 11) %>%
  group_by(year) %>%
  summarise(tmax = mean(tmax, na.rm = TRUE))

but not a code which allows me to look at the average of yearly periods of multiple years (April-April 1967-1968/1968-1969/1969-1970 etc). Any help would be appreciated :)

CodePudding user response:

Here is one way to do it with lubridate by shifting the year to match your interval

df<-tibble::tribble(
         ~date, ~flow_rate,
      "1967-01-01",      1,
      "1967-04-01",      2,
      "1968-01-01",      3,
      "1968-04-01",      4      )

library(dplyr)
library(lubridate)


df_new<-df %>%
  mutate(date=ymd(date),
         year_shift=year(date-days(90)),
         label=paste("April",year_shift,"-","March",year_shift 1)) %>%
  group_by(label) %>%
  summarize(flow_rate = mean(flow_rate)) %>%
  ungroup() 

df_new
#> # A tibble: 3 × 2
#>   label                   flow_rate
#>   <chr>                       <dbl>
#> 1 April 1966 - March 1967       1  
#> 2 April 1967 - March 1968       2.5
#> 3 April 1968 - March 1969       4

Created on 2022-01-25 by the reprex package (v2.0.1)

  •  Tags:  
  • Related