How to create a loop code from big dataframe in R?

Time:01-16

I have a series of daily snow depth values covering a 60-year period. I would like to count the days with a snow depth above 30 cm in each season, where a season runs from July to the following June, for example from July 1980 to June 1981. What does the code for this have to look like? I know how to count the days above 30 cm for a single season, but not how to do it for all seasons at once. I have uploaded my data frame on WeTransfer: Dataframe

Thank you so much for your help in advance. Pernilla

CodePudding user response:

Something like this should work:

library(dplyr)
library(lubridate)

df<-read.csv('BayrischerWald_Brennes_SH_daily_merged.txt', sep=';')

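# season = year in which the July-to-June season starts
# (e.g. 1980 for July 1980 to June 1981). Comparing the month directly
# avoids a fixed 181-day offset, which puts 30 June of a leap year
# into the wrong season.
df_season <- df %>%
  mutate(Day = ymd(Day),
         season = year(Day) - (month(Day) < 7))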


df_group_by_season <- df_season %>%
  filter(!is.na(SHincm)) %>%
  group_by(season) %>%
  summarize(days_above_30=sum(SHincm>30)) %>%
  ungroup()

df_group_by_season

#> # A tibble: 61 × 2
#>    season days_above_30
#>     <dbl>         <int>
#>  1   1961             1
#>  2   1962             0
#>  3   1963             0
#>  4   1964             0
#>  5   1965             0
#>  6   1966             0
#>  7   1967           129
#>  8   1968            60
#>  9   1969           107
#> 10   1970            43
#> # … with 51 more rows

Created on 2022-01-15 by the reprex package (v2.0.1)
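To sanity-check how individual dates map to seasons, the rule can be tried on a few boundary dates in base R. This is only a sketch: the helper name `season_start` and the example dates are made up for illustration and are not part of the answer above.

```r
# Hypothetical helper: start year of the July-to-June season a date falls in.
season_start <- function(day) {
  day <- as.Date(day)
  yr  <- as.integer(format(day, "%Y"))
  mon <- as.integer(format(day, "%m"))
  # January to June belong to the season that began the previous July
  yr - (mon < 7)
}

# Made-up dates around the 1 July boundary (1980 is a leap year)
dates <- c("1980-06-30", "1980-07-01", "1980-12-31", "1981-06-30", "1981-07-01")
data.frame(Day = dates, season = season_start(dates))
```

The last day of June should still carry the previous year's label, and 1 July should be the first day with the new label.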

CodePudding user response:

Here is an approach using the aggregate() function. After reading the data, convert the Day column to a Date object and drop the rows where the date is missing:

snow <- read.table("BayrischerWald_Brennes_SH_daily_merged.txt", header=TRUE, sep=";")
snow$Day <- as.Date(snow$Day)
str(snow)
# 'data.frame': 51606 obs. of  2 variables:
#  $ Day   : Date, format: "1961-11-01" "1961-11-02" "1961-11-03" "1961-11-04" ...
#  $ SHincm: int  0 0 0 0 2 9 19 22 15 5 ...
snow <- snow[!is.na(snow$Day), ]
str(snow)
# 'data.frame': 21886 obs. of  2 variables:
#  $ Day   : Date, format: "1961-11-01" "1961-11-02" "1961-11-03" "1961-11-04" ...
#  $ SHincm: int  0 0 0 0 2 9 19 22 15 5 ...

Notice that more than half of the rows have a missing date. Now we need to divide the data by season (1 July to 30 June):

brks <- as.Date(paste(1961:2022, "07-01", sep="-"))
lbls <- paste(1961:2021, 1962:2022, sep="/")
snow$Season <- cut(snow$Day, breaks=brks, labels=lbls)
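To see how cut() treats dates that fall exactly on a break, here is a small sketch with made-up dates (the `toy` vector is not from the data set). With Date breaks, cut() uses left-closed intervals by default, so 1 July should open a new season.

```r
# Toy dates around the 1 July boundary (illustrative values only)
toy <- as.Date(c("1980-06-30", "1980-07-01", "1981-06-30", "1981-07-01"))

# Same construction as above, restricted to a few seasons
brks <- as.Date(paste(1979:1982, "07-01", sep = "-"))
lbls <- paste(1979:1981, 1980:1982, sep = "/")

toy_season <- as.character(cut(toy, breaks = brks, labels = lbls))
data.frame(Day = toy, Season = toy_season)
```

Dates outside the range of the breaks come back as NA, so the break vector has to span the whole series.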

Now we use aggregate() to get the number of days with more than 30 cm of snow:

days30cm <- aggregate(SHincm~Season, snow, subset=snow$SHincm > 30, length)
colnames(days30cm)[2] <- "Over30cm"
head(days30cm, 10)
#       Season Over30cm
# 1  1961/1962        1
# 2  1967/1968      129
# 3  1968/1969       60
# 4  1969/1970      107
# 5  1970/1971       43
# 6  1972/1973      101
# 7  1973/1974      119
# 8  1974/1975      188
# 9  1975/1976      126
# 10 1976/1977      112
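Because the rows are subset before counting, seasons without a single day above 30 cm drop out of the table (the output jumps from 1961/1962 to 1967/1968). If those seasons should show up with a count of 0, one possible variant is to count inside the summary function instead. The sketch below uses a made-up `toy` data frame, not the real data.

```r
# Toy data: two seasons, only the first has a day above 30 cm
toy <- data.frame(
  Season = factor(c("1961/1962", "1961/1962", "1962/1963", "1962/1963")),
  SHincm = c(35, 10, 0, 5)
)

# sum() over a logical vector counts the TRUE values, so a season with
# no qualifying days yields 0 instead of disappearing
days30cm_all <- aggregate(SHincm ~ Season, toy, function(x) sum(x > 30))
colnames(days30cm_all)[2] <- "Over30cm"
days30cm_all
```

Rows with a missing snow depth are still dropped by the formula interface of aggregate() before the function is applied.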

In addition, you can get other statistics, such as the maximum snow depth of the season or the sum of the daily snow depths (both in cm):

maxsnow <- aggregate(SHincm~Season, snow, max)
totalsnow <- aggregate(SHincm~Season, snow, sum)
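If the separate results should end up in one table, they can be joined on Season with merge(). A minimal sketch on a made-up `toy` data frame; the suffixes and the name `stats` are arbitrary choices for illustration.

```r
# Toy data standing in for the snow data frame (illustrative values only)
toy <- data.frame(
  Season = factor(c("1961/1962", "1961/1962", "1962/1963", "1962/1963")),
  SHincm = c(35, 10, 0, 5)
)

maxsnow   <- aggregate(SHincm ~ Season, toy, max)
totalsnow <- aggregate(SHincm ~ Season, toy, sum)

# Both results have a column called SHincm, so merge() appends the suffixes
stats <- merge(maxsnow, totalsnow, by = "Season", suffixes = c("_max", "_sum"))
stats
```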