Home > Blockchain >  How get the average from three different datasets in R
How get the average from three different datasets in R

Time:02-04

I've started using R 3 months ago, and working with information from weather stations. Now I want to get the average from three different datasets of the station located in the same place.

My data frames look like this :

d1

Date_Measure Id Tmin Tmax Tmedia Smin Smax Smedia Hrmin Hrmax Hrmedia
2020-06-21 06:00:00 M03100245 19.10 19.97 19.335 25.04 71.14 44.930 83.78 84.28 84.280
2020-06-21 07:00:00 M03100245 20.31 21.69 20.915 89.69 144.34 108.220 81.76 83.78 82.510
2020-06-21 08:00:00 M03100245 21.87 24.64 22.840 115.56 263.31 183.035 76.70 81.76 79.985

d2

Date_Measure Id Tmin Tmax Tmedia Smin Smax Smedia Hrmin Hrmax Hrmedia
2020-06-21 06:00:00 M03100247 19.43 20.20 19.560 17.46 68.28 37.880 91.69 92.72 92.720
2020-06-21 07:00:00 M03100247 20.64 21.81 21.050 87.55 139.98 105.535 87.07 91.18 88.865
2020-06-21 08:00:00 M03100247 21.85 24.75 23.015 112.42 266.65 183.440 75.24 86.55 82.695

d3

Date_Measure Id Tmin Tmax Tmedia Smin Smax Smedia Hrmin Hrmax Hrmedia
2020-06-21 06:00:00 M03100296 19.39 20.16 19.560 29.18 77.94 48.900 89.61 90.69 90.690
2020-06-21 07:00:00 M03100296 20.55 21.53 20.805 95.95 154.43 116.065 85.82 89.07 86.370
2020-06-21 08:00:00 M03100296 21.62 24.88 23.035 122.26 280.18 194.635 71.22 85.28 79.870

I want to take an average of each date and each hour for them in a single general table that combines the three, how can I do this process?

CodePudding user response:

You can combine the 3 dataframes, extract date and hour from Date_Measure variable and take average of numeric variables using across.

library(dplyr)
library(lubridate)

bind_rows(d1, d2, d3) %>%
  #Or instead of writing them individually, we can also use
  #bind_rows(mget(paste0('d', 1:3))) %>%
  mutate(Date_Measure = ymd_hms(Date_Measure), 
         Date = as.Date(Date_Measure),
         Hour = hour(Date_Measure)) %>%
  group_by(Date, Hour) %>%
  summarise(across(Tmin:Hrmedia, mean), .groups = 'drop')

#   Date        Hour  Tmin  Tmax Tmedia  Smin  Smax Smedia Hrmin Hrmax Hrmedia
#  <date>     <int> <dbl> <dbl>  <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl>   <dbl>
#1 2020-06-21     6  19.3  20.1   19.5  23.9  72.5   43.9  88.4  89.2    89.2
#2 2020-06-21     7  20.5  21.7   20.9  91.1 146.   110.   84.9  88.0    85.9
#3 2020-06-21     8  21.8  24.8   23.0 117.  270.   187.   74.4  84.5    80.8

CodePudding user response:

First make the number of dataset entries to be the same in all of them.

Then create another dataset by iterating over the column of date and hour.

I guess you need to generate the remaining columns by appropriate formulas.

  •  Tags:  
  • Related