I've started using R 3 months ago, and working with information from weather stations. Now I want to get the average from three different datasets of the station located in the same place.
My data frames look like this :
d1
| Date_Measure | Id | Tmin | Tmax | Tmedia | Smin | Smax | Smedia | Hrmin | Hrmax | Hrmedia |
|---|---|---|---|---|---|---|---|---|---|---|
| 2020-06-21 06:00:00 | M03100245 | 19.10 | 19.97 | 19.335 | 25.04 | 71.14 | 44.930 | 83.78 | 84.28 | 84.280 |
| 2020-06-21 07:00:00 | M03100245 | 20.31 | 21.69 | 20.915 | 89.69 | 144.34 | 108.220 | 81.76 | 83.78 | 82.510 |
| 2020-06-21 08:00:00 | M03100245 | 21.87 | 24.64 | 22.840 | 115.56 | 263.31 | 183.035 | 76.70 | 81.76 | 79.985 |
d2
| Date_Measure | Id | Tmin | Tmax | Tmedia | Smin | Smax | Smedia | Hrmin | Hrmax | Hrmedia |
|---|---|---|---|---|---|---|---|---|---|---|
| 2020-06-21 06:00:00 | M03100247 | 19.43 | 20.20 | 19.560 | 17.46 | 68.28 | 37.880 | 91.69 | 92.72 | 92.720 |
| 2020-06-21 07:00:00 | M03100247 | 20.64 | 21.81 | 21.050 | 87.55 | 139.98 | 105.535 | 87.07 | 91.18 | 88.865 |
| 2020-06-21 08:00:00 | M03100247 | 21.85 | 24.75 | 23.015 | 112.42 | 266.65 | 183.440 | 75.24 | 86.55 | 82.695 |
d3
| Date_Measure | Id | Tmin | Tmax | Tmedia | Smin | Smax | Smedia | Hrmin | Hrmax | Hrmedia |
|---|---|---|---|---|---|---|---|---|---|---|
| 2020-06-21 06:00:00 | M03100296 | 19.39 | 20.16 | 19.560 | 29.18 | 77.94 | 48.900 | 89.61 | 90.69 | 90.690 |
| 2020-06-21 07:00:00 | M03100296 | 20.55 | 21.53 | 20.805 | 95.95 | 154.43 | 116.065 | 85.82 | 89.07 | 86.370 |
| 2020-06-21 08:00:00 | M03100296 | 21.62 | 24.88 | 23.035 | 122.26 | 280.18 | 194.635 | 71.22 | 85.28 | 79.870 |
I want to take an average of each date and each hour for them in a single general table that combines the three, how can I do this process?
CodePudding user response:
You can combine the 3 dataframes, extract date and hour from Date_Measure variable and take average of numeric variables using across.
library(dplyr)
library(lubridate)
bind_rows(d1, d2, d3) %>%
#Or instead of writing them individually, we can also use
#bind_rows(mget(paste0('d', 1:3))) %>%
mutate(Date_Measure = ymd_hms(Date_Measure),
Date = as.Date(Date_Measure),
Hour = hour(Date_Measure)) %>%
group_by(Date, Hour) %>%
summarise(across(Tmin:Hrmedia, mean), .groups = 'drop')
# Date Hour Tmin Tmax Tmedia Smin Smax Smedia Hrmin Hrmax Hrmedia
# <date> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 2020-06-21 6 19.3 20.1 19.5 23.9 72.5 43.9 88.4 89.2 89.2
#2 2020-06-21 7 20.5 21.7 20.9 91.1 146. 110. 84.9 88.0 85.9
#3 2020-06-21 8 21.8 24.8 23.0 117. 270. 187. 74.4 84.5 80.8
CodePudding user response:
First make the number of dataset entries to be the same in all of them.
Then create another dataset by iterating over the column of date and hour.
I guess you need to generate the remaining columns by appropriate formulas.
