I have two dataframes. One is quality with different dates throughout the year of collection points at station st. The other is the land use of each st season for that year.
Example
library(dplyr)
water_df=read.table(text="st Date OD pH DBO A 01/07/2005 8 6.3 3 A 02/06/2005 7 6.2 2.2 A 01/01/2005 7.3 6.5 3.1 A 03/05/2006 6 6.3 4 A 09/08/2006 6.8 7.1 1.1 A 12/12/2006 7.3 8.1 2.9 B 02/07/2005 6.8 5.4 2.6 B 03/06/2005 6.0 5.3 1.9 B 02/01/2005 6.2 5.5 2.6 B 04/05/2006 5.1 5.4 3.4 B 10/08/2006 5.8 6.0 0.9 B 13/12/2006 6.2 6.9 2.5 C 20/12/2006 6.5 7.2 2.6 C 27/12/2006 6.8 7.6 2.7 C 03/01/2007 7.2 8.0 2.9 C 10/01/2007 7.5 8.4 3.0 C 17/01/2007 7.9 8.8 3.1 C 24/01/2007 8.3 9.2 3.3 C 31/01/2007 8.7 9.7 3.5 C 07/02/2007 9.2 10.2 3.6", sep="", header=TRUE)%>%as.data.frame() land_df=read.table(text = "st year Veg Water Soil Crop Grass A 2005 100 200 80 130 70 B 2006 98 180 84 132 86 C 2007 93 175 79 127 106", sep="", header = TRUE)%>%as.data.frame()
I would like to add to the quality data.frame the land use values of that station even if for the same station st is repeated.
I tried some things but it didn't work
#I tryed cbind
join_df<-cbind(water_df, land_df)
#I tryed
library(purrr)
joind_df2<-purrr::reduce(water_df, land_df)
CodePudding user response:
This operation it commonly called a "join" or "merge". You need columns to join on, which means we need to extract the year from your date, and then it's a left_join command. See this FAQ for more information about joining data in R.
library(dplyr)
library(lubridate)
water_df %>%
mutate(year = year(dmy(Date))) %>%
left_join(land_df, by = c("st", "year"))
# st Date OD pH DBO year Veg Water Soil Crop Grass
# 1 A 01/07/2005 8.0 6.3 3.0 2005 100 200 80 130 70
# 2 A 02/06/2005 7.0 6.2 2.2 2005 100 200 80 130 70
# 3 A 01/01/2005 7.3 6.5 3.1 2005 100 200 80 130 70
# 4 A 03/05/2006 6.0 6.3 4.0 2006 NA NA NA NA NA
# 5 A 09/08/2006 6.8 7.1 1.1 2006 NA NA NA NA NA
# 6 A 12/12/2006 7.3 8.1 2.9 2006 NA NA NA NA NA
# 7 B 02/07/2005 6.8 5.4 2.6 2005 NA NA NA NA NA
# 8 B 03/06/2005 6.0 5.3 1.9 2005 NA NA NA NA NA
# 9 B 02/01/2005 6.2 5.5 2.6 2005 NA NA NA NA NA
# 10 B 04/05/2006 5.1 5.4 3.4 2006 98 180 84 132 86
# 11 B 10/08/2006 5.8 6.0 0.9 2006 98 180 84 132 86
# 12 B 13/12/2006 6.2 6.9 2.5 2006 98 180 84 132 86
# 13 C 20/12/2006 6.5 7.2 2.6 2006 NA NA NA NA NA
# 14 C 27/12/2006 6.8 7.6 2.7 2006 NA NA NA NA NA
# 15 C 03/01/2007 7.2 8.0 2.9 2007 93 175 79 127 106
# 16 C 10/01/2007 7.5 8.4 3.0 2007 93 175 79 127 106
# 17 C 17/01/2007 7.9 8.8 3.1 2007 93 175 79 127 106
# 18 C 24/01/2007 8.3 9.2 3.3 2007 93 175 79 127 106
# 19 C 31/01/2007 8.7 9.7 3.5 2007 93 175 79 127 106
# 20 C 07/02/2007 9.2 10.2 3.6 2007 93 175 79 127 106
There's some missing values where your land_df didn't have observations for a particular station in a particular year. Have a look at ?tidyr::fill and this FAQ if you want to fill those in with, e.g., the previous observation.

