I have two data frames, 'df1' and 'df2', that share six column names. For each row of df1, I want to check whether df2 contains a row that matches exactly on those six columns; if it does, 'detect' in df1 should be set to 1, and if not it should stay 0. Currently every value of 'detect' in df1 is 0. The input looks like this:
df1
| site | ddate | ssegment | spp | vtype | tperiod | detect |
|---|---|---|---|---|---|---|
| BMA | 6/1/2021 | 1 | AMRO | Song | 1 | 0 |
| BMC | 6/15/2021 | 1 | WISN | Drum | 1 | 0 |
| BMA | 6/15/2021 | 1 | NOFL | Song | 2 | 0 |
| BMC | 6/29/2021 | 2 | AMRO | Call | 1 | 0 |
| BMA | 6/29/2021 | 2 | WISN | Call | 2 | 0 |
df2
| site | ddate | ssegment | spp | vtype | tperiod |
|---|---|---|---|---|---|
| BMA | 6/1/2021 | 1 | AMRO | Call | 1 |
| BMC | 6/15/2021 | 1 | WISN | Drum | 1 |
| BMA | 6/15/2021 | 1 | NOFL | Song | 2 |
| BMC | 6/29/2021 | 2 | AMRO | Drum | 1 |
| BMA | 6/29/2021 | 2 | WISN | Call | 2 |
After scanning these, df1 would now look like:
df1
| site | ddate | ssegment | spp | vtype | tperiod | detect |
|---|---|---|---|---|---|---|
| BMA | 6/1/2021 | 1 | AMRO | Song | 1 | 0 |
| BMC | 6/15/2021 | 1 | WISN | Drum | 1 | 1 |
| BMA | 6/15/2021 | 1 | NOFL | Song | 2 | 1 |
| BMC | 6/29/2021 | 2 | AMRO | Call | 1 | 0 |
| BMA | 6/29/2021 | 2 | WISN | Call | 2 | 1 |
I was thinking that R base function 'merge' might be useful, but I can't quite figure it out. Thank you for your help!
CodePudding user response:
Start with the detect column only in df2, then merge:
df1$detect = NULL
df2$detect = 1
result = merge(df1, unique(df2), all.x = TRUE)
This will create the detect column as 1s when there are exact matches and NAs when there are not. If you want, you can change the NAs to 0s.
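As a minimal sketch of that last step, the block below rebuilds the question's sample data with data.frame() (the construction is my own, not from the question), runs the merge, and then overwrites the NAs with 0. Note that merge() does not promise to keep df1's original row order.

```r
# Sample data copied from the question's tables
df1 <- data.frame(
  site     = c("BMA", "BMC", "BMA", "BMC", "BMA"),
  ddate    = c("6/1/2021", "6/15/2021", "6/15/2021", "6/29/2021", "6/29/2021"),
  ssegment = c(1L, 1L, 1L, 2L, 2L),
  spp      = c("AMRO", "WISN", "NOFL", "AMRO", "WISN"),
  vtype    = c("Song", "Drum", "Song", "Call", "Call"),
  tperiod  = c(1L, 1L, 2L, 1L, 2L),
  detect   = 0L
)
df2 <- data.frame(
  site     = c("BMA", "BMC", "BMA", "BMC", "BMA"),
  ddate    = c("6/1/2021", "6/15/2021", "6/15/2021", "6/29/2021", "6/29/2021"),
  ssegment = c(1L, 1L, 1L, 2L, 2L),
  spp      = c("AMRO", "WISN", "NOFL", "AMRO", "WISN"),
  vtype    = c("Call", "Drum", "Song", "Drum", "Call"),
  tperiod  = c(1L, 1L, 2L, 1L, 2L)
)

# Keep detect only in df2, so merge() joins on the six shared columns
df1$detect <- NULL
df2$detect <- 1
result <- merge(df1, unique(df2), all.x = TRUE)

# Rows of df1 with no match in df2 come back as NA; turn those into 0
result$detect[is.na(result$detect)] <- 0
result
```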
The same method can work with dplyr:
library(dplyr)
df1 %>%
  select(-detect) %>%
  left_join(
    df2 %>% mutate(detect = 1) %>% unique()
  )
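A hedged sketch of the complete dplyr version, including the NA-to-0 step: the `keys` vector and the use of coalesce() are my additions, and the sample data is rebuilt from the question's tables. Passing `by = keys` only makes the join columns explicit and silences the "Joining, by" message.

```r
library(dplyr)

# Sample data copied from the question's tables
df1 <- data.frame(
  site     = c("BMA", "BMC", "BMA", "BMC", "BMA"),
  ddate    = c("6/1/2021", "6/15/2021", "6/15/2021", "6/29/2021", "6/29/2021"),
  ssegment = c(1L, 1L, 1L, 2L, 2L),
  spp      = c("AMRO", "WISN", "NOFL", "AMRO", "WISN"),
  vtype    = c("Song", "Drum", "Song", "Call", "Call"),
  tperiod  = c(1L, 1L, 2L, 1L, 2L),
  detect   = 0L
)
df2 <- df1 %>%
  select(-detect) %>%
  mutate(vtype = c("Call", "Drum", "Song", "Drum", "Call"))

# The six columns that must match exactly (name chosen for this example)
keys <- c("site", "ddate", "ssegment", "spp", "vtype", "tperiod")

result <- df1 %>%
  select(-detect) %>%
  # distinct() guards against duplicated df2 rows multiplying rows of df1
  left_join(df2 %>% distinct() %>% mutate(detect = 1), by = keys) %>%
  # unmatched rows have detect = NA after the join; replace with 0
  mutate(detect = coalesce(detect, 0))
result
```

Unlike merge(), left_join() keeps the rows in df1's original order.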
CodePudding user response:
dplyr's filtering joins, anti_join and semi_join, can split df1 into the rows without and with a match in df2:
library(tidyverse)
df1 <- tribble(
~site, ~ddate, ~ssegment, ~spp, ~vtype, ~tperiod, ~detect,
"BMA", "6/1/2021", 1L, "AMRO", "Song", 1L, 0L,
"BMC", "6/15/2021", 1L, "WISN", "Drum", 1L, 0L,
"BMA", "6/15/2021", 1L, "NOFL", "Song", 2L, 0L,
"BMC", "6/29/2021", 2L, "AMRO", "Call", 1L, 0L,
"BMA", "6/29/2021", 2L, "WISN", "Call", 2L, 0L
)
df2 <- tibble::tribble(
~site, ~ddate, ~ssegment, ~spp, ~vtype, ~tperiod,
"BMA", "6/1/2021", 1L, "AMRO", "Call", 1L,
"BMC", "6/15/2021", 1L, "WISN", "Drum", 1L,
"BMA", "6/15/2021", 1L, "NOFL", "Song", 2L,
"BMC", "6/29/2021", 2L, "AMRO", "Drum", 1L,
"BMA", "6/29/2021", 2L, "WISN", "Call", 2L
)
bind_rows(
df1 %>% select(-detect) %>% anti_join(df2) %>% mutate(detect = 0),
df1 %>% select(-detect) %>% semi_join(df2) %>% mutate(detect = 1)
)
#> Joining, by = c("site", "ddate", "ssegment", "spp", "vtype", "tperiod")
#> Joining, by = c("site", "ddate", "ssegment", "spp", "vtype", "tperiod")
#> # A tibble: 5 x 7
#> site ddate ssegment spp vtype tperiod detect
#> <chr> <chr> <int> <chr> <chr> <int> <dbl>
#> 1 BMA 6/1/2021 1 AMRO Song 1 0
#> 2 BMC 6/29/2021 2 AMRO Call 1 0
#> 3 BMC 6/15/2021 1 WISN Drum 1 1
#> 4 BMA 6/15/2021 1 NOFL Song 2 1
#> 5 BMA 6/29/2021 2 WISN Call 2 1
Created on 2021-12-08 by the reprex package (v2.0.1)
CodePudding user response:
Here is one possible, very simple solution with the data.table package. It uses an update join, which sets detect only on the rows of df1 that have a match in df2:
Reprex
- Code
library(data.table)
setDT(df1)
setDT(df2)
df1[df2, on = .(site, ddate, ssegment, spp, vtype, tperiod), detect := TRUE][]
- Output
#> site ddate ssegment spp vtype tperiod detect
#> 1: BMA 6/1/2021 1 AMRO Song 1 0
#> 2: BMC 6/15/2021 1 WISN Drum 1 1
#> 3: BMA 6/15/2021 1 NOFL Song 2 1
#> 4: BMC 6/29/2021 2 AMRO Call 1 0
#> 5: BMA 6/29/2021 2 WISN Call 2 1
Created on 2021-12-08 by the reprex package (v2.0.1)
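For anyone who wants to run the update join from scratch, here is a self-contained sketch. The data.table() construction and the `keys` vector are my own additions; I also assign 1L rather than TRUE so the value matches the column's integer type without relying on coercion.

```r
library(data.table)

# Sample data copied from the question's tables
df1 <- data.table(
  site     = c("BMA", "BMC", "BMA", "BMC", "BMA"),
  ddate    = c("6/1/2021", "6/15/2021", "6/15/2021", "6/29/2021", "6/29/2021"),
  ssegment = c(1L, 1L, 1L, 2L, 2L),
  spp      = c("AMRO", "WISN", "NOFL", "AMRO", "WISN"),
  vtype    = c("Song", "Drum", "Song", "Call", "Call"),
  tperiod  = c(1L, 1L, 2L, 1L, 2L),
  detect   = 0L
)
df2 <- copy(df1)[, detect := NULL]
df2[, vtype := c("Call", "Drum", "Song", "Drum", "Call")]

# The six columns that must match exactly (name chosen for this example)
keys <- c("site", "ddate", "ssegment", "spp", "vtype", "tperiod")

# Update join: rows of df1 matched by df2 get detect = 1,
# unmatched rows keep their existing 0. df1 is modified by reference.
df1[df2, on = keys, detect := 1L]
df1[]
```

Because the assignment happens in place, df1 keeps its original row order and no NA cleanup is needed.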
