I have 2 data frames with the same column names, but different numbers of rows. The first data frame (a) looks similar to this:
a = data.frame("Site"=c(1,2,3,4,7,9,10,11,13,14),
"v1"=c(0,0,0,0,0,0,0,0,0,0),
"v2"=c(0,0,0,0,NA,NA,NA,0,0,0),
"v3"=c(0,0,0,NA,0,NA,0,0,0,0),
"v4"=c(0,0,0,0,0,0,0,0,NA,NA),
"v5"=c(0,0,0,0,0,NA,0,NA,0,0))
Note: sites 5, 6, 8, and 12 are missing purposefully.
The second data frame (b) looks something like this:
b = data.frame("Site"=c(2,3,4,7,10,14),
"v1"=c(1,NA,2,1,NA,NA),
"v2"=c(1,1,NA,NA,NA,NA),
"v3"=c(NA,1,NA,NA,NA,1),
"v4"=c(1,NA,4,1,NA,NA),
"v5"=c(1,NA,2,1,1,3))
What I want to achieve is this:
desired = data.frame("Site"=c(1,2,3,4,7,9,10,11,13,14),
"v1"=c(0,1,0,2,1,0,0,0,0,0),
"v2"=c(0,1,1,0,NA,NA,NA,0,0,0),
"v3"=c(0,0,1,NA,0,NA,0,0,0,1),
"v4"=c(0,1,0,4,1,0,0,0,NA,NA),
"v5"=c(0,1,0,2,1,NA,1,NA,0,3))
Where I "inject" (I'm sure there's a better term) the data from data frame b into data frame a, however I'd like to replace any NAs from b with zeros and keep the NAs from a as they are.
I found and have tried this code:
cols <- colnames(a)[colnames(a) %in% colnames(b)]
rows <- rownames(a)[rownames(a) %in% rownames(b)]
a[rows, cols] <- b[rows, cols]
But it brings the NAs along with it. I considered replacing the NAs with zeros first, but even then it would erase the NAs I currently have in data frame a that I want to keep.
Perhaps a for loop or something in tidyverse is the way to go, but I don't even know where to begin with those. Any help would be much appreciated!
CodePudding user response:
I suggest you first replace every NA values in b with 0 and then use inner_join to merge the result with the corresponding Site values in a. You could then replace non-NA values of a with their corresponding values in b leaving NA values in a intact. In the end we bind the modified data frame with a subset of a whose Site values are not present in b.
library(dplyr)
a %>%
inner_join(b %>%
mutate(across(!Site, ~ replace(.x, is.na(.x), 0))),
by = "Site") %>%
mutate(across(ends_with(".x"), ~ ifelse(!is.na(.x), get(gsub("(.*\\.)x", "\\1y", cur_column())),
.x))) %>%
select(!ends_with("y")) %>%
rename_with(~ gsub("(.*)\\.x", "\\1", .), ends_with(".x")) %>%
bind_rows(a %>%
filter(!Site %in% unique(b$Site))) %>%
arrange(Site)
Site v1 v2 v3 v4 v5
1 1 0 0 0 0 0
2 2 1 1 0 1 1
3 3 0 1 1 0 0
4 4 2 0 NA 4 2
5 7 1 NA 0 1 1
6 9 0 NA NA 0 NA
7 10 0 NA 0 0 1
8 11 0 0 0 0 NA
9 13 0 0 0 NA 0
10 14 0 0 1 NA 3
CodePudding user response:
merge(b, a, by = 'Site', all = TRUE) %>%
split.default(sub('.x|.y', '', names(.))) %>%
map_df(~coalesce(!!!.x))
# A tibble: 10 x 6
Site v1 v2 v3 v4 v5
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0 0 0 0 0
2 2 1 1 0 1 1
3 3 0 1 1 0 0
4 4 2 0 NA 4 2
5 7 1 NA 0 1 1
6 9 0 NA NA 0 NA
7 10 0 NA 0 0 1
8 11 0 0 0 0 NA
9 13 0 0 0 NA 0
10 14 0 0 1 NA 3
