How to fill one data frame with data from another while retaining NAs from the first-CodePudding

I have 2 data frames with the same column names, but different numbers of rows. The first data frame (a) looks similar to this:

a = data.frame("Site"=c(1,2,3,4,7,9,10,11,13,14), 
               "v1"=c(0,0,0,0,0,0,0,0,0,0), 
               "v2"=c(0,0,0,0,NA,NA,NA,0,0,0), 
               "v3"=c(0,0,0,NA,0,NA,0,0,0,0), 
               "v4"=c(0,0,0,0,0,0,0,0,NA,NA), 
               "v5"=c(0,0,0,0,0,NA,0,NA,0,0))

Note: sites 5, 6, 8, and 12 are missing purposefully.

The second data frame (b) looks something like this:

b = data.frame("Site"=c(2,3,4,7,10,14),
               "v1"=c(1,NA,2,1,NA,NA),
               "v2"=c(1,1,NA,NA,NA,NA),
               "v3"=c(NA,1,NA,NA,NA,1),
               "v4"=c(1,NA,4,1,NA,NA),
               "v5"=c(1,NA,2,1,1,3))

What I want to achieve is this:

desired = data.frame("Site"=c(1,2,3,4,7,9,10,11,13,14), 
                     "v1"=c(0,1,0,2,1,0,0,0,0,0), 
                     "v2"=c(0,1,1,0,NA,NA,NA,0,0,0), 
                     "v3"=c(0,0,1,NA,0,NA,0,0,0,1), 
                     "v4"=c(0,1,0,4,1,0,0,0,NA,NA), 
                     "v5"=c(0,1,0,2,1,NA,1,NA,0,3))

Where I "inject" (I'm sure there's a better term) the data from data frame b into data frame a, however I'd like to replace any NAs from b with zeros and keep the NAs from a as they are.

I found and have tried this code:

cols <- colnames(a)[colnames(a) %in% colnames(b)]
rows <- rownames(a)[rownames(a) %in% rownames(b)]

a[rows, cols] <- b[rows, cols]

But it brings the NAs along with it. I considered replacing the NAs with zeros first, but even then it would erase the NAs I currently have in data frame a that I want to keep.

Perhaps a for loop or something in tidyverse is the way to go, but I don't even know where to begin with those. Any help would be much appreciated!

CodePudding user response：

I suggest you first replace every NA values in b with 0 and then use inner_join to merge the result with the corresponding Site values in a. You could then replace non-NA values of a with their corresponding values in b leaving NA values in a intact. In the end we bind the modified data frame with a subset of a whose Site values are not present in b.

library(dplyr)

a %>%
  inner_join(b %>%
               mutate(across(!Site, ~ replace(.x, is.na(.x), 0))), 
             by = "Site") %>%
  mutate(across(ends_with(".x"), ~ ifelse(!is.na(.x), get(gsub("(.*\\.)x", "\\1y", cur_column())), 
                                          .x))) %>%
  select(!ends_with("y")) %>%
  rename_with(~ gsub("(.*)\\.x", "\\1", .), ends_with(".x")) %>%
  bind_rows(a %>% 
              filter(!Site %in% unique(b$Site))) %>%
  arrange(Site)


   Site v1 v2 v3 v4 v5
1     1  0  0  0  0  0
2     2  1  1  0  1  1
3     3  0  1  1  0  0
4     4  2  0 NA  4  2
5     7  1 NA  0  1  1
6     9  0 NA NA  0 NA
7    10  0 NA  0  0  1
8    11  0  0  0  0 NA
9    13  0  0  0 NA  0
10   14  0  0  1 NA  3

CodePudding user response：

merge(b, a, by = 'Site', all = TRUE) %>%
  split.default(sub('.x|.y', '', names(.))) %>%
  map_df(~coalesce(!!!.x))

# A tibble: 10 x 6
    Site    v1    v2    v3    v4    v5
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1     1     0     0     0     0     0
 2     2     1     1     0     1     1
 3     3     0     1     1     0     0
 4     4     2     0    NA     4     2
 5     7     1    NA     0     1     1
 6     9     0    NA    NA     0    NA
 7    10     0    NA     0     0     1
 8    11     0     0     0     0    NA
 9    13     0     0     0    NA     0
10    14     0     0     1    NA     3