Home > Mobile >  How to fill one data frame with data from another while retaining NAs from the first
How to fill one data frame with data from another while retaining NAs from the first

Time:01-04

I have 2 data frames with the same column names, but different numbers of rows. The first data frame (a) looks similar to this:

a = data.frame("Site"=c(1,2,3,4,7,9,10,11,13,14), 
               "v1"=c(0,0,0,0,0,0,0,0,0,0), 
               "v2"=c(0,0,0,0,NA,NA,NA,0,0,0), 
               "v3"=c(0,0,0,NA,0,NA,0,0,0,0), 
               "v4"=c(0,0,0,0,0,0,0,0,NA,NA), 
               "v5"=c(0,0,0,0,0,NA,0,NA,0,0)) 

Note: sites 5, 6, 8, and 12 are missing purposefully.

The second data frame (b) looks something like this:

b = data.frame("Site"=c(2,3,4,7,10,14),
               "v1"=c(1,NA,2,1,NA,NA),
               "v2"=c(1,1,NA,NA,NA,NA),
               "v3"=c(NA,1,NA,NA,NA,1),
               "v4"=c(1,NA,4,1,NA,NA),
               "v5"=c(1,NA,2,1,1,3))

What I want to achieve is this:

desired = data.frame("Site"=c(1,2,3,4,7,9,10,11,13,14), 
                     "v1"=c(0,1,0,2,1,0,0,0,0,0), 
                     "v2"=c(0,1,1,0,NA,NA,NA,0,0,0), 
                     "v3"=c(0,0,1,NA,0,NA,0,0,0,1), 
                     "v4"=c(0,1,0,4,1,0,0,0,NA,NA), 
                     "v5"=c(0,1,0,2,1,NA,1,NA,0,3))

Where I "inject" (I'm sure there's a better term) the data from data frame b into data frame a, however I'd like to replace any NAs from b with zeros and keep the NAs from a as they are.

I found and have tried this code:

cols <- colnames(a)[colnames(a) %in% colnames(b)]
rows <- rownames(a)[rownames(a) %in% rownames(b)]

a[rows, cols] <- b[rows, cols]

But it brings the NAs along with it. I considered replacing the NAs with zeros first, but even then it would erase the NAs I currently have in data frame a that I want to keep.

Perhaps a for loop or something in tidyverse is the way to go, but I don't even know where to begin with those. Any help would be much appreciated!

CodePudding user response:

I suggest you first replace every NA values in b with 0 and then use inner_join to merge the result with the corresponding Site values in a. You could then replace non-NA values of a with their corresponding values in b leaving NA values in a intact. In the end we bind the modified data frame with a subset of a whose Site values are not present in b.

library(dplyr)

a %>%
  inner_join(b %>%
               mutate(across(!Site, ~ replace(.x, is.na(.x), 0))), 
             by = "Site") %>%
  mutate(across(ends_with(".x"), ~ ifelse(!is.na(.x), get(gsub("(.*\\.)x", "\\1y", cur_column())), 
                                          .x))) %>%
  select(!ends_with("y")) %>%
  rename_with(~ gsub("(.*)\\.x", "\\1", .), ends_with(".x")) %>%
  bind_rows(a %>% 
              filter(!Site %in% unique(b$Site))) %>%
  arrange(Site)


   Site v1 v2 v3 v4 v5
1     1  0  0  0  0  0
2     2  1  1  0  1  1
3     3  0  1  1  0  0
4     4  2  0 NA  4  2
5     7  1 NA  0  1  1
6     9  0 NA NA  0 NA
7    10  0 NA  0  0  1
8    11  0  0  0  0 NA
9    13  0  0  0 NA  0
10   14  0  0  1 NA  3

CodePudding user response:

merge(b, a, by = 'Site', all = TRUE) %>%
  split.default(sub('.x|.y', '', names(.))) %>%
  map_df(~coalesce(!!!.x))

# A tibble: 10 x 6
    Site    v1    v2    v3    v4    v5
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1     1     0     0     0     0     0
 2     2     1     1     0     1     1
 3     3     0     1     1     0     0
 4     4     2     0    NA     4     2
 5     7     1    NA     0     1     1
 6     9     0    NA    NA     0    NA
 7    10     0    NA     0     0     1
 8    11     0     0     0     0    NA
 9    13     0     0     0    NA     0
10    14     0     0     1    NA     3
  •  Tags:  
  • Related