Home > Software engineering >  R update a tibble using data from a second tibble as row and column
R update a tibble using data from a second tibble as row and column

Time:02-08

I have an empty tibble full of NA's with the rows and columns, respectively named for id numbers and all the dates in a period of time. For example, this code:

tbl <- tibble(PERSONAL_ID = c("A", "B", "C", "D"))
dates = as.character((seq(as.Date("2016-01-01"), as.Date("2016-01-05"), by="days")))
tbl[dates] <- NA
tbl <- column_to_rownames(tbl, var = "PERSONAL_ID")

I have a second tibble that contains columns matching up one ID number with one date, as in this example:

enrollments <- tibble(ID = c("D", "B", "C", "D"),
                      date = c("2016-01-01", "2016-01-03", "2016-01-05", "2016-01-02"))

What I would like to do is add "1" to the row and column of the first tibble (tbl) corresponding to the ID and date listed in the second tibble (enrollments). For the example code above, the desired output would be:

  2016-01-01 2016-01-02 2016-01-03 2016-01-04 2016-01-05
A       <NA>       <NA>       <NA>        <NA>      <NA>
B       <NA>       <NA>          1        <NA>      <NA>
C       <NA>       <NA>       <NA>        <NA>         1
D          1          1       <NA>        <NA>      <NA>

Thank you!

CodePudding user response:

Here's a tidyverse approach.

  1. First change your tbl from a wide format to a long format so that it matches the format of enrollments.
  2. Then left_join the transformed tbl with enrollments using ID and date as the joining field.
  3. Finally, transform the long format back to a wide format and set rownames.
library(tidyverse)

left_join(tbl %>% rownames_to_column(var = "ID") %>% 
            pivot_longer(-ID, names_to = "date", values_to = "Count") %>% 
            select(-Count), 
          enrollments %>% mutate(Count = 1), 
          by = c("ID", "date")) %>%
  pivot_wider(names_from = "date", values_from = "Count") %>%
  column_to_rownames(var = "ID")

Output

 2016-01-01 2016-01-02 2016-01-03 2016-01-04 2016-01-05
A         NA         NA         NA         NA         NA
B         NA         NA          1         NA         NA
C         NA         NA         NA         NA          1
D          1          1         NA         NA         NA

Your dataset as reference

tbl
  2016-01-01 2016-01-02 2016-01-03 2016-01-04 2016-01-05
A         NA         NA         NA         NA         NA
B         NA         NA         NA         NA         NA
C         NA         NA         NA         NA         NA
D         NA         NA         NA         NA         NA
enrollments
# A tibble: 4 x 2
  ID    date      
  <chr> <chr>     
1 D     2016-01-01
2 B     2016-01-03
3 C     2016-01-05
4 D     2016-01-02

CodePudding user response:

Here is a base R approach:

replace(tbl, arrayInd(match(do.call(
  paste, subset(enrollments, select = c("date", "ID"))),
t(outer(colnames(tbl), rownames(tbl), FUN = paste)
)), .dim = dim(tbl)), 1)

Output

  2016-01-01 2016-01-02 2016-01-03 2016-01-04 2016-01-05
A         NA         NA         NA         NA         NA
B         NA         NA          1         NA         NA
C         NA         NA         NA         NA          1
D          1          1         NA         NA         NA
  •  Tags:  
  • Related