Create a new column based on a dictionary using R

Time:01-05

For the following dataframe d, I'm trying to create a new column by replacing col1 using a dictionary dict_to_replace:

library(tidyverse)
library(stringr)

d <- data.frame(col1 = c("AA", "AG", "AC", "AA"), col2 = c(NA, "GG", "GG", "GC"), stringsAsFactors=FALSE)
dict_to_replace <- c('AA'='a', 'AG'='b')

d %>% 
  mutate(
    col3 = str_replace_all(col1, pattern = dict_to_replace)
  )

Out:

  col1 col2 col3
1   AA <NA>    a
2   AG   GG    b
3   AC   GG   AC
4   AA   GC    a

But I expected that a value in col1 that is not among the keys of dict_to_replace would be replaced by NA rather than left as is, so the expected result looks like this:

  col1 col2 col3
1   AA <NA>    a
2   AG   GG    b
3   AC   GG <NA>
4   AA   GC    a

How can I achieve that in a pipe (%>%) in R? Thanks.

CodePudding user response:

I don't think str_replace_all can do this on its own, because it leaves strings that match none of the patterns unchanged. An alternative is recode from the dplyr package:

d %>% 
  mutate(
    col3 = recode(col1, !!!dict_to_replace, .default = NA_character_)
  )

Here the splice operator !!! (sometimes called "bang-bang-bang") unpacks the dict_to_replace named vector into individual named arguments, and the .default argument sets the value given to entries of col1 that match none of the names. From the documentation:

.default If supplied, all values not otherwise matched will be given this value. If not supplied and if the replacements are the same type as the original values in .x, unmatched values are not changed. If not supplied and if the replacements are not compatible, unmatched values are replaced with NA.
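
To make the effect of .default concrete, here is a minimal sketch that runs recode with and without it on the question's data. The column names kept and col3_lookup are made up for illustration. The last column shows a base R alternative that is not part of the answer above: indexing the named vector by col1, which gives NA for names that are absent.

```r
library(dplyr)

d <- data.frame(col1 = c("AA", "AG", "AC", "AA"), stringsAsFactors = FALSE)
dict_to_replace <- c("AA" = "a", "AG" = "b")

res <- d %>%
  mutate(
    # No .default: replacements are the same type as col1,
    # so the unmatched "AC" is left unchanged
    kept = recode(col1, !!!dict_to_replace),
    # .default = NA_character_: the unmatched "AC" becomes NA
    col3 = recode(col1, !!!dict_to_replace, .default = NA_character_),
    # Base R lookup (illustrative alternative): names missing from
    # dict_to_replace yield NA; unname() drops the names from the result
    col3_lookup = unname(dict_to_replace[col1])
  )

res
```

Both col3 and col3_lookup should give NA for the AC row, while kept retains AC as in the original str_replace_all output.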
