Extract variable from numerical code based on digit location with R/Regex

Time:02-05

I've been struggling to find a reliable and concise way to recode a variable that is a 4-digit numerical code, where each digit encodes the value of another variable (assume each is binary for now). Those variables are:


location: 1 = North, 2 = South
sex: 1 = male, 2 = female
job: 1 = driver, 2 = construction
income: 1 = high, 2 = low

For example, the code 1111 means: North,male,driver,high
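
To make the encoding concrete, here is a minimal base R sketch that decodes a single code by digit position. The list name `levels_by_position` is made up for illustration, and the sketch assumes every code has exactly four digits, each of them 1 or 2.

```r
# Assumed lookup table: element i of each vector is the label for digit i.
levels_by_position <- list(
  location = c("North", "South"),
  sex      = c("male", "female"),
  job      = c("driver", "construction"),
  income   = c("high", "low")
)

code <- 1111

# Pull out the four digits, most significant first, with integer arithmetic.
digits <- (code %/% 10^(3:0)) %% 10

# Use each digit as a positional index into the matching vector.
decoded <- mapply(function(lv, d) lv[d], levels_by_position, digits)

paste(decoded, collapse = ",")
```

For 1111 this gives the string North,male,driver,high.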

R code for data is below:

    library(tidyverse)  # attaches dplyr, tidyr and tibble
    
    location <- c("North", "South")
    sex <- c("male", "female")
    job <- c("driver", "construction")
    income <- c("high", "low")
    
    dt <- tibble(data= c(1112,1212,1122,1221))

# A tibble: 4 × 1
   data
  <dbl>
1  1112
2  1212
3  1122
4  1221

I want to recode this column to get the following final output:

# A tibble: 4 × 1
  data                          
  <chr>                         
1 North,male,driver,low         
2 North,female,driver,low       
3 North,male,construction,low   
4 North,female,construction,high

I've tried various combinations of str_extract with a regex for digit position, followed by either ifelse or case_when, and each attempt either doesn't work or is massively bulky and redundant for the real data set (a 4-digit code with up to 9 possible values per digit position).

CodePudding user response:

We could create a list of named vectors, one per digit position, split the code into its digits, and then look each digit up by name:

    library(dplyr)
    library(tidyr)
    # One named vector per digit position; the names are the digits as strings
    lst1 <- list(
      location = c(`1` = 'North', `2` = 'South'),
      sex      = c(`1` = 'male', `2` = 'female'),
      job      = c(`1` = 'driver', `2` = 'construction'),
      income   = c(`1` = 'high', `2` = 'low')
    )

    dt %>%
      # split between every pair of adjacent digits
      separate(data, into = c('location', 'sex', 'job', 'income'),
               sep = "(?<=\\d)(?=\\d)") %>%
      # replace each digit with its label from the vector named after the column
      mutate(across(everything(), ~ lst1[[cur_column()]][.x])) %>%
      unite(data, everything(), sep = ",")

-output

# A tibble: 4 × 1
  data                          
  <chr>                         
1 North,male,driver,low         
2 North,female,driver,low       
3 North,male,construction,low   
4 North,female,construction,high
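
The same named-vector lookup can also be sketched in base R without any packages, which may help when each digit position has up to 9 possible values. The helper `decode_codes` below is hypothetical (it is not part of the answer above), and it assumes every code has exactly as many characters as there are lookup vectors.

```r
# Hypothetical helper: decode a vector of fixed-width codes using one named
# lookup vector per digit position.
decode_codes <- function(codes, lookups, sep = ",") {
  codes <- as.character(codes)
  parts <- lapply(seq_along(lookups), function(i) {
    digit <- substr(codes, i, i)   # i-th character of every code
    unname(lookups[[i]][digit])    # look up by name; NA for unknown digits
  })
  # Paste the decoded columns together row by row.
  # Note: paste() renders a missing label as the text "NA".
  do.call(paste, c(parts, sep = sep))
}

lst1 <- list(
  location = c(`1` = "North", `2` = "South"),
  sex      = c(`1` = "male", `2` = "female"),
  job      = c(`1` = "driver", `2` = "construction"),
  income   = c(`1` = "high", `2` = "low")
)

result <- decode_codes(c(1112, 1212, 1122, 1221), lst1)
result
```

Supporting more levels only means adding entries to the vectors in `lst1`; the function itself does not change.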