I've been struggling to find a reliable and concise way to recode a variable that is a 4-digit numeric code, where each digit encodes the value of another variable. Assume for now that each of those variables is binary. The variables are:
location: 1 = North, 2 = South
sex: 1 = male, 2 = female
job: 1 = driver, 2 = construction
income: 1 = high, 2 = low
For example, the code 1111 means: North,male,driver,high
R code for data is below:
library(tidyselect)
library(tidyverse)
library(dplyr)
location <- c("North", "South")
sex <- c("male", "female")
job <- c("driver", "construction")
income <- c("high", "low")
dt <- tibble(data= c(1112,1212,1122,1221))
# A tibble: 4 × 1
data
<dbl>
1 1112
2 1212
3 1122
4 1221
I want to recode this column to get the final output
# A tibble: 4 × 1
data
<chr>
1 North,male,driver,low
2 North,female,driver,low
3 North,male,construction,low
4 North,female,construction,high
I've tried various combinations of str_extract, using regex to pick out each digit position, followed by either ifelse or case_when. Each attempt either doesn't work or is massively bulky and redundant for the real data set (a 4-digit code with up to 9 possible values per digit position).
CodePudding user response:
We could create a list of named vectors, split the code into one column per digit, and then look each digit up by name:
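library(dplyr)
library(tidyr)
# one named vector per digit position; the names are the digit codes
lst1 <- list(
  location = c(`1` = 'North', `2` = 'South'),
  sex = c(`1` = 'male', `2` = 'female'),
  job = c(`1` = 'driver', `2` = 'construction'),
  income = c(`1` = 'high', `2` = 'low')
)
dt %>%
  # split between every pair of adjacent digits, giving one column per digit
  separate(data, into = c('location', 'sex', 'job', 'income'),
           sep = "(?<=\\d)(?=\\d)") %>%
  # look up each digit by name in the vector matching the column name
  mutate(across(everything(), ~ lst1[[cur_column()]][.x])) %>%
  # paste the four labels back into a single comma-separated column
  unite(data, everything(), sep = ",")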
-output
# A tibble: 4 × 1
data
<chr>
1 North,male,driver,low
2 North,female,driver,low
3 North,male,construction,low
4 North,female,construction,high
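
If you would rather not depend on tidyr, the same named-vector lookup can be sketched in base R. This is only a minimal sketch: `decode_codes()` and `lookup` are hypothetical names made up for illustration, and it assumes every code has exactly one digit per entry in the lookup list. For the real data you would extend each named vector with the extra codes (up to `9`).

```r
# Lookup tables: one named character vector per digit position.
# The names are the digit codes; add more entries for the real data.
lookup <- list(
  location = c(`1` = "North", `2` = "South"),
  sex      = c(`1` = "male", `2` = "female"),
  job      = c(`1` = "driver", `2` = "construction"),
  income   = c(`1` = "high", `2` = "low")
)

# decode_codes() is a hypothetical helper, not part of any package.
# Assumption: each code has as many digits as `lookup` has entries.
decode_codes <- function(codes, lookup, sep = ",") {
  # Split each code into single characters: "1112" -> c("1", "1", "1", "2")
  digits <- strsplit(as.character(codes), "")
  vapply(digits, function(d) {
    # Digit i is translated with the i-th lookup vector, indexed by name;
    # an unknown digit raises a "subscript out of bounds" error
    labels <- mapply(function(tbl, key) tbl[[key]], lookup, d)
    paste(labels, collapse = sep)
  }, character(1))
}

decoded <- decode_codes(c(1112, 1212, 1122, 1221), lookup)
print(decoded)
# [1] "North,male,driver,low"           "North,female,driver,low"
# [3] "North,male,construction,low"     "North,female,construction,high"
```

Using `[[` for the lookup means a digit that has no entry in its vector stops with an error instead of silently producing `NA`, which may be preferable when the real codes have many possible values.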
