I have a data frame that shows ICD-10 codes for people who have died (decedents). Each row in the data frame corresponds to a decedent, each of whom can have up to twenty conditions listed as contributing factors to his or her death. I want to create a new column that shows whether a decedent had any ICD-10 code for diabetes (1 for yes, 0 for no). The codes for diabetes fall within E10-E14; that is, a diabetes code must start with one of the strings in the following vector, while the fourth position can take on different values:
diabetes <- c("E10","E11","E12","E13","E14")
This is a small, made-up example of what the data looks like:
original <- structure(list(acond1 = c("E112", "I250", "A419", "E149"), acond2 = c("I255",
"B341", "F179", "F101"), acond3 = c("I258", "B348", "I10", "I10"
), acond4 = c("I500", "E669", "I694", "R092")), row.names = c(NA,
-4L), class = c("tbl_df", "tbl", "data.frame"))
| acond1 | acond2 | acond3 | acond4 |
|---|---|---|---|
| E112 | I255 | I258 | I500 |
| I250 | B341 | B348 | E669 |
| A419 | F179 | I10 | I694 |
| E149 | F101 | I10 | R092 |
This is my desired result:
| acond1 | acond2 | acond3 | acond4 | diabetes |
|---|---|---|---|---|
| E112 | I255 | I258 | I500 | 1 |
| I250 | B341 | B348 | E669 | 0 |
| A419 | F179 | I10 | I694 | 0 |
| E149 | F101 | I10 | R092 | 1 |
There have been a couple other posts (e.g., Using if else on a dataframe across multiple columns, Str_detect multiple columns using across) on this type of question, but I can't seem to put it all together. Here is what I have unsuccessfully tried so far:
library(tidyverse)
library(stringr)
#attempt 1
original %>%
mutate_at(vars(contains("acond")), ifelse(str_detect(.,paste0("^(",
paste(diabetes, collapse = "|"), ")")), 1, 0))
#attempt 2
original %>%
unite(col = "all_conditions", starts_with("acond"), sep = ", ", remove = FALSE) %>%
mutate(diabetes = if_else(str_detect(.,paste0("^(", paste(diabetes, collapse = "|"), ")")), 1, 0))
Any help would be appreciated.
CodePudding user response:
library(tidyverse)
diabetes_pattern <- c("E10","E11","E12","E13","E14") %>%
  str_c(collapse = "|") %>%
  str_c("^(", ., ")") # anchor with ^ so only codes that start with E10-E14 match
original <-
structure(
list(
acond1 = c("E112", "I250", "A419", "E149"),
acond2 = c("I255", "B341", "F179", "F101"),
acond3 = c("I258", "B348", "I10", "I10"),
acond4 = c("I500", "E669", "I694", "R092")
),
row.names = c(NA,-4L),
class = c("tbl_df", "tbl", "data.frame")
)
original %>%
rowwise() %>%
  # as.integer() turns the logical result of any() into the 1/0 flag shown below
  mutate(diabetes = as.integer(any(str_detect(string = c_across(everything()), pattern = diabetes_pattern))))
#> # A tibble: 4 x 5
#> # Rowwise:
#> acond1 acond2 acond3 acond4 diabetes
#> <chr> <chr> <chr> <chr> <int>
#> 1 E112 I255 I258 I500 1
#> 2 I250 B341 B348 E669 0
#> 3 A419 F179 I10 I694 0
#> 4 E149 F101 I10 R092 1
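original %>%
  # rowSums() counts matching columns; `> 0` keeps the flag at 0/1 even if a row has several diabetes codes
  mutate(diabetes = as.numeric(rowSums(across(.cols = everything(), ~str_detect(.x, diabetes_pattern))) > 0))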
#> # A tibble: 4 x 5
#> acond1 acond2 acond3 acond4 diabetes
#> <chr> <chr> <chr> <chr> <dbl>
#> 1 E112 I255 I258 I500 1
#> 2 I250 B341 B348 E669 0
#> 3 A419 F179 I10 I694 0
#> 4 E149 F101 I10 R092 1
Created on 2022-01-23 by the reprex package (v2.0.1)
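As a possible alternative to rowwise(), dplyr 1.0.4 and later provide if_any(), which combines a per-column test into a single logical vector without iterating row by row. The sketch below is not part of the answer above. It assumes the condition columns all start with "acond", anchors the pattern with ^ so that only codes beginning with E10-E14 match, and treats a missing code (NA) as "no match". The fifth row is made up to show a diabetes code in a later column next to empty slots.

```r
library(dplyr)
library(stringr)

# The question's four rows plus one hypothetical row with missing codes
original <- tibble(
  acond1 = c("E112", "I250", "A419", "E149", "I250"),
  acond2 = c("I255", "B341", "F179", "F101", NA),
  acond3 = c("I258", "B348", "I10",  "I10",  "E119"),
  acond4 = c("I500", "E669", "I694", "R092", NA)
)

diabetes <- c("E10", "E11", "E12", "E13", "E14")

# "^(E10|E11|E12|E13|E14)": the code must start with one of the prefixes
diabetes_pattern <- str_c("^(", str_c(diabetes, collapse = "|"), ")")

result <- original %>%
  mutate(
    diabetes = as.integer(
      if_any(
        starts_with("acond"),
        # str_detect() returns NA for NA input; coalesce() maps that to FALSE
        ~ coalesce(str_detect(.x, diabetes_pattern), FALSE)
      )
    )
  )

result
```

Restricting the selection to starts_with("acond") rather than everything() should keep the test away from any ID or demographic columns in the real data.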
CodePudding user response:
Here's a base R approach using apply (df is defined under "Data" at the end of this answer):
dia <- paste(c("E10","E11","E12","E13","E14"), collapse="|")
df$diabetes <- apply(df, 1, function(x) any(grepl(dia,x)))*1
df
acond1 acond2 acond3 acond4 diabetes
1 E112 I255 I258 I500 1
2 I250 B341 B348 E669 0
3 A419 F179 I10 I694 0
4 E149 F101 I10 R092 1
With dplyr:
library(dplyr)
df %>%
  rowwise() %>%
  mutate(diabetes=any(grepl(dia,c_across(starts_with("ac"))))*1) %>%
  ungroup()
# A tibble: 4 × 5
acond1 acond2 acond3 acond4 diabetes
<chr> <chr> <chr> <chr> <dbl>
1 E112 I255 I258 I500 1
2 I250 B341 B348 E669 0
3 A419 F179 I10 I694 0
4 E149 F101 I10 R092 1
Data
df <- structure(list(acond1 = c("E112", "I250", "A419", "E149"), acond2 = c("I255",
"B341", "F179", "F101"), acond3 = c("I258", "B348", "I10", "I10"
), acond4 = c("I500", "E669", "I694", "R092")), class = "data.frame", row.names = c(NA,
-4L))
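A variation on the base R idea that avoids apply(), offered only as a sketch: test each condition column with grepl() and combine the per-column results with an element-wise OR. The names dia_anchored, cond_cols and hits are introduced here for illustration, and the anchored pattern is an assumption based on the question's requirement that codes start with E10-E14.

```r
# Made-up data from the question, as a plain data frame
df <- data.frame(
  acond1 = c("E112", "I250", "A419", "E149"),
  acond2 = c("I255", "B341", "F179", "F101"),
  acond3 = c("I258", "B348", "I10",  "I10"),
  acond4 = c("I500", "E669", "I694", "R092")
)

diabetes <- c("E10", "E11", "E12", "E13", "E14")

# Anchor the alternation so a code must begin with one of the prefixes
dia_anchored <- paste0("^(", paste(diabetes, collapse = "|"), ")")

# Only look at the condition columns, not at anything else in the data frame
cond_cols <- grep("^acond", names(df), value = TRUE)

# One logical vector per column (grepl() returns FALSE for NA),
# then combine them element-wise with OR
hits <- lapply(df[cond_cols], function(x) grepl(dia_anchored, x))
df$diabetes <- as.integer(Reduce(`|`, hits))

df
```

Because the work is done column by column, this version does not convert the data frame to a character matrix the way apply(df, 1, ...) does, and it can be re-run after the diabetes column has been added without that column being scanned.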
CodePudding user response:
If we want to use across with ifelse and str_detect, then we could:
- create a pattern with `paste` and `collapse` for `str_detect`
- `mutate` `across` all columns and use an anonymous `~ifelse` with the condition, and `.names` to control the names of the new columns
- `unite` the new columns
- use a trick with `parse_number` from the `readr` package
diabetes <- c("E10","E11","E12","E13","E14")
pattern <- paste(diabetes, collapse = "|")
library(tidyverse)
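original %>%
  mutate(across(everything(), ~ifelse(str_detect(., pattern), 1, 0), .names = "new_{col}")) %>%
  # sep = '' glues the flags into one number (e.g. "0100"); with a space separator
  # parse_number() would read only the first flag and miss codes in later columns
  unite(New_Col, starts_with('new'), na.rm = TRUE, sep = '') %>%
  mutate(diabetes = as.numeric(parse_number(New_Col) > 0), .keep="unused")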
acond1 acond2 acond3 acond4 diabetes
<chr> <chr> <chr> <chr> <dbl>
1 E112 I255 I258 I500 1
2 I250 B341 B348 E669 0
3 A419 F179 I10 I694 0
4 E149 F101 I10 R092 1
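The across() + ifelse() idea can also be collapsed without building an intermediate string. The following is a minimal sketch, not the method used above: it takes the row-wise maximum of the 0/1 flags with pmax(), so the result is 1 whenever any column matches, whichever column that is. The fifth row is hypothetical and only there to show a diabetes code outside the first column.

```r
library(dplyr)
library(stringr)

# The question's four rows plus one made-up row with the code in acond3
original <- tibble(
  acond1 = c("E112", "I250", "A419", "E149", "I250"),
  acond2 = c("I255", "B341", "F179", "F101", "B341"),
  acond3 = c("I258", "B348", "I10",  "I10",  "E119"),
  acond4 = c("I500", "E669", "I694", "R092", "I500")
)

diabetes <- c("E10", "E11", "E12", "E13", "E14")
pattern <- paste(diabetes, collapse = "|")

result <- original %>%
  mutate(
    # across() yields one 0/1 column per condition column;
    # do.call(pmax, ...) takes the element-wise maximum across them
    diabetes = do.call(
      pmax,
      across(starts_with("acond"), ~ ifelse(str_detect(.x, pattern), 1, 0))
    )
  )

result
```

If the real data contain NA in unused condition slots, the flags for those cells would be NA as well, so some NA handling (for example replacing NA flags with 0) would be needed before taking the maximum.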
