I have a df1 that looks like
df1:
df1 <- data.frame(
DE201 = c("A15", "A18"),
DE207 = c("A903", "A906")
)
| DE201 | DE207 |
|---|---|
| A15 | A903 |
| A18 | A906 |
...
and another df2 that has the dictionary values for those values, row wise
df2:
df2 <- data.frame(
module = rep("Fall", 4),
Data_Element_ID = c(rep("DE201", 2), rep("DE207", 2)),
Data_Element_Name = c(rep("Injury result", 2), rep("Patient activity", 2)),
Answer_Code = c("A15", "A18", "A903", "A906"),
Answer_value = c("Yes", "No", "Ambulating with assistance", "Intracranial injury"),
DE_original = c("DE201_A15", "DE201_A18", "DE207_A903", "DE207_A906")
)
| module | Data_Element_ID | Data_Element_Name | Answer_Code | Answer_Value | DE_original |
|---|---|---|---|---|---|
| Fall | DE201 | Injury result | A15 | Yes | DE201_A15 |
| Fall | DE201 | Injury result | A18 | No | DE201_A18 |
| Fall | DE207 | Patient activity | A903 | Ambulating with assistance | DE207_A903 |
| Fall | DE207 | Patient activity | A906 | Intracranial injury | DE207_A906 |
...
I want to replace all the values in df1, including its column name, to be the matching values in df2 (df1 column names to be matched df2's Data_Element_Name, and df1 values to be matched df2's Answer_Value)
for example, my desired output is something like
output <- data.frame(
Injury_result = c("Yes", "No"),
Patient_activity = c("Ambulating with assistance", "Intracranial injury")
)
| Injury result | Patient activity |
|---|---|
| Yes | Ambulating with assistance |
| No | Intracranial injury |
...
I've tried below two methods but none of them worked, and merge/join are not really applicable here because they don't really have the same key/ID to join/merge by, since one is column wise elements , and one is row wise elements.
Tried mutate() with if_else() and case_when():
mutate(
DE201 = case_when(
DE201 == df2$Answer_Code ~ df2$Answer_Value,
TRUE ~ DE201
),
DE204 = if_else(
DE204 %in% df2$Answer_Code,
df2$Answer_Value, DE204
)
)
CodePudding user response:
There might be a more direct way of doing this, but it's not a straightforward task. In this case, I reshaped the data from df1 to a long format using pivot_longer() in order to tie in the data from df2 using left_join(). Then, it's a question of reshaping the data back to a wide format using pivot_wider() for those columns of data you wanted to keep.
Note that I'm adding a row id that I eventually discard because we need to keep track of what lines of data goes where when we are reshaping from wide to long back to wide.
library(dplyr)
library(tidyr)
df1 |>
mutate(row_id = row_number()) |>
pivot_longer(-row_id, names_to = "Data_Element_ID", values_to = "Answer_Code") |>
left_join(df2) |>
select(row_id, Data_Element_Name, Answer_value) |>
pivot_wider(names_from = Data_Element_Name, values_from = Answer_value) |>
select(-row_id)
# A tibble: 2 x 2
`Injury result` `Patient activity`
<chr> <chr>
1 Yes Ambulating with assistance
2 No Intracranial injury
CodePudding user response:
Here is a solution with data.table:
library(data.table)
dict <- rbindlist(list(df2[,c("Answer_Code", "Answer_Value")],
df2[,c("Data_Element_ID", "Data_Element_Name")]))
colnames(dict) <- c('key', 'val')
match_function <- function(column){
dict[match(column, dict$key), 2, drop=F]
}
df3 <- as.data.frame(apply(df1, MARGIN = 2, match_function))
names(df3) <- dict[match(names(df1), dict$key), 2, drop=F][[1]]
Here is a step by step explanation of why it works. We create a dictionary using data.table and extract the key/val pairs. For example, the "Answer_Code" column is mapped to the "Answer_Value" column. Same for "Data_Element_ID" and "Data_Element_Name". I change the names to key and val for clarity.
library(data.table)
dict <- rbindlist(list(df2[,c("Answer_Code", "Answer_Value")],
df2[,c("Data_Element_ID", "Data_Element_Name")]))
colnames(dict) <- c('key', 'val')
The dictionary "dict" looks like this:
key val
1: A15 Yes
2: A18 No
3: A903 Ambulating with assistance
4: A906 Intracranial injury
5: DE201 Injury result
6: DE201 Injury result
7: DE207 Patient activity
8: DE207 Patient activity
Now we can create a function which will take in a column, and replace the code/key with the name/val. We use apply with the margin of 2 (to iterate over columns) for this.
match_function <- function(column){
dict[match(column, dict$key), MARGIN = 2, drop=F]
}
df3 <- as.data.frame(apply(df1, 2, match_function))
df3 at this point looks like this:
val val.1
1 Yes Ambulating with assistance
2 No Intracranial injury
Now we just need to change the column names using the same dictionary process.
names(df3) <- dict[match(names(df1), dict$key), 2, drop=F][[1]]
df3
Our results match what you expected.
Injury result Patient activity
1 Yes Ambulating with assistance
2 No Intracranial injury
