I have two data frames. df1 has some NAs in a column that I'm interested in, and df2 has this same column but with the values I need. I want to fill those values in df1 according to the rownames (the rownames in df2 are found in df1).
For example, df1 would be something like this:
Age Gender
Patient_1 NA Male # Has NA
Patient_3 30 Male
Patient_66 45 Female
Patient_10 NA Female # Has NA
Patient_11 67 Female
Patient_23 NA Male # Has NA
and df2 contains the same information but only for the samples that have NA in the age column in df1:
Age Gender
Patient_1 59 Male
Patient_10 80 Female
Patient_23 21 Male
Now how do I fill those Age values, from df2 to df1, for each sample?
I tried this:
for(i in 1:length(df1$Age)){
if(rownames(df1)[i] == rownames(df2)[i]){
df1$Age[[i]] = df2$Age[[i]]
}
}
That didn't work as it filled only a subset of the values for some reason, some values are still NA in df1.
CodePudding user response:
Since df2 contains data for all NA rows in df1, you can simply index all NA rows in df1 using the rownames of df2 and then overwrite their Age values with the df2$Age column.
df1[rownames(df2), "Age"] <- df2$Age
CodePudding user response:
The dplyr and tibble packages have nice functions for this. First convert to a tibble with tibble::rownames_to_column(). Then use dplyr::rows_patch() to replace the NA values with the corresponding values by patient_id.
library(dplyr)
library(tibble)
df1 |>
rownames_to_column("patient_id") |>
rows_patch(
rownames_to_column(df2, "patient_id")
)
# Matching, by = "patient_id"
# patient_id Age Gender
# 1 Patient_1 59 Male
# 2 Patient_3 30 Male
# 3 Patient_66 45 Female
# 4 Patient_10 80 Female
# 5 Patient_11 67 Female
# 6 Patient_23 21 Male
