I have a CSV file with 10,000 rows of data. I need to loop through a specific column and replace each value that meets a criterion. The criterion is that the value is NA; each such value should be replaced with a value selected at random from an array.
Name Age Sex
Alice 20 M
James NA F
Jerome 30 M
Alex 25 M
Bruce NA M
Tan 45 M
Olive 25 F
Jasmine 37 F
My code is found below.
result <- array(c(50,24,12,30,60,16,71,81))
for (i in 2:ncol(mydata))
{
  if (mydata[ ,i] is.NA())
  {
    mydata[ ,i] == sample(result)
  }
  else
  {
    next
  }
}
CodePudding user response:
Here is a tidyverse approach. It replaces every NA in the Age column with one value drawn at random from result. Because sample(result, 1) is evaluated only once, all NAs receive the same value.
library(tidyverse)
result <- array(c(50,24,12,30,60,16,71,81))
set.seed(1)
df %>% mutate(Age = replace_na(Age, sample(result, 1)))
Output
Name Age Sex
1 Alice 20 M
2 James 12 F
3 Jerome 30 M
4 Alex 25 M
5 Bruce 12 M
6 Tan 45 M
7 Olive 25 F
8 Jasmine 37 F
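If each NA should instead get its own independent draw, a minimal sketch along the following lines may help. It assumes dplyr is installed; the data frame is rebuilt from the sample in the question only so that the example runs on its own, and `df_filled` is a name introduced here for illustration.

```r
library(dplyr)

# Sample data rebuilt from the question (assumption: Age is numeric)
df <- data.frame(
  Name = c("Alice", "James", "Jerome", "Alex", "Bruce", "Tan", "Olive", "Jasmine"),
  Age  = c(20, NA, 30, 25, NA, 45, 25, 37),
  Sex  = c("M", "F", "M", "M", "M", "M", "F", "F")
)
result <- c(50, 24, 12, 30, 60, 16, 71, 81)

set.seed(1)
df_filled <- df %>%
  mutate(Age = replace(
    Age,
    is.na(Age),
    # Draw as many values as there are NAs, sampling with replacement
    sample(result, sum(is.na(Age)), replace = TRUE)
  ))
df_filled
```

Sampling with `replace = TRUE` keeps this working even when the column has more NAs than `result` has elements, which is likely with 10,000 rows.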
CodePudding user response:
You can do this with a small helper function, which lapply can then apply to multiple columns. Each NA gets its own value, drawn from the array with replacement.
set.seed(2022)
replace_NA <- function(x, arr) {
  x[is.na(x)] <- sample(arr, sum(is.na(x)), replace = TRUE)
  x
}
mydata[-1] <- lapply(mydata[-1], replace_NA, result)
mydata
# Name Age Sex
#1 Alice 20 M
#2 James 4 F
#3 Jerome 30 M
#4 Alex 25 M
#5 Bruce 3 M
#6 Tan 45 M
#7 Olive 25 F
#8 Jasmine 37 F
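For comparison, the loop from the question can also be repaired directly. The sketch below is one possible fix, not the only one: `is.na()` is called as a function on the column, assignment uses `<-` rather than the comparison operator `==`, and only the NA positions are overwritten. The sample data is rebuilt so the example runs on its own, and `na_rows` is a name introduced here for illustration.

```r
# Sample data rebuilt from the question (assumption: Age is numeric)
mydata <- data.frame(
  Name = c("Alice", "James", "Jerome", "Alex", "Bruce", "Tan", "Olive", "Jasmine"),
  Age  = c(20, NA, 30, 25, NA, 45, 25, 37),
  Sex  = c("M", "F", "M", "M", "M", "M", "F", "F")
)
result <- c(50, 24, 12, 30, 60, 16, 71, 81)

set.seed(2022)
for (i in 2:ncol(mydata)) {
  # Logical vector marking which rows of column i are NA
  na_rows <- is.na(mydata[[i]])
  if (any(na_rows)) {
    # Draw one value per NA, sampling with replacement
    mydata[na_rows, i] <- sample(result, sum(na_rows), replace = TRUE)
  }
}
mydata
```

Columns without any NA, such as Sex in this sample, are left untouched, so the `else { next }` branch from the original loop is not needed.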
