I have a CSV file with 10,000 rows of data. I need to loop through a specific column and replace each value that meets a criterion. The criterion is that the value is NA, in which case it should be replaced with a value selected at random from an array. Example data:

Name     Age    Sex
Alice    20     M
James    NA     F
Jerome   30     M
Alex     25     M
Bruce    NA     M
Tan      45     M
Olive    25     F
Jasmine  37     F

My code is below.

result <- array(c(50,24,12,30,60,16,71,81))
for (i in 2:ncol(mydata))
{
    if (mydata[ ,i] is.NA())
    {
        mydata[ ,i] == sample(result)

    }
    else 
    {
        next
    }
}

CodePudding user response：

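Here is a tidyverse approach. It replaces every NA in the Age column with a value drawn at random from result. Note that sample(result, 1) is evaluated only once, so all NAs receive the same value (12 in the output below).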

library(tidyverse)

result <- array(c(50,24,12,30,60,16,71,81))

set.seed(1)
df %>% mutate(Age = replace_na(Age, sample(result, 1)))

Output

    Name Age Sex
1   Alice  20   M
2   James  12   F
3  Jerome  30   M
4    Alex  25   M
5   Bruce  12   M
6     Tan  45   M
7   Olive  25   F
8 Jasmine  37   F
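
If each NA should get its own independent draw rather than one shared value, one possible variation is sketched below. It is not part of the original answer: the data frame df is a hypothetical stand-in built from the sample table in the question, df_filled is an illustrative name, and base R's replace() is used inside mutate() in place of replace_na().

```r
library(dplyr)

result <- c(50, 24, 12, 30, 60, 16, 71, 81)

# Hypothetical stand-in for the data read from the CSV file
df <- data.frame(
  Name = c("Alice", "James", "Jerome", "Alex", "Bruce", "Tan", "Olive", "Jasmine"),
  Age  = c(20, NA, 30, 25, NA, 45, 25, 37),
  Sex  = c("M", "F", "M", "M", "M", "M", "F", "F")
)

set.seed(1)
df_filled <- df %>%
  # Draw one value per NA (with replacement) and write them into the NA positions
  mutate(Age = replace(Age, is.na(Age),
                       sample(result, sum(is.na(Age)), replace = TRUE)))
```

Sampling with replace = TRUE means the number of NAs may exceed the length of result, which matters for a file with 10,000 rows.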

CodePudding user response：

You can do this with a small helper function, which can be applied to multiple columns with lapply.

set.seed(2022)

replace_NA <- function(x, arr) {
  x[is.na(x)] <- sample(arr, sum(is.na(x)), replace = TRUE)
  x
}

mydata[-1] <- lapply(mydata[-1], replace_NA, result)
mydata

#     Name Age Sex
#1   Alice  20   M
#2   James   4   F
#3  Jerome  30   M
#4    Alex  25   M
#5   Bruce   3   M
#6     Tan  45   M
#7   Olive  25   F
#8 Jasmine  37   F
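
For comparison, the loop from the question can also be made to work. The following is a minimal sketch, assuming mydata holds the sample table from the question (rebuilt here as a hypothetical stand-in). The is.numeric() check is an added assumption so that the Sex column is left untouched. The main fixes are calling is.na() as a function, assigning with <- instead of comparing with ==, and drawing one value per NA.

```r
result <- c(50, 24, 12, 30, 60, 16, 71, 81)

# Hypothetical stand-in for the data read from the CSV file
mydata <- data.frame(
  Name = c("Alice", "James", "Jerome", "Alex", "Bruce", "Tan", "Olive", "Jasmine"),
  Age  = c(20, NA, 30, 25, NA, 45, 25, 37),
  Sex  = c("M", "F", "M", "M", "M", "M", "F", "F")
)

set.seed(2022)
for (i in 2:ncol(mydata)) {
  is_missing <- is.na(mydata[[i]])   # logical vector marking the NA entries
  # Skip columns that are not numeric or that have nothing to fill
  if (!is.numeric(mydata[[i]]) || !any(is_missing)) next
  # Draw one replacement per NA and assign it (== would only compare)
  mydata[[i]][is_missing] <- sample(result, sum(is_missing), replace = TRUE)
}
```

The vectorised helper above does the same work per column without an explicit loop, so the loop is mainly useful for seeing where the original attempt went wrong.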