Assigning missing values more efficiently

Time:11-15

I'm new to RStudio. I want to recode the values 98 and 99 as missing (NA) in some of the variables in my data, so I wrote the following code. Is there a more efficient way to assign the missing values?

library(haven)  # provides read_dta()
CGSS <- read_dta("D:/R recording/cgss2017.dta")
head(CGSS$d13)
cgss <- dplyr::select(CGSS,"a2","a421","a422","a423","a424","a425","c52",
                      "a44","a45","a46","a47","a48","c6","a10","d131","d132",
                      "d133","d134","d135")
cgss$a421[cgss$a421==98|cgss$a421==99] <- NA
cgss$a422[cgss$a422==98|cgss$a422==99] <- NA
cgss$a423[cgss$a423==98|cgss$a423==99] <- NA
cgss$a424[cgss$a424==98|cgss$a424==99] <- NA
cgss$a425[cgss$a425==98|cgss$a425==99] <- NA
cgss$a44[cgss$a44==98|cgss$a44==99] <- NA
cgss$a45[cgss$a45==98|cgss$a45==99] <- NA
cgss$a46[cgss$a46==98|cgss$a46==99] <- NA
cgss$a47[cgss$a47==98|cgss$a47==99] <- NA
cgss$a48[cgss$a48==98|cgss$a48==99] <- NA

CodePudding user response:

In general you can identify missing-value codes and convert them to NA while reading the data in, but this may not be possible with read_dta(). To set them to NA after reading in the data, try this vectorized indexing approach, which recodes every column of the data frame in one step:

> set.seed(112)
> x <- data.frame(a = round(runif(10, 95, 100)), 
                  b = round(runif(10, 95, 100)))
> x
     a   b
1   97  96
2  100  97
3  100 100
4  100  98
5   96  98
6   96 100
7   95  98
8   99  98
9   96  99
10  97  97

> x[x == 98 | x == 99] <- NA

> x
     a   b
1   97  96
2  100  97
3  100 100
4  100  NA
5   96  NA
6   96 100
7   95  NA
8   NA  NA
9   96  NA
10  97  97

Does this do what you want?
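
If only some of the columns should be recoded, as in the question, the same idea can be limited to a vector of column names. This is a minimal base R sketch on a made-up data frame; cgss_demo and recode_cols are hypothetical names, and a tibble returned by read_dta() with labelled columns may behave differently:

```r
# Toy data standing in for the real survey data (values are made up).
cgss_demo <- data.frame(
  a2   = c(1, 2, 98, 1),   # 98 is a legitimate value here and must be kept
  a421 = c(3, 98, 5, 99),
  a44  = c(99, 2, NA, 4)   # already contains an NA
)

# Only these columns use 98/99 as missing-value codes.
recode_cols <- c("a421", "a44")

# %in% returns FALSE (not NA) for existing NAs, so they are left alone.
cgss_demo[recode_cols] <- lapply(
  cgss_demo[recode_cols],
  function(v) replace(v, v %in% c(98, 99), NA)
)

print(cgss_demo)
```

Columns that are not listed in recode_cols, such as a2 here, are left untouched.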

With read.table() and similar functions, you could use na.strings = c("98", "99"); note that this treats those values as missing in every column. readr functions have an na argument that works the same way. I am not familiar with read_dta(), but it seems it doesn't have a similar argument.
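
As a small illustration of the na.strings idea, here is a sketch that reads inline text instead of a file; the data are made up and the name demo is hypothetical:

```r
# Inline text stands in for a file; "98" and "99" are read as NA everywhere.
demo <- read.table(
  text = "a b
97 98
99 100",
  header = TRUE,
  na.strings = c("98", "99")
)

print(demo)
```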

CodePudding user response:

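library(dplyr)
# Recode 98 and 99 as NA in every column and keep the result.
# NA_real_ assumes the columns are doubles; if_else() is strict about types.
cgss <- cgss %>%
  mutate(across(everything(), ~if_else(.x %in% c(98, 99), NA_real_, .x)))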
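
If 98 and 99 are valid values in some columns, the same pipeline can be restricted to a chosen set of columns. This is a sketch on made-up data, not the asker's file; survey_demo and target_cols are hypothetical names, and replace() is used here instead of if_else() so the column type does not have to be double:

```r
library(dplyr)

# Toy data (made up); a2 should keep its 98.
survey_demo <- data.frame(
  a2   = c(1, 98, 2),
  a421 = c(98, 3, 99),
  a44  = c(4, 99, 5)
)

# Columns in which 98/99 mean "missing".
target_cols <- c("a421", "a44")

survey_demo <- survey_demo %>%
  mutate(across(all_of(target_cols), ~ replace(.x, .x %in% c(98, 99), NA)))

print(survey_demo)
```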