I have a dataframe like this:
my_df <- data.frame(
ID = c(2, 4, 6, 8, 10, 12, 14, 16, 18),
b2 = c(NA, 4, 6, 2, NA, 6, 1, 1, NA))
and I want to replace all NAs with 0 and every other (non-NA) value with 1, placing the result in a new column (b4).
I can replace only the NAs with 0 using this:
my_df2 <- my_df %>%
  mutate(b3 = replace(b2, is.na(b2), 0))
I thought I could then use the step below to replace the other (non-NA) values with 1:
my_df3 <- my_df2 %>% mutate(b4=ifelse(b3=="NA","0","1"))
This, however, does not work the way I anticipated. Is there a way to do this in one go? Any advice would be appreciated.
CodePudding user response:
The problem with the code in the question is that comparing to "NA" is not the same as checking whether the value is NA. It compares the value to a character string consisting of the letters N and A. Also note that comparing to NA with == always gives NA, so we can't use that either. Instead, use is.na.
my_df$b2 == "NA"
## [1] NA FALSE FALSE FALSE NA FALSE FALSE FALSE NA
my_df$b2 == NA
## [1] NA NA NA NA NA NA NA NA NA
is.na(my_df$b2)
## [1] TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE
Now, since coercing TRUE and FALSE to numeric with unary + gives 1 and 0 respectively,
+TRUE
## [1] 1
+FALSE
## [1] 0
we can compute !is.na(b2), which is TRUE if the value is not NA and FALSE if it is, and then convert that to numeric using + to give the 0/1 values needed.
library(dplyr)
my_df %>% mutate(b3 = +!is.na(b2))
giving:
  ID b2 b3
1  2 NA  0
2  4  4  1
3  6  6  1
4  8  2  1
5 10 NA  0
6 12  6  1
7 14  1  1
8 16  1  1
9 18 NA  0
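For readers who prefer not to load dplyr, the same idea can be sketched in base R. This is only an illustration of the is.na approach described above: it writes to the column name b4 from the question and uses as.integer() as an assumed, more explicit alternative for the logical-to-number conversion.

```r
# Same data as in the question
my_df <- data.frame(
  ID = c(2, 4, 6, 8, 10, 12, 14, 16, 18),
  b2 = c(NA, 4, 6, 2, NA, 6, 1, 1, NA))

# !is.na() is TRUE for non-missing values;
# as.integer() maps TRUE/FALSE to 1/0
my_df$b4 <- as.integer(!is.na(my_df$b2))

my_df$b4
## [1] 0 1 1 1 0 1 1 1 0
```

Because the result goes into a new column, b2 keeps its original values, including the NAs.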
CodePudding user response:
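Here is one possible solution using if_else() from the dplyr library. Note that it overwrites b2; assign the result to a new name such as b4 if the original column should be kept.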
Reprex
- Code
library(dplyr)
my_df %>%
mutate(b2 = if_else(is.na(b2), 0, 1))
- Output
#>   ID b2
#> 1  2  0
#> 2  4  1
#> 3  6  1
#> 4  8  1
#> 5 10  0
#> 6 12  1
#> 7 14  1
#> 8 16  1
#> 9 18  0
Created on 2022-01-20 by the reprex package (v2.0.1)
CodePudding user response:
You are not handling NA properly here: x == "NA" treats it as a character string. To test for missing values, the standard practice is to use is.na(x), not x == NA. Try:
my_df$b3 <- ifelse(is.na(my_df$b2), 0, 1)
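To see why the attempt in the question returned "1" for every row, it may help to reproduce it step by step in base R. The sketch below is only illustrative: the names b3_vec and attempt are made up for this example, and the final line writes to b4 as the question asked.

```r
# Same data as in the question
my_df <- data.frame(
  ID = c(2, 4, 6, 8, 10, 12, 14, 16, 18),
  b2 = c(NA, 4, 6, 2, NA, 6, 1, 1, NA))

# First step from the question: NAs become 0, so nothing is missing any more
b3_vec <- replace(my_df$b2, is.na(my_df$b2), 0)
anyNA(b3_vec)
## [1] FALSE

# Comparing numbers to the string "NA" is FALSE everywhere,
# so ifelse() always takes the "no" branch
attempt <- ifelse(b3_vec == "NA", "0", "1")
unique(attempt)
## [1] "1"

# Testing the original column with is.na() gives the intended 0/1 values
my_df$b4 <- ifelse(is.na(my_df$b2), 0, 1)
my_df$b4
## [1] 0 1 1 1 0 1 1 1 0
```

The intermediate b3 column is not needed at all: the missingness test has to run on b2, before the NAs are replaced.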
