I would love to replace empty/NA values from a data table with other values that are shown in the variable of the same data table
So I have something like this:
| Country | Nationality | currentLoc |
|---|---|---|
| AR | ES | RU |
| empty | empty | ES |
| empty | AR | AR |
| EN | EN | EN |
| EN | ES | ES |
Result should be:
| Country | Nationality | currentLoc |
|---|---|---|
| AR | ES | RU |
| ES | empty | ES |
| AR | AR | EN |
| EN | EN | EN |
| EN | ES | ES |
I wrote the following R code:
df1$Country[df1$Country== "" | df1$Country==" "] <- "empty"
df1$Country <-ifelse(df1$Country == "empty", df1$Nationality, ifelse(df1$Nationality == "empty", df1$currentLoc, df1$Country))
However not all "empty" cells are replaced in the "Country" Column.
I tried already to specify the empty spaces in the df1$Country with NA, so I have done like
df1$Country[df1$Country== "" | df1$Country==" "] <- NA
df1$Country <-ifelse(is.na(df1$Country), df1$Nationality, ifelse(is.na(df1$Nationality), df1$currentLoc, df1$Country))
That is unfortunately also not working.
Could you help me out in this? I searched for similar questions here, but did not quit get the desired response.
Any help very much appriciated. :)
Cheers
Le1nO
CodePudding user response:
If I understood correctly, you want to relace "empty" values in the Country variable with the value of the Nationality column, and in the case the Nationality variable is also "empty", then replace Country with the currentLoc value.
If this is correct, your first attempt is close to what you want, but you need to fix it a little. ifelse take a test, a value if the test is true and a value if the test is false, so:
test:
df1$Country == "empty"meaning testing ifCountryis emptyvalue for true: If
Countryis empty, means that we need to takeNationality, but this variable can also be empty, so we need to check for it with anotherifelse, something like this:ifelse(df1$Nationality == "empty", df1$currentLoc, df1$Nationality)which means, check if nationality is empty and if it is, takecurrentLoc, if not left theNationalityvalue.value for false: if
Countryis not empty, then nothing to do, left the country value.
So something like this should work:
df1 <- data.frame(
Country = c("AR", "empty", "empty", "EN", "EN"),
Nationality = c("ES", "empty", "AR", "EN", "ES"),
currentLoc = c("RU", "ES", "AR", "EN", "ES")
)
df1$Country <- ifelse(
test = df1$Country == "empty",
yes = ifelse(df1$Nationality == "empty", df1$currentLoc, df1$Nationality),
no = df1$Country
)
df1
#> Country Nationality currentLoc
#> 1 AR ES RU
#> 2 ES empty ES
#> 3 AR AR AR
#> 4 EN EN EN
#> 5 EN ES ES
Created on 2022-02-23 by the reprex package (v2.0.1)
CodePudding user response:
An approach using the data.table package:
library(data.table)
df1 <- data.table(
Country = c("AR", "" , "" , "EN", "EN"),
Nationality = c("ES", "" , "AR", "EN", "ES"),
currentLoc = c("RU", "ES", "AR", "EN", "ES")
)
First step: convert all blanks to NA:
df1[df1==""] <- NA
Second step: Country equals to Country, otherwise Nationality, otherwise currentLoc:
> df1 [, Country:=fcoalesce(Country,Nationality, currentLoc)][]
Country Nationality currentLoc
<char> <char> <char>
1: AR ES RU
2: ES <NA> ES
3: AR AR AR
4: EN EN EN
5: EN ES ES
OBS: You may apply a regex for testing for "empty" records (for instance, "", "empty", " ", etc.) by substituting the instruction df1[df1==""] <- NA for:
regex_empty <- "^ *$|empty" # regex that captures multiple spaces and "empty" records
regex_mapping <- df2[, lapply(.SD,function(x)grepl(regex_empty, x, perl = T))] |> as.matrix()
df2[regex_mapping] <- NA
This code will make it possible to deal with cases such as df2:
> df2 <- data.table(
Country = c("AR", "" , "empty", "EN", "EN"),
Nationality = c("ES", " ", "AR" , "EN", "ES"),
currentLoc = c("RU", "ES" , "AR" , "EN", "ES")
)
>
> regex_empty <- "^ *$|empty"
> regex_mapping <- df2[, lapply(.SD,function(x)grepl(regex_empty, x, perl = T))] |> as.matrix()
> df2[regex_mapping] <- NA
> df2 [, Country:=fcoalesce(Country,Nationality, currentLoc)][]
Country Nationality currentLoc
<char> <char> <char>
1: AR ES RU
2: ES <NA> ES
3: AR AR AR
4: EN EN EN
5: EN ES ES
>
> all.equal(df1, df2)
[1] TRUE
