Replace empty / NA Values from data table with other values from the same data table in R-CodePudding

I would love to replace empty/NA values from a data table with other values that are shown in the variable of the same data table

So I have something like this:

Country	Nationality	currentLoc
AR	ES	RU
empty	empty	ES
empty	AR	AR
EN	EN	EN
EN	ES	ES

Result should be:

Country	Nationality	currentLoc
AR	ES	RU
ES	empty	ES
AR	AR	EN
EN	EN	EN
EN	ES	ES

I wrote the following R code:

df1$Country[df1$Country== "" | df1$Country=="  "] <- "empty"

df1$Country <-ifelse(df1$Country == "empty", df1$Nationality, ifelse(df1$Nationality == "empty", df1$currentLoc, df1$Country))

However not all "empty" cells are replaced in the "Country" Column.

I tried already to specify the empty spaces in the df1$Country with NA, so I have done like

df1$Country[df1$Country== "" | df1$Country=="  "] <- NA

df1$Country <-ifelse(is.na(df1$Country), df1$Nationality, ifelse(is.na(df1$Nationality), df1$currentLoc, df1$Country))

That is unfortunately also not working.

Could you help me out in this? I searched for similar questions here, but did not quit get the desired response.

Any help very much appriciated. :)

Cheers

Le1nO

CodePudding user response：

If I understood correctly, you want to relace "empty" values in the Country variable with the value of the Nationality column, and in the case the Nationality variable is also "empty", then replace Country with the currentLoc value. If this is correct, your first attempt is close to what you want, but you need to fix it a little. ifelse take a test, a value if the test is true and a value if the test is false, so:

test: df1$Country == "empty" meaning testing if Country is empty
value for true: If Country is empty, means that we need to take Nationality, but this variable can also be empty, so we need to check for it with another ifelse, something like this: ifelse(df1$Nationality == "empty", df1$currentLoc, df1$Nationality) which means, check if nationality is empty and if it is, take currentLoc, if not left the Nationality value.
value for false: if Country is not empty, then nothing to do, left the country value.

So something like this should work:

df1 <- data.frame(
  Country = c("AR", "empty", "empty", "EN", "EN"),
  Nationality = c("ES", "empty", "AR", "EN", "ES"),
  currentLoc = c("RU", "ES", "AR", "EN", "ES")
)


df1$Country <- ifelse(
  test = df1$Country == "empty",
  yes = ifelse(df1$Nationality == "empty", df1$currentLoc, df1$Nationality),
  no = df1$Country
)

df1
#>   Country Nationality currentLoc
#> 1      AR          ES         RU
#> 2      ES       empty         ES
#> 3      AR          AR         AR
#> 4      EN          EN         EN
#> 5      EN          ES         ES

^{Created on 2022-02-23 by the reprex package (v2.0.1)}

CodePudding user response：

An approach using the data.table package:

library(data.table)    
df1 <- data.table(
      Country     = c("AR", ""  , ""  , "EN", "EN"),
      Nationality = c("ES", ""  , "AR", "EN", "ES"),
      currentLoc  = c("RU", "ES", "AR", "EN", "ES")
    )

First step: convert all blanks to NA:

df1[df1==""] <- NA

Second step: Country equals to Country, otherwise Nationality, otherwise currentLoc:

> df1 [, Country:=fcoalesce(Country,Nationality, currentLoc)][]
   Country Nationality currentLoc
    <char>      <char>     <char>
1:      AR          ES         RU
2:      ES        <NA>         ES
3:      AR          AR         AR
4:      EN          EN         EN
5:      EN          ES         ES

OBS: You may apply a regex for testing for "empty" records (for instance, "", "empty", " ", etc.) by substituting the instruction df1[df1==""] <- NA for:

regex_empty <- "^ *$|empty"  # regex that captures multiple spaces and "empty" records
regex_mapping <- df2[, lapply(.SD,function(x)grepl(regex_empty, x, perl = T))] |> as.matrix()
df2[regex_mapping] <- NA

This code will make it possible to deal with cases such as df2:

> df2 <- data.table(
    Country     = c("AR", ""     , "empty", "EN", "EN"),
    Nationality = c("ES", "     ", "AR"   , "EN", "ES"),
    currentLoc  = c("RU", "ES"   , "AR"   , "EN", "ES")
  ) 
> 
> regex_empty <- "^ *$|empty"
> regex_mapping <- df2[, lapply(.SD,function(x)grepl(regex_empty, x, perl = T))] |> as.matrix()
> df2[regex_mapping] <- NA
> df2 [, Country:=fcoalesce(Country,Nationality, currentLoc)][]
   Country Nationality currentLoc
    <char>      <char>     <char>
1:      AR          ES         RU
2:      ES        <NA>         ES
3:      AR          AR         AR
4:      EN          EN         EN
5:      EN          ES         ES
> 
> all.equal(df1, df2)
[1] TRUE