Is there a way to use a cleaner to omit specific cases? (R)

Time:02-08

I have a dataframe from which I want to omit cases where age is 30 or less. I know you can use na.omit to omit NA cases, but how would I omit specific cases like this?

CodePudding user response:

This looks more like a filtering problem than a missing-value problem:

library(dplyr)

df <- tibble(age = c(20, 25, 30, 35, 40))

df %>% filter(age < 30)
# A tibble: 2 × 1
    age
  <dbl>
1    20
2    25

CodePudding user response:

With base R, you can keep only the rows where age is less than 30, which filters out every row where age is 30 or greater.

df[df$age < 30, ]

    age values
  <int>  <dbl>
1    21  1.89 
2    22  1.01 
3    23  0.107
4    24  1.46 
5    25  1.17 
6    26  1.86 
7    27  1.77 
8    28  1.91 
9    29  0.594

Or with data.table:

library(data.table)

dt <- data.table(df)
dt[age < 30]
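
The filters above keep the rows with age below 30. The question asks for the opposite (omit ages 30 or less), which only needs the comparison flipped. The following is a minimal base R sketch of that, using a hypothetical `toy` data frame that is not part of the original data; the NA age is added on purpose to show that bracket indexing and `subset` treat missing ages differently.

```r
# Hypothetical toy data; the NA age is included to show how each approach treats it.
toy <- data.frame(age = c(20, 25, 30, 35, 40, NA))

# Omit ages 30 or less, i.e. keep ages strictly above 30.
# Logical indexing with an NA in the condition returns an all-NA row.
kept_bracket <- toy[toy$age > 30, , drop = FALSE]

# which() returns only the positions where the condition is TRUE, so the NA is dropped.
kept_which <- toy[which(toy$age > 30), , drop = FALSE]

# subset() also drops rows where the condition evaluates to NA.
kept_subset <- subset(toy, age > 30)

nrow(kept_bracket)
nrow(kept_which)
kept_subset$age
```

If the age column can contain NAs, `which()` or `subset` is probably the safer choice, since plain bracket indexing would carry an extra NA row into the result.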

However, if you only want to drop rows with NAs when the age is greater than 30, build a condition that is true where age is greater than 30 and another column is NA, then exclude the rows that match it.

df[!(df$age > 30 & is.na(df$values)),]

Or with subset:

subset(df, !(age > 30 & is.na(values)))

With tidyverse:

library(tidyverse)

df %>% 
  filter(!(age > 30 & is.na(values)))

With data.table:

dt <- data.table(df)
dt[!(age > 30 & is.na(values))]
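
The sample data further down has no missing values, so the conditional filter leaves it unchanged. To see what the condition does, here is a small base R sketch on a hypothetical `toy` data frame (an assumption for illustration, not the original data) with one NA on each side of age 30.

```r
# Hypothetical toy data: two rows have a missing value, one on each side of age 30.
toy <- data.frame(age    = c(25, 28, 35, 40),
                  values = c(NA, 1.2, NA, 0.7))

# TRUE only for rows that are both older than 30 and missing `values`.
drop_row <- toy$age > 30 & is.na(toy$values)

# Keep everything else; the NA at age 25 survives because age is not above 30.
result <- toy[!drop_row, ]
result$age
```

Only the age 35 row is removed; the row with age 25 keeps its NA, which is the difference from calling `na.omit` on the whole data frame.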

Data

df <- structure(list(age = 21:40, 
                     values = c(1.88648780807853, 1.01084147393703, 
                                0.107075828593224, 1.46145519195125, 1.16910230834037, 1.85718628577888, 
                                1.7749991081655, 1.91132036875933, 0.594451983459294, 0.976039483677596, 
                                1.31880497187376, 1.82749796425924, 1.98314357083291, 0.57053042575717, 
                                0.722490054555237, 1.66634088428691, 0.702816031407565, 0.622223159298301, 
                                0.298387756571174, 1.6071562608704)), 
                class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L))