Can you automatically delete rows of a data frame based on multiple conditions in R?

Is it possible to look at the variables in a data frame and delete some rows based on certain conditions? Suppose I have this table:

Number	Value
1	TRUE
1	FALSE
2	FALSE
2	FALSE
3	FALSE
3	TRUE
4	FALSE
4	FALSE
5	TRUE
5	FALSE
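
For reference, here is a minimal sketch of the table above as an R data frame. The name `df` and the column types (integer Number, logical Value) are assumptions, since the post shows only the printed table:

```r
# Assumed reconstruction of the table above: two rows per Number, logical Value
df <- data.frame(
  Number = rep(1:5, each = 2),
  Value  = c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE)
)
```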

I want exactly one row per Number. If a Number has one TRUE row and one FALSE row, I want to delete the FALSE row; if both of its rows are FALSE, I want to delete just one of them. That should leave me with this table:

Number	Value
1	TRUE
2	FALSE
3	TRUE
4	FALSE
5	TRUE

Is it possible to filter by number then delete the first false value? Or anything similar to that?

CodePudding user response:

You can use arrange() followed by distinct():

library(dplyr)

df %>%
  arrange(Number, !Value) %>%
  distinct(Number, .keep_all = TRUE)

#  Number Value
#1      1  TRUE
#2      2 FALSE
#3      3  TRUE
#4      4 FALSE
#5      5  TRUE

arrange(Number, !Value) sorts the TRUE rows ahead of the FALSE rows within each Number, because !TRUE is FALSE and FALSE sorts first. distinct() then keeps only the first row for each Number.
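
The same sort-then-keep-first idea can be sketched in base R without dplyr. This is only an illustration, not part of the original answer; the data frame `df` is an assumed reconstruction of the table in the question:

```r
# Assumed reconstruction of the question's table
df <- data.frame(
  Number = rep(1:5, each = 2),
  Value  = c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE)
)

# order() sorts FALSE before TRUE, so negating Value puts TRUE rows first within each Number
sorted <- df[order(df$Number, !df$Value), ]

# duplicated() flags every repeat of a Number after its first occurrence; drop those rows
result <- sorted[!duplicated(sorted$Number), ]
rownames(result) <- NULL
result
#   Number Value
# 1      1  TRUE
# 2      2 FALSE
# 3      3  TRUE
# 4      4 FALSE
# 5      5  TRUE
```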

Another option is to test a condition within each group:

df %>%
  group_by(Number) %>%
  # If the group has any TRUE, keep its TRUE rows; otherwise keep only its first row
  filter(if (any(Value)) Value else row_number() == 1) %>%
  ungroup()
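
The two approaches agree on the question's data but can differ when a Number has more than one TRUE row. The sketch below uses a made-up data frame `df2` (not from the question) to show this: the grouped filter keeps every TRUE row, while arrange() plus distinct() always keeps exactly one row per Number.

```r
library(dplyr)

# Hypothetical data: Number 1 has two TRUE rows, Number 2 has only FALSE rows
df2 <- data.frame(
  Number = c(1, 1, 1, 2, 2),
  Value  = c(TRUE, TRUE, FALSE, FALSE, FALSE)
)

# Grouped filter: keeps both TRUE rows for Number 1, and the first row for Number 2
by_filter <- df2 %>%
  group_by(Number) %>%
  filter(if (any(Value)) Value else row_number() == 1) %>%
  ungroup()

# Sort then de-duplicate: keeps exactly one row per Number
by_distinct <- df2 %>%
  arrange(Number, !Value) %>%
  distinct(Number, .keep_all = TRUE)

nrow(by_filter)    # 3
nrow(by_distinct)  # 2
```

If exactly one row per Number is required regardless of how many TRUE values a group has, the arrange() and distinct() version is the safer choice.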

CodePudding user response:

Another approach:

library(dplyr)
df %>%
  group_by(Number) %>%
  # Assumes exactly two rows per Number: if both are FALSE keep the first, otherwise keep the TRUE row(s)
  filter(if (sum(!Value) == 2) row_number() == 1 else Value)
# A tibble: 5 x 2
# Groups:   Number [5]
#   Number Value
#    <int> <lgl>
# 1      1 TRUE
# 2      2 FALSE
# 3      3 TRUE
# 4      4 FALSE
# 5      5 TRUE