Is it possible to look at variables in a data frame and delete some rows based on certain conditions? Suppose I have this table:
| Number | Value |
|---|---|
| 1 | TRUE |
| 1 | FALSE |
| 2 | FALSE |
| 2 | FALSE |
| 3 | FALSE |
| 3 | TRUE |
| 4 | FALSE |
| 4 | FALSE |
| 5 | TRUE |
| 5 | FALSE |
I want exactly one row for each Number. Where a Number has one TRUE row and one FALSE row, I want to delete the FALSE row; where both rows are FALSE, I just want to delete one of them. This should leave me with a table like:
| Number | Value |
|---|---|
| 1 | TRUE |
| 2 | FALSE |
| 3 | TRUE |
| 4 | FALSE |
| 5 | TRUE |
Is it possible to filter by number then delete the first false value? Or anything similar to that?
CodePudding user response:
You can use arrange and then distinct:
library(dplyr)
df %>%
  arrange(Number, !Value) %>%
  distinct(Number, .keep_all = TRUE)
# Number Value
#1 1 TRUE
#2 2 FALSE
#3 3 TRUE
#4 4 FALSE
#5 5 TRUE
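arrange(Number, !Value) sorts the TRUE rows ahead of the FALSE ones within each Number (for a TRUE row, !Value is FALSE, and FALSE sorts before TRUE). distinct(Number, .keep_all = TRUE) then keeps the first row for each Number along with its other columns.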
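If you want to run this yourself, here is a minimal self-contained sketch of the two steps. The construction of df is an assumption that mirrors the table in the question, and sorted and result are just illustrative names.

```r
library(dplyr)

# Assumed reconstruction of the table from the question
df <- data.frame(
  Number = rep(1:5, each = 2),
  Value  = c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE)
)

# Step 1: sort so that, within each Number, TRUE rows come first
# (!Value is FALSE for TRUE rows, and FALSE sorts before TRUE)
sorted <- df %>% arrange(Number, !Value)

# Step 2: keep only the first row encountered for each Number
result <- sorted %>% distinct(Number, .keep_all = TRUE)

print(result)
```

Printing sorted before the distinct step is a quick way to check that the row you expect to keep really is first in each group.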
Another option is to check the condition within each group: if the group has any TRUE row, keep the TRUE rows; otherwise keep the group's first row.
df %>%
  group_by(Number) %>%
  filter(if (any(Value)) Value else row_number() == 1) %>%
  ungroup()
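One thing to be aware of: the filter version keeps every TRUE row, so a Number with two TRUE rows would still have two rows afterwards. That cannot happen with the data in the question, but if it might in yours, a sketch along these lines always returns exactly one row per group. It is not the approach above; it swaps in slice() together with base R's which.max(), which gives the position of the first TRUE, or 1 when every value is FALSE. The df2 data and the name one_per_group are made up for illustration.

```r
library(dplyr)

# Hypothetical data: Number 1 has two TRUE rows, Number 2 has none
df2 <- data.frame(
  Number = c(1, 1, 1, 2, 2),
  Value  = c(FALSE, TRUE, TRUE, FALSE, FALSE)
)

one_per_group <- df2 %>%
  group_by(Number) %>%
  # which.max() on a logical vector returns the index of the first TRUE,
  # or 1 if all values are FALSE
  slice(which.max(Value)) %>%
  ungroup()

print(one_per_group)
```

Here Number 1 ends up with a single TRUE row and Number 2 with a single FALSE row.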
CodePudding user response:
Another approach:
library(dplyr)
df %>%
  group_by(Number) %>%
  # assumes exactly two rows per Number: if both are FALSE keep the first row, otherwise keep the TRUE row(s)
  filter(if (sum(!Value) == 2) row_number() == 1 else Value)
# A tibble: 5 x 2
# Groups: Number [5]
#   Number Value
#    <int> <lgl>
# 1      1 TRUE
# 2      2 FALSE
# 3      3 TRUE
# 4      4 FALSE
# 5      5 TRUE
