How to remove rows conditionally

I have the following data. I want to filter it conditionally: if a ticker such as AMZN has a row with a C (confirmed) value, delete that ticker's E row; if it has no C row, keep the E row. In this case, the result would be one C row for AMZN and one E row for AAPL.

Any ideas on how best to achieve this in R?

| AMZN| C| 1|
| AMZN| E| 2|
| AAPL| E| 2|

CodePudding user response：

You can do this with dplyr by grouping on V1 and filtering within each group:

library(dplyr)

df %>%
  group_by(V1) %>%
  filter(if(any(V2 == "C")) V2 != "E" else V2 == "E") %>%
  ungroup()

#   V1    V2       V3
#  <chr> <chr> <int>
#1 AMZN  C         1
#2 AAPL  E         2

data

It is easier to help if you provide data in a reproducible format, like this:

df <- structure(list(V1 = c("AMZN", "AMZN", "AAPL"), V2 = c("C", "E", 
"E"), V3 = c(1L, 2L, 2L)), class = "data.frame", row.names = c(NA, -3L))

CodePudding user response：

library(dplyr)

filter(df, V1 == "AMZN" & V2 == "C" | V2 == "E" & V1 != "AMZN")

    V1 V2 V3
1 AMZN  C  1
2 AAPL  E  2
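
Note that the condition above names AMZN explicitly, so it fits the sample data but does not apply the rule per ticker. The sketch below, written in base R so it runs without extra packages, illustrates this with a hypothetical third ticker (MSFT and the names `df2` and `hard_coded` are assumptions for illustration, not part of the question).

```r
# Hypothetical data: the MSFT rows are added for illustration only.
df2 <- data.frame(
  V1 = c("AMZN", "AMZN", "AAPL", "MSFT", "MSFT"),
  V2 = c("C", "E", "E", "C", "E"),
  V3 = c(1L, 2L, 2L, 3L, 4L),
  stringsAsFactors = FALSE
)

# Same logical condition as the filter() call above, using base R indexing.
hard_coded <- df2[df2$V1 == "AMZN" & df2$V2 == "C" |
                  df2$V2 == "E" & df2$V1 != "AMZN", ]

# MSFT has a C row, but the hard-coded test keeps its E row instead.
hard_coded$V2[hard_coded$V1 == "MSFT"]
```

If more tickers can carry a C row, a grouped test on `any(V2 == "C")` avoids listing each ticker by name.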

CodePudding user response：

Here is a possible base R solution:

df[as.logical(with(df, ave(V2, V1, FUN = function(i)
  if(any(i == "C")) i != "E" else i == "E"))), ]

Output

    V1 V2 V3
1 AMZN  C  1
3 AAPL  E  2
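
The as.logical() wrapper above is needed because ave() returns a vector of the same type as its first argument, so with the character column V2 the TRUE/FALSE results come back as strings. A minimal sketch of a variant that passes a logical vector to ave() instead, which may be easier to read (the names `has_c`, `keep` and `result` are assumptions for illustration):

```r
# Data from the question, rebuilt so the sketch runs on its own.
df <- data.frame(
  V1 = c("AMZN", "AMZN", "AAPL"),
  V2 = c("C", "E", "E"),
  V3 = c(1L, 2L, 2L),
  stringsAsFactors = FALSE
)

# TRUE for every row whose ticker has at least one "C" row.
has_c <- ave(df$V2 == "C", df$V1, FUN = any)

# Tickers with a "C": drop their "E" rows; tickers without: keep only "E" rows.
keep <- ifelse(has_c, df$V2 != "E", df$V2 == "E")

result <- df[keep, ]
result
```

As in the output above, base R indexing keeps the original row names, so the rows that remain are 1 and 3.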

Or using data.table:

library(data.table)

setDT(df)[, .SD[if (any(V2 == "C")) V2 != "E" else V2 == "E"], by = V1]

Data

df <-
  structure(list(
    V1 = c("AMZN", "AMZN", "AAPL"),
    V2 = c("C", "E", "E"),
    V3 = c(1L, 2L, 2L)
  ),
  class = "data.frame",
  row.names = c(NA,-3L))