How do I identify by row id the values in a data frame column that are not in another data frame column?

How do I identify, by row id, the values in column c3 of data frame d2 that are not in column c1 of data frame d1? My which() call returns every row when I subset as shown below. I need to keep this bracket-subsetting structure rather than the value$field form, which does work:

c1 <- c("A", "B", "C", "D", "E")
c2 <- c("a", "b", "c", "d", "e")

c3 <- c("A", "z", "C", "z", "E", "F")
c4 <- c("a", "x", "x", "d", "e", "f")

d1 <- data.frame(c1, c2, stringsAsFactors = FALSE)
d2 <- data.frame(c3, c4, stringsAsFactors = FALSE)

x <- unique(d1["c1"])
y <- d2[,"c3"]

id <- which(!(y %in% x) )  # incorrect, all row ids returned

I am trying to find the ids of the rows in y whose value does not appear in x.

CodePudding user response:

I believe setdiff() would work here. You want z and F, right? Those values are in d2[,"c3"] but not in d1[,"c1"].

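# values that occur in d2[,"c3"] but not in d1[,"c1"]
includes <- setdiff(d2[,"c3"], d1[,"c1"])

# keep only the rows of d2 whose c3 value is one of those
d2_new <- d2[d2[,"c3"] %in% includes,]

# store the original row names as an id column
d2_new$id <- rownames(d2_new)
d2_new

# or, to get just the row ids

ids <- rownames(d2[d2[,"c3"] %in% includes,])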

Output:

d2_new

#  c3 c4 id
#2  z  x  2
#4  z  d  4
#6  F  f  6

ids
#[1] "2" "4" "6"
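
As a side note on why the original which() call returned every row: d1["c1"] (single bracket, no comma) keeps the result as a one-column data frame, whereas d1[,"c1"] drops it to a plain character vector. As far as I understand it, %in% then compares y against the data frame as a whole rather than against the individual values, so nothing matches. A minimal sketch that keeps the question's bracket-subsetting structure and only changes how x is extracted (same data as in the question, with the columns built inline for brevity):

```r
# Same data as in the question.
d1 <- data.frame(c1 = c("A", "B", "C", "D", "E"),
                 c2 = c("a", "b", "c", "d", "e"),
                 stringsAsFactors = FALSE)
d2 <- data.frame(c3 = c("A", "z", "C", "z", "E", "F"),
                 c4 = c("a", "x", "x", "d", "e", "f"),
                 stringsAsFactors = FALSE)

# d1["c1"] is still a data frame; d1[, "c1"] is a character vector.
is.data.frame(d1["c1"])   # TRUE
is.character(d1[, "c1"])  # TRUE

x <- unique(d1[, "c1"])   # reference values as a vector
y <- d2[, "c3"]           # values to check

id <- which(!(y %in% x))  # positions in y whose value is absent from x
id
# [1] 2 4 6
```

Note that which() returns integer positions, while rownames() in the setdiff version returns character row names. They agree here only because d2 has the default row names 1 to 6.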