Filter vector elements containing and not containing multiple strings

Time:01-09

Based on code from this link, I can find file names that contain all of several strings:

allpatterns <- function(fnames, patterns) {
  # TRUE for each file name that matches every pattern
  i <- sapply(fnames, function(fn) all(sapply(patterns, grepl, fn)))
  fnames[i]
}

filenames <- c("foo.txt", "bar.R", "foo_quux.py", "quux.c", "quux.foo",
               "foo_bar", "bar.foo.cpp", "foo_bar_quux", "quux_foo.bar", "nothing")

allpatterns(filenames, c("foo", "bar"))
# [1] "foo_bar"      "bar.foo.cpp"  "foo_bar_quux" "quux_foo.bar"

Now I'd like to go further by adding a condition that excludes certain strings. For example, I want to keep file names that contain both foo and bar but contain neither cpp nor quux, which should give the following result:

# [1] "foo_bar"

How can I achieve that by modifying the code above?
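A minimal sketch of one way to extend the original function directly: keep the per-file loop and add a second check that none of the excluded patterns match. The name `allpatterns2` and the arguments `keep`/`drop` are hypothetical; `vapply()` is swapped in for `sapply()` so each call is guaranteed to return a single logical value, and the patterns are still treated as regular expressions, as in the original.

```r
filenames <- c("foo.txt", "bar.R", "foo_quux.py", "quux.c", "quux.foo",
               "foo_bar", "bar.foo.cpp", "foo_bar_quux", "quux_foo.bar", "nothing")

# Hypothetical extension of allpatterns(): `drop` lists patterns that must NOT match
allpatterns2 <- function(fnames, keep, drop = character(0)) {
  i <- vapply(fnames, function(fn) {
    # every `keep` pattern matches AND no `drop` pattern matches
    all(vapply(keep, grepl, logical(1), x = fn)) &&
      !any(vapply(drop, grepl, logical(1), x = fn))
  }, logical(1))
  fnames[i]
}

allpatterns2(filenames, c("foo", "bar"), c("cpp", "quux"))
# [1] "foo_bar"
```

Because `drop` defaults to `character(0)` and `any()` of an empty vector is `FALSE`, calling it without `drop` gives the same result as the original `allpatterns()`.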

EDIT: the attempt below, based on an answer from an R expert, is inspiring, even though it does not give me exactly the expected result:

filenames <- c("foo.txt", "bar.R", "foo_quux.py", "quux.c", "quux.foo",
               "foo_bar", "bar.foo.cpp", "foo_bar_quux", "quux_foo.bar",
               "nothing")
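keep <- c("foo", "bar")
drop <- c("cpp", "quux")

# Both regexes have the form "\\b(?:a|b)\\b". The `|` makes this OR (any one
# keep string suffices), not AND, and `\\b` needs a word boundary, but `_` is
# a word character, so e.g. "foo_bar" never matches keep_regex.
keep_regex <- paste0("\\b(?:", paste(keep, collapse="|"), ")\\b")
drop_regex <- paste0("\\b(?:", paste(drop, collapse="|"), ")\\b")

# perl = TRUE because (?:...) is PCRE syntax
result <- filenames[grepl(keep_regex, filenames, perl = TRUE) &
                      !grepl(drop_regex, filenames, perl = TRUE)]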
result
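A sketch of how the two regexes could be built so that `keep` means AND and `drop` means OR: each `keep` string becomes a PCRE lookahead such as `(?=.*foo)`, which needs `perl = TRUE`, and the `drop` strings are joined with `|` without `\\b`. This assumes the strings contain no regex metacharacters and that `drop` is non-empty (an empty `drop_regex` of `""` would match, and so remove, every name).

```r
filenames <- c("foo.txt", "bar.R", "foo_quux.py", "quux.c", "quux.foo",
               "foo_bar", "bar.foo.cpp", "foo_bar_quux", "quux_foo.bar",
               "nothing")
keep <- c("foo", "bar")
drop <- c("cpp", "quux")

# "^(?=.*foo)(?=.*bar)": every lookahead must succeed from the start (AND)
keep_regex <- paste0("^", paste0("(?=.*", keep, ")", collapse = ""))
# "cpp|quux": any one alternative is enough (OR)
drop_regex <- paste(drop, collapse = "|")

result <- filenames[grepl(keep_regex, filenames, perl = TRUE) &
                      !grepl(drop_regex, filenames)]
result
# [1] "foo_bar"
```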

Answer:

File names containing "foo" OR "bar", but neither "cpp" nor "quux":

filenames[grepl("foo|bar",filenames)&!grepl("cpp|quux",filenames)]
[1] "foo.txt" "bar.R"   "foo_bar"

File names containing both "foo" AND "bar", but neither "cpp" nor "quux":

filenames[grepl("(?=.*foo)(?=.*bar)", filenames, perl = TRUE) & !grepl("cpp|quux", filenames)]
[1] "foo_bar"

Answer:

Maybe this function would be of help:

allpatterns <- function(fnames, keep, remove) {
  # Keep names that contain all of the `keep` patterns
  i <- Reduce(`&`, lapply(keep, function(x) grepl(x, fnames)))
  # Drop names that contain any of the `remove` patterns
  j <- !Reduce(`|`, lapply(remove, function(x) grepl(x, fnames)))
  fnames[i & j]
}

allpatterns(filenames, c("foo", "bar"), c("cpp", "quux"))
#[1] "foo_bar"
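The patterns in these answers are regular expressions, so a character such as `.` matches any character. If the keep/remove values are meant as literal substrings, one option, sketched below, is to pass `fixed = TRUE` through to `grepl()`. The name `allpatterns_fixed` and its `fixed` argument are hypothetical; otherwise it follows the `Reduce()` answer above, with `init` values of `TRUE` and `FALSE` so that an empty `keep` or `remove` vector does not empty the result.

```r
filenames <- c("foo.txt", "bar.R", "foo_quux.py", "quux.c", "quux.foo",
               "foo_bar", "bar.foo.cpp", "foo_bar_quux", "quux_foo.bar", "nothing")

allpatterns_fixed <- function(fnames, keep, remove = character(0), fixed = TRUE) {
  # TRUE where the name contains every `keep` string (all TRUE if `keep` is empty)
  has_all <- Reduce(`&`, lapply(keep, grepl, x = fnames, fixed = fixed), TRUE)
  # TRUE where the name contains any `remove` string (all FALSE if `remove` is empty)
  has_any <- Reduce(`|`, lapply(remove, grepl, x = fnames, fixed = fixed), FALSE)
  fnames[has_all & !has_any]
}

allpatterns_fixed(filenames, c("foo", "bar"), c("cpp", "quux"))
# [1] "foo_bar"
allpatterns_fixed(filenames, "foo", ".")                 # literal dot
# [1] "foo_bar"      "foo_bar_quux"
allpatterns_fixed(filenames, "foo", ".", fixed = FALSE)  # regex: "." matches anything
# character(0)
```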