I am using the following line of code in R to search for sentences that have the word "event" within them:
ind<-lapply(sents(tri_doc), function(ch) grep("event", ch))
I am interested in the root word "event" or "events", but this grep also returns sentences where "prevent," "eventually," "eventuality," etc. appear, so the search is overly broad.
It seems like grep behaves a little differently within lapply().
I tried grep -w ("event", ch)), but this returned an error in R. I have also tried escape characters for spaces (" " or " \s") and the POSIX class for whitespace to search for " event ", but these do not accomplish what I need (in fact, they do not find anything).
How can I use grep to search only for the root word and not its longer forms? Thanks
CodePudding user response:
Based on
vec <- c("event", "prevent", "events", "eventuality")
Try:
word-boundary:
grep("\\bevents?\\b", vec, value = TRUE)
# [1] "event" "events"
negative lookbehind (and a word-boundary):
grep("(?<![A-Za-z])events?\\b", vec, value = TRUE, perl = TRUE)
# [1] "event" "events"
(I used grep(.., value=TRUE) solely for demonstration here. value= does not change whether an element matches, only what grep returns: the matching strings instead of their indices.)
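Applied to the lapply() call from the question, the word-boundary pattern could be used roughly like this. This is only a sketch: I don't have tri_doc, so docs below is a made-up stand-in that assumes sents(tri_doc) gives a list of character vectors with one sentence per element.

```r
# Hypothetical stand-in for sents(tri_doc): a list of character vectors,
# each element being one sentence (assumed shape, not taken from the question).
docs <- list(
  c("The event was held on Friday.", "We could not prevent the delay."),
  c("No eventuality was ignored.", "Eventually, both events were cancelled.")
)

# "event" or "events" as a whole word only
pattern <- "\\bevents?\\b"

# Same shape as the original call: indices of matching sentences per document
ind <- lapply(docs, function(ch) grep(pattern, ch))
```

Here ind[[1]] is 1 and ind[[2]] is 2, so the "prevent", "eventuality" and "Eventually" sentences are skipped unless they also contain the whole word. If capitalised forms such as "Event" should match too, ignore.case = TRUE can be added to the grep() call.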
