Using grep to search for a specific root word within lapply

Time:02-05

I am using the following line of code in R to search for sentences that have the word "event" within them:

ind<-lapply(sents(tri_doc), function(ch) grep("event", ch))

I am interested only in the word "event" or "events", but this grep also returns sentences containing "prevent," "eventually," "eventuality," etc., so the search is overly broad.

It seems like grep behaves a little differently within lapply().

I tried grep -w ("event", ch), but this returned an error in R. I have also tried matching the surrounding spaces with " ", "\s", and the POSIX space class to search for " event ", but none of these accomplish what I need (they either do not work or do not find anything).

How can I use grep to search only for the root word and not its longer forms? Thanks

CodePudding user response:

Based on

vec <- c("event", "prevent", "events", "eventuality")

Try:

  • word boundary:

    grep("\\bevents?\\b", vec, value = TRUE)
    # [1] "event"  "events"
    
  • negative lookbehind (and a word-boundary):

    grep("(?<![A-Za-z])events?\\b", vec, value = TRUE, perl = TRUE)
    # [1] "event"  "events"
    

(I used grep(.., value=TRUE) solely for demonstration here; value= has no effect on whether the pattern matches, only on what is returned: the matching strings instead of their indices.)
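
To tie this back to the lapply() call in the question, here is a minimal sketch of the word-boundary pattern applied per sentence. It assumes that sents(tri_doc) returns a list of character vectors, one vector of tokens per sentence; the `sentences` list below is a hypothetical stand-in for it, and `has_event` and `matched` are illustrative names. Note that grep behaves the same inside lapply() as outside it, and that `-w` is a flag of the command-line grep tool, not an argument of R's grep() function, which is why that attempt raised an error.

```r
# Hypothetical stand-in for sents(tri_doc): a list of tokenised sentences.
sentences <- list(
  c("The", "event", "was", "postponed"),
  c("We", "must", "prevent", "delays"),
  c("Several", "events", "eventually", "overlapped")
)

# Positions of the matching tokens in each sentence (integer(0) if none).
ind <- lapply(sentences, function(ch) grep("\\bevents?\\b", ch))

# One logical flag per sentence: does it contain "event" or "events"
# as a whole word?
has_event <- vapply(
  sentences,
  function(ch) any(grepl("\\bevents?\\b", ch)),
  logical(1)
)

# Keep only the sentences that matched.
matched <- sentences[has_event]
```

With this toy input, the first and third sentences are kept, while "prevent" and "eventually" are not matched. If the sentences are single strings rather than token vectors, the same pattern should still apply, because `\\b` matches at the edges of a word anywhere inside a string.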
