Home > Software design >  R: Removing Text using gsub in loop
R: Removing Text using gsub in loop

Time:02-04

I'm trying to parse and clean up a string in a column named Tab in a dataframe named df3 in R. Here is my solution adopting the gsub() function in R.

df3$Tab <- gsub(".*from","",df3$Tab)
df3$Tab <- gsub(".*FROM","",df3$Tab)
df3$Tab <- gsub("where.*","",df3$Tab)
df3$Tab <- gsub("WHERE.*","",df3$Tab)

basically I'd like to remove all before and up to "FROM" as well as everything behind "WHERE". This solution works but I'd like to write down a loop, but when I do this:

df3$Tab <- for (i in seq_along(df3$Tab)){
  df3$Tab <- gsub(".*from","",df3$Tab)
  df3$Tab <- gsub(".*FROM","",df3$Tab)
  df3$Tab <- gsub("where.*","",df3$Tab)
  df3$Tab <- gsub("WHERE.*","",df3$Tab)
  break
}

I completely delete the df3$Tab column. Is the anybody that can explain me where I'm wrong? Thanks

CodePudding user response:

Wondering why you are using a loop at all. This one-liner should work as well:

df3$Tab <- gsub("(?i)(where.*|.*from)", "",df3$Tab)
  • (?i): a flag to make the regex case-insensitive
  • (where.*|.*from): an alternation group matching both (i) literal where followed by anything zero or more times and (ii) literal from preceded by anything zero or more times
  •  Tags:  
  • Related