I'm trying to parse and clean up a string in a column named Tab in a dataframe named df3 in R. Here is my solution adopting the gsub() function in R.
df3$Tab <- gsub(".*from","",df3$Tab)
df3$Tab <- gsub(".*FROM","",df3$Tab)
df3$Tab <- gsub("where.*","",df3$Tab)
df3$Tab <- gsub("WHERE.*","",df3$Tab)
basically I'd like to remove all before and up to "FROM" as well as everything behind "WHERE". This solution works but I'd like to write down a loop, but when I do this:
df3$Tab <- for (i in seq_along(df3$Tab)){
df3$Tab <- gsub(".*from","",df3$Tab)
df3$Tab <- gsub(".*FROM","",df3$Tab)
df3$Tab <- gsub("where.*","",df3$Tab)
df3$Tab <- gsub("WHERE.*","",df3$Tab)
break
}
I completely delete the df3$Tab column. Is the anybody that can explain me where I'm wrong? Thanks
CodePudding user response:
Wondering why you are using a loop at all. This one-liner should work as well:
df3$Tab <- gsub("(?i)(where.*|.*from)", "",df3$Tab)
(?i): a flag to make the regex case-insensitive(where.*|.*from): an alternation group matching both (i) literalwherefollowed by anything zero or more times and (ii) literalfrompreceded by anything zero or more times
