I would like to rewrite the following regular expression in R using [:alnum:], which, as I understand it, should do the same thing:
starwars %>% mutate(name = str_replace_all(name, "[^a-zA-Z\\d\\s:\u00C0-\u00FF]", ""))
But the behaviour I get is not at all what I expected:
starwars %>% mutate(name = str_replace_all(name, "[^:alnum:]", ""))
By the way, I also need to remove underscores (_) and all spaces.
Answer:
You can use
library(stringr)
str_replace_all(name, "[^[:alnum:]]+", "")
## or
str_replace_all(name, "[:^alnum:]+", "")
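The [^[:alnum:]] pattern is a negated bracket expression ([^...]) that matches any character other than letters and digits ([:alnum:] is a POSIX character class). In stringr, which uses the ICU regex engine, [:alnum:] is Unicode-aware, so accented letters such as é are kept too.
The [:^alnum:] pattern is ICU's negated form of the POSIX character class: it matches any character that [:alnum:] does not match.
The + is a quantifier that matches one or more consecutive occurrences of the pattern it quantifies, so each run of unwanted characters is removed in a single match.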
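A minimal sketch of the first pattern, assuming stringr is installed. The sample vector x is hypothetical (not taken from the starwars data) and contains spaces, an underscore, a hyphen, an exclamation mark and an accented letter:

```r
library(stringr)

# Hypothetical sample names, not from the starwars data;
# \u00e9 is "é", written as an escape so the source is locale-independent
x <- c("Luke Skywalker", "R2-D2", "Padm\u00e9_Amidala", "Jek Tono Porkins!")

# Negated bracket expression plus the + quantifier:
# every run of characters that are not letters or digits is deleted
cleaned <- str_replace_all(x, "[^[:alnum:]]+", "")
cleaned
```

Inside mutate() the same call applies to the name column, e.g. starwars %>% mutate(name = str_replace_all(name, "[^[:alnum:]]+", "")).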
Also, in stringr the shorthand character classes are Unicode-aware, so you can also use
str_replace_all(name, "[\\W_]+", "")
where \W matches any character other than a Unicode letter, digit or underscore, and the added _ makes the class match underscores as well.
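The sketch below, again on a hypothetical sample vector x (an assumption, not the starwars data), shows why the underscore has to be added to the class: \W on its own leaves underscores in place because _ counts as a word character.

```r
library(stringr)

# Hypothetical sample names, not from the starwars data; \u00e9 is "é"
x <- c("Luke Skywalker", "R2-D2", "Padm\u00e9_Amidala")

# \W alone: spaces and hyphens go, but the underscore is a word character and stays
keep_underscore <- str_replace_all(x, "\\W+", "")

# Adding _ to the class removes underscores as well
no_underscore <- str_replace_all(x, "[\\W_]+", "")

# str_remove_all(x, pattern) is shorthand for str_replace_all(x, pattern, "")
same <- str_remove_all(x, "[\\W_]+")
```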
