How to extract a string of unknown length between two delimiters in R

Time:01-28

I have a data frame containing a column with users' email addresses. The format of the email address could be anything. I need to create a new column called 'agency' with just the domain of the user's email (in other words, extract the value between '@' and the last '.').

Example:

I don't seem to be able to tackle the syntax to get there...

So far the best I could do was to eliminate the part before @:

library(dplyr)

# Remove everything up to and including the '@'
Azure_table <- Azure_table %>%
  mutate(agency = gsub(".*@", "", userPrincipalName))

Which gives me the following result: output

How do I eliminate the text after the last dot (.com, .ca, etc)? Is there a better way of doing this?

Thanks in advance!

CodePudding user response:

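The following pattern, used with str_extract from the stringr package, should suit your needs. Instead of replacing text with an empty string, it extracts the desired part directly: a lookbehind starts the match just after '@', and a lookahead stops it before the final dot and the letters that follow it.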

pattern <- "(?<=@).*(?=\\.[a-zA-Z]+$)"

Test cases:

s1 <- "[email protected]"
s2 <- "[email protected]"
s3 <- "[email protected]"
s4 <- "[email protected]"


str_extract(s1, pattern)
[1] "subtel"
str_extract(s2, pattern)
[1] "subtel"
str_extract(s3, pattern)
[1] "hello.something"
str_extract(s4, pattern)
[1] "example.applestore.apple"
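To tie this back to the question, here is a minimal sketch of applying the same pattern inside mutate. The sample addresses are made up for illustration (they are not the asker's data), and the base R line at the end is an alternative I am adding, not part of the answer above.

```r
library(dplyr)
library(stringr)

# Hypothetical sample data; these addresses are invented for illustration.
Azure_table <- data.frame(
  userPrincipalName = c("jane.doe@subtel.com",
                        "sam@hello.something.ca",
                        "no-at-sign"),
  stringsAsFactors = FALSE
)

# Everything after '@' and before the final ".letters" suffix
pattern <- "(?<=@).*(?=\\.[a-zA-Z]+$)"

# str_extract returns NA for rows where the pattern does not match
Azure_table <- Azure_table %>%
  mutate(agency = str_extract(userPrincipalName, pattern))

# Base R alternative (an assumption, not from the answer): capture the part
# between '@' and the last dot. Unlike str_extract, sub() leaves strings
# that do not match unchanged instead of returning NA.
agency_base <- sub("^.*@(.*)\\.[^.]+$", "\\1", Azure_table$userPrincipalName)
```

The two approaches agree on well-formed addresses and differ only on strings that do not match, so which one fits depends on whether you want NA or the original value for malformed rows.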