R string split on parentheses, keeping the parentheses in the split with its content

Time:02-02

I am trying to split strings of the format

x <- "A(B)C"

where A, B and C can each be an empty string or any sequence of characters other than parentheses. The parentheses are always present, and I want to keep them around the characters they enclose, so that the result would be:

"A" "(B)" "C"

So far my best try was:

strsplit(x, "(?<=\\))|(?=\\()", perl = TRUE)
[[1]]
[1] "A"  "("  "B)" "C"

but that keeps the opening parenthesis separate. Any ideas?

CodePudding user response:

You can use

x <- "A(B)C"
library(stringr)
str_extract_all(x, "\\([^()]*\\)|[^()]+")

Pattern details:

  • \([^()]*\) - a (, zero or more chars other than ( and ) and then )
  • | - or
  • [^()]+ - one or more chars other than ( and ).
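
The same extraction can also be done without stringr. Below is a minimal sketch in base R, assuming the second alternative is meant to be `[^()]+` (one or more non-parenthesis characters); the variable names `pattern` and `parts` are only illustrative.

```r
# Base R sketch: extract either a parenthesised group or a run of
# non-parenthesis characters, with no package dependencies.
x <- "A(B)C"

# First alternative: "(", any chars except parentheses, ")".
# Second alternative: one or more chars that are not parentheses.
pattern <- "\\([^()]*\\)|[^()]+"

# gregexpr() locates every match; regmatches() pulls out the matched text.
parts <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
parts
# [1] "A"   "(B)" "C"
```

Because the pattern extracts matches rather than splitting, empty A or C simply produce no element: `"(B)"` on its own yields just `"(B)"`.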

CodePudding user response:

library(stringr)

x <- c("A(B)C", "ABC", "0$b")
stringr::str_extract_all(x, "[\\(]?.{1}[\\)]?")

# [[1]]
# [1] "A"   "(B)" "C"  
# 
# [[2]]
# [1] "A" "B" "C"
# 
# [[3]]
# [1] "0" "$" "b"
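
Note that `.{1}` matches exactly one character, so the pattern above appears to rely on A, B and C each being a single character. The sketch below, written in base R so that it needs no packages, compares it on a longer input with a variant that swaps `.{1}` for `[^()]+`. That variant is an assumption for illustration, not part of the original answer, and the names `single` and `multi` are hypothetical.

```r
x <- "foo(bar)baz"

# Pattern from the answer above: one character per match, with an
# optional "(" before it and an optional ")" after it.
single <- regmatches(x, gregexpr("[\\(]?.{1}[\\)]?", x, perl = TRUE))[[1]]
single
# [1] "f"  "o"  "o"  "(b" "a"  "r)" "b"  "a"  "z"

# Illustrative variant (an assumption): allow a run of one or more
# non-parenthesis characters instead of exactly one character.
multi <- regmatches(x, gregexpr("[\\(]?[^()]+[\\)]?", x, perl = TRUE))[[1]]
multi
# [1] "foo"   "(bar)" "baz"
```

Unlike `\\([^()]*\\)|[^()]+`, this variant requires at least one character between the parentheses, so an empty group `()` would be skipped.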