I have a string str from which multiple substrings are to be extracted.
str <- "Nucleotide transport and metabolism,Secondary metabolites biosynthesis, transport, and catabolism / Chromatin structure and dynamics,Coenzyme metabolism,"
The conditions for extraction are:
- Extract everything till the first occurrence of a "," only if the next character is a capital letter.
- If the character next to a "," is not a capital letter, then proceed till:
  - the next occurrence of "," which is followed by a capital letter, OR
  - the occurrence of "/", OR
  - the end of the string
The output should look like this
>output
[1] "Nucleotide transport and metabolism" "Secondary metabolites biosynthesis, transport, and catabolism"
[3] "Chromatin structure and dynamics" "Coenzyme metabolism"
CodePudding user response:
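You can use str_split from the stringr package. Split on a comma that is followed by an uppercase letter (the lookahead keeps that letter in the next piece) or on " / ", then remove the trailing comma left at the end of the string with gsub.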
library(stringr)
str_split(str, ",(?=[:upper:])|\\s\\/\\s") %>% unlist() %>% gsub(",$", "", .)
[1] "Nucleotide transport and metabolism"
[2] "Secondary metabolites biosynthesis, transport, and catabolism"
[3] "Chromatin structure and dynamics"
[4] "Coenzyme metabolism"
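If you would rather not load stringr, the same idea can be sketched in base R with strsplit and perl = TRUE. This is only an illustration of the approach above, not a tested general solution: it assumes the capital letters are plain ASCII (hence [A-Z]) and that the slash separator always has whitespace on both sides. The variable is named x here, which is my own choice, so that it does not mask the built-in str() function.

```r
# Same input string as in the question, stored in `x` (hypothetical name)
x <- "Nucleotide transport and metabolism,Secondary metabolites biosynthesis, transport, and catabolism / Chromatin structure and dynamics,Coenzyme metabolism,"

# Split on a comma followed by an uppercase ASCII letter (lookahead),
# or on a slash surrounded by whitespace; lookaheads need perl = TRUE
parts <- strsplit(x, ",(?=[A-Z])|\\s/\\s", perl = TRUE)[[1]]

# Remove the trailing comma that remains on the last piece
parts <- sub(",$", "", parts)

print(parts)
```

This should give the same four strings as the stringr version. Commas followed by a space and a lowercase word (as in "biosynthesis, transport, and catabolism") are left alone because the lookahead does not match them.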
