I have a column with values such as this:
structure(list(col1 = c(" | | | | | | | |", "| | | | | | | | | | | | | | |",
"| | | | | | | | | | | | | | | ", "stop|", "stop| | ",
"stop | go")), class = "data.frame", row.names = c(NA, -6L))
I want to remove every occurrence of | when pipes appear consecutively, including when they are separated by spaces, as in | | or | | |.
Currently, I'm trying to enumerate all the variations of the pipes, but they seem fairly random. Is there a way to make sure my pattern covers the following cases:
- When there is more than one | in a row
- When there is more than one | in a row with spaces between them (e.g., | | or | | |)
- When | is at the end of the line (e.g., \\|$)
I would, however, like to keep the pipe in stop | go.
Here's the code that I'm working with right now, but it removes the pipe in stop | go.
df$col1 <- gsub('[\\| ]{2,}|[\\|$]', '', df$col1)
I want to remove all the | symbols except for the one in stop | go.
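Why the pipe in stop | go disappears: inside square brackets, | and $ are ordinary characters, so [\\|$] is a character class that matches any single pipe anywhere in the string, not only one at the end of the line. The first branch, [\\| ]{2,}, also matches the three characters " | " around the separator. A quick check in base R on that one string:

```r
x <- "stop | go"

# Inside [...], "|" and "$" are ordinary characters, so [\\|$] matches
# any single pipe anywhere in the string, not "a pipe at the end"
lone_pipe <- gsub("[\\|$]", "", x)
print(lone_pipe)   # "stop  go"

# The first branch, [\\| ]{2,}, matches " | " (space, pipe, space),
# so the question's full pattern deletes the separator outright
asker <- gsub('[\\| ]{2,}|[\\|$]', '', x)
print(asker)       # "stopgo"

# To target only a trailing pipe, put the $ anchor outside the brackets
trailing <- gsub("\\|$", "", c("stop|", x))
print(trailing)    # "stop" "stop | go"
```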
Answer:
Maybe this works:
trimws(trimws(gsub('(\\|\\s+){2,}', "", df$col1),
       whitespace = "\\s+\\|"), whitespace = "\\|")
Output:
[1] "" "" "" "stop" "stop" "stop | go"
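A step-by-step version of the nested trimws() call, assuming the patterns use \\s+ (one or more whitespace characters) and the data frame from the question's dput(). The whitespace argument of trimws() is a regular expression that is repeated and anchored at each end of the string, so it can strip pipes as well as spaces from the ends:

```r
df <- structure(list(col1 = c(" | | | | | | | |", "| | | | | | | | | | | | | | |",
"| | | | | | | | | | | | | | | ", "stop|", "stop| | ",
"stop | go")), class = "data.frame", row.names = c(NA, -6L))

# Step 1: delete runs of two or more "pipe followed by whitespace"
step1 <- gsub("(\\|\\s+){2,}", "", df$col1)

# Step 2: strip leading / trailing whitespace-followed-by-pipes
step2 <- trimws(step1, whitespace = "\\s+\\|")

# Step 3: strip any pipes still sitting at either end
step3 <- trimws(step2, whitespace = "\\|")

print(step1)  # " |" "|" "" "stop|" "stop" "stop | go"
print(step3)  # ""   ""  "" "stop"  "stop" "stop | go"
```

The middle pipe in stop | go survives every step: step 1 needs at least two pipe-space pairs in a row, and the trimws() calls only touch the ends of the string.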
Answer:
You could do:
gsub('\\|\\s*\\||\\|\\s*$', '', df$col1)
#> [1] " " " "
#> [3] " " "stop"
#> [5] "stop " "stop | go"
Add a simple trimws() if you don't want the spaces this leaves behind, as in akrun's answer:
trimws(gsub('\\|\\s*\\||\\|\\s*$', '', df$col1))
#> [1] "" "" "" "stop" "stop"
#> [6] "stop | go"
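To see what each branch of the alternation removes, regmatches() with gregexpr() lists the matched pieces. The inputs here are a few hypothetical strings chosen to exercise both branches:

```r
# Hypothetical test strings, chosen to exercise both branches
x <- c("stop|", "stop| | ", "stop | go", "| | |")

# Branch 1, \\|\\s*\\| : a pipe, optional whitespace, and the next pipe
# Branch 2, \\|\\s*$   : a final pipe plus any trailing whitespace
pattern <- "\\|\\s*\\||\\|\\s*$"

# regmatches() + gregexpr() return every piece that gsub() would delete
matched <- regmatches(x, gregexpr(pattern, x))
print(matched)
```

Because branch 1 consumes pipes two at a time, the last pipe of an odd-length run such as | | | is removed by branch 2 only when it ends the string.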
Answer:
Another regex strategy is to remove every | that is not followed by a space and a word character:
trimws(gsub("\\|(?!\\s\\w)", "", df$col1, perl = TRUE))
Output:
[1] "" "" "" "stop" "stop" "stop | go"
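With this lookahead, a pipe survives only when it is immediately followed by a single whitespace character and then a word character. A quick check on some hypothetical inputs (not from the question):

```r
# Hypothetical extra inputs, not from the question
x <- c("stop | go", "stop |go", "a | b | c", "stop | | go")

# Drop any pipe NOT followed by whitespace + word character;
# perl = TRUE is needed for the (?!...) lookahead
kept <- trimws(gsub("\\|(?!\\s\\w)", "", x, perl = TRUE))
print(kept)
```

So stop |go loses its pipe, and in stop | | go only the second pipe is kept, leaving a double space ("stop  | go") that trimws() does not touch because it only trims the ends.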
