stringr regex to replace commas which meet conditions

I'm stuck with a file where someone didn't escape commas inside labels.

Here's an example:

library(tidyverse)

t1 <- c("1,0.259524,0.594196,0.305349,$15,000 - $19,999,Unknown",
        "2,0.673729,0.249742,0.729358,Greater than $124,999,College")

The commas are used to separate columns, but they also show up inside the dollars field.

I can match the dollar amounts that contain the problem commas:

t1 %>% 
  str_extract_all(
    "\\$\\d{2,3},\\d{3}"
    )

returns

[[1]]
[1] "$15,000" "$19,999"

[[2]]
[1] "$124,999"

How do I operate on each row, removing only the commas inside that label?

CodePudding user response:

You could use gsub to get rid of the commas:

t1 <- c("1,0.259524,0.594196,0.305349,$15,000 - $19,999,Unknown",
        "2,0.673729,0.249742,0.729358,Greater than $124,999,College")

# Capture "$" plus digits and the three digits after the comma, then rejoin them without it
gsub("(\\$\\d+),(\\d{3})", "\\1\\2", t1)
#> [1] "1,0.259524,0.594196,0.305349,$15000 - $19999,Unknown"     
#> [2] "2,0.673729,0.249742,0.729358,Greater than $124999,College"
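
Since the question asks about stringr, the same replacement can be sketched with str_replace_all(), which works on every element of the vector just like gsub(). This is only a minimal sketch: the name `fixed` is made up for illustration, and it assumes each dollar amount has a single thousands separator, as in the sample data. An amount like $1,250,000 would need a different pattern.

```r
library(stringr)

t1 <- c("1,0.259524,0.594196,0.305349,$15,000 - $19,999,Unknown",
        "2,0.673729,0.249742,0.729358,Greater than $124,999,College")

# Group 1: "$" followed by digits; group 2: the three digits after the comma.
# The replacement puts both groups back and leaves the comma out.
fixed <- str_replace_all(t1, "(\\$\\d+),(\\d{3})", "\\1\\2")
fixed
#> [1] "1,0.259524,0.594196,0.305349,$15000 - $19999,Unknown"
#> [2] "2,0.673729,0.249742,0.729358,Greater than $124999,College"

# Sanity check: 6 fields per row means exactly 5 commas per row.
str_count(fixed, ",")
#> [1] 5 5
```

Commas that do not sit between a dollar amount's digits are left alone, so the column separators survive.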