I have a data frame with a column of values like these:
test1=data.frame(c("ABC 01; 02; 03", "test2 01; 02; 03"))
I would like to insert the leading text after each semicolon, like this:
test1=data.frame(c("ABC 01; ABC 02; ABC 03", "test2 01; test2 02; test2 03"))
Can someone show me how to do this? Thank you!
CodePudding user response:
Using only base R:
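test1$y <- mapply(
  # replace every "; <digits>" with "; <prefix> <digits>"
  \(org, key) gsub("; ([0-9]+)", key, org),
  # the prefix is everything before the first space
  org = test1$x, key = sprintf("; %s \\1", sub(" .+", "", test1$x))
)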
                 x                            y
1   ABC 01; 02; 03       ABC 01; ABC 02; ABC 03
2 test2 01; 02; 03 test2 01; test2 02; test2 03
Data
test1 <- data.frame(x = c("ABC 01; 02; 03", "test2 01; 02; 03"))
CodePudding user response:
Here's a two-step tidyverse solution:
library(tidyverse)
test1 %>%
  mutate(
    # create temporary variable containing text string:
    temp = str_replace(var, "(\\w+).*", " \\1"),
    # add text string each time there is ";" to the left:
    var = str_replace_all(var, "(?<=;)", temp)) %>%
  # remove `temp`:
  select(-temp)
                           var
1       ABC 01; ABC 02; ABC 03
2 test2 01; test2 02; test2 03
How this works:
- using `str_replace` we define the string-initial alphanumeric substring (`\\w+`) as a capture group (by placing it in parentheses) and refer to it, and it alone, in the replacement clause with a backreference (`\\1`), adding one whitespace before the backreference
- next, using `str_replace_all` we add the text string in `temp` to the strings in `var` on the condition that there is a literal `;` immediately to the left (this type of conditional matching is called positive look-behind; its syntax is `(?<= ...)`)
Data:
test1=data.frame(var = c("ABC 01; 02; 03", "test2 01; 02; 03"))
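The two steps described above can also be traced on a single string in base R. This is only a sketch, not part of the answer's tidyverse code: the variable names `x`, `prefix` and `out` are made up for illustration, and `perl = TRUE` is assumed because base R's default regex engine does not support look-behind.

```r
# Sketch: the same two steps in base R, on one string at a time.
x <- "ABC 01; 02; 03"

# Step 1: capture the leading word and put one space in front of it.
prefix <- sub("(\\w+).*", " \\1", x)

# Step 2: insert the prefix at every position that has ";" directly to its left.
# The look-behind matches a position, not a character, so nothing is removed.
out <- gsub("(?<=;)", prefix, x, perl = TRUE)

print(prefix)
print(out)
```

Because the `replacement` argument of `gsub` is a single string, this form handles one input at a time; over a whole column it would need something like `mapply`, as in the base R answer above.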
CodePudding user response:
Another regex option could be to parse it all in capture groups:
fun <- \(x) gsub("(\\w+) (\\d+); (\\d+); (\\d+)", "\\1 \\2; \\1 \\3; \\1 \\4", x)
Then with either dplyr or base:
# dplyr:
library(dplyr)
test1 |>
  mutate(result = fun(string))
# base R:
test1$result <- sapply(test1$string, fun)
Output:
            string                       result
1   ABC 01; 02; 03       ABC 01; ABC 02; ABC 03
2 test2 01; 02; 03 test2 01; test2 02; test2 03
Data:
test1 <- data.frame(string = c("ABC 01; 02; 03", "test2 01; 02; 03"))
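One thing to check before using this on real data: the pattern spells out exactly three number groups, so it presumably only suits inputs shaped like the example. A small sketch of what happens otherwise (the name `fun3` and the extra inputs are made up for illustration):

```r
# Same pattern as above, written with function() so it also runs on R < 4.1.
fun3 <- function(x) {
  gsub("(\\w+) (\\d+); (\\d+); (\\d+)", "\\1 \\2; \\1 \\3; \\1 \\4", x)
}

three <- fun3("ABC 01; 02; 03")      # the shape the pattern expects
two   <- fun3("ABC 01; 02")          # too few groups: no match, returned unchanged
four  <- fun3("ABC 01; 02; 03; 04")  # only the first three groups get the prefix

print(c(three, two, four))
```

If the number of values per row varies, one of the other answers (the look-behind or the split-and-paste approach) is likely a better fit.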
CodePudding user response:
Using strsplit and paste. Split on spaces, then paste the first item onto each of the remaining items:
test1$new <- sapply(strsplit(test1$x, " ", fixed = TRUE),
                    function(i) paste(paste(i[1], i[-1]), collapse = " "))
test1
#                  x                          new
# 1   ABC 01; 02; 03       ABC 01; ABC 02; ABC 03
# 2 test2 01; 02; 03 test2 01; test2 02; test2 03
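To see what each step of the split-and-paste approach does, here is the same logic unrolled on a single string. This is a sketch; the intermediate names `s`, `parts`, `pairs` and `joined` are made up for illustration.

```r
s <- "ABC 01; 02; 03"

# Split on literal spaces; the semicolons stay attached to the numbers.
parts <- strsplit(s, " ", fixed = TRUE)[[1]]

# paste() recycles the first item against every remaining item.
pairs <- paste(parts[1], parts[-1])

# Join the pieces back into one string, separated by single spaces.
joined <- paste(pairs, collapse = " ")

print(parts)
print(pairs)
print(joined)
```

Note that this relies on the prefix containing no spaces of its own.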
CodePudding user response:
Here is an option using stringr functions.
library(dplyr)
library(stringr)
test1 = data.frame(col = c("ABC 01; 02; 03", "test2 01; 02; 03"))
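result <- test1 %>%
  mutate(common = str_extract(col, '\\w+'),
         # str_trim() drops the space left behind once the prefix is removed
         parts = str_split(str_trim(str_remove(col, common)), ';\\s+'),
         # collapse = ";" joins without a space; use "; " to match the question
         new_string = purrr::map2_chr(common, parts,
                                      str_c, sep = " ", collapse = ";"))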
result
#               col common      parts                 new_string
#1   ABC 01; 02; 03    ABC 01, 02, 03       ABC 01;ABC 02;ABC 03
#2 test2 01; 02; 03  test2 01, 02, 03 test2 01;test2 02;test2 03
result$new_string
#[1] "ABC 01;ABC 02;ABC 03" "test2 01;test2 02;test2 03"
You may drop the columns that you don't need from result.
