I am trying to conditionally concatenate string variables using the tidyverse.
Here is the toy data:
library(tidyverse)

df <- tibble(id = paste0("id_", 1:4),
             outcome = rep(x = c("simon",
                                 "garfunkel"),
                           times = 2),
             worth = rep(x = c("awesome",
                               "disposable"),
                         times = 2))
df
# id outcome worth
# <chr> <chr> <chr>
# 1 id_1 simon awesome
# 2 id_2 garfunkel disposable
# 3 id_3 simon awesome
# 4 id_4 garfunkel disposable
I can use unite() from tidyr to combine the id and worth columns like so:
df %>%
  unite("id", c(id, worth))
# id outcome
# <chr> <chr>
# 1 id_1_awesome simon
# 2 id_2_disposable garfunkel
# 3 id_3_awesome simon
# 4 id_4_disposable garfunkel
But there are a few problems with this: some with the output and some with the way I generated it.
First, I would like to retain the original columns, whereas unite() replaces them with the concatenated column. I tried using unite() within mutate(), but this generated an error.
Second, and most important, rather than simply concatenating columns, I would like the new concatenated id column to be a combination of the id column and the worth column, conditional on the outcome column. I tried to do this using case_when() within mutate(), but got confused about where to put the paste0() call and whether unite() can be used inside case_when().
Third, and related to the second point, I need to concatenate only part of the worth column into the id column, ideally using a regex substitution that captures only the first x letters of the worth column.
Basically, I need the new dataset to look like the data frame below, but built using conditional and string-concatenation mechanics:
tibble(id = paste0(paste0("id_", 1:4),
                   rep(c("_awes", "_disp"))),
       outcome = rep(x = c("simon",
                           "garfunkel"),
                     times = 2),
       worth = rep(x = c("awesome",
                         "disposable"),
                   times = 2))
# id outcome worth
# <chr> <chr> <chr>
# 1 id_1_awes simon awesome
# 2 id_2_disp garfunkel disposable
# 3 id_3_awes simon awesome
# 4 id_4_disp garfunkel disposable
Any help much appreciated.
(p.s. apologies if you think Garfunkel was also awesome)
CodePudding user response:
df %>%
  mutate(worth1 = substr(worth, 1, 4)) %>%
  unite(id, id, worth1)
# A tibble: 4 x 3
#   id        outcome   worth
#   <chr>     <chr>     <chr>
# 1 id_1_awes simon     awesome
# 2 id_2_disp garfunkel disposable
# 3 id_3_awes simon     awesome
# 4 id_4_disp garfunkel disposable
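If the original id column should stay in place rather than be overwritten, one option is the remove = FALSE argument of unite() combined with a new column name. The following is only a sketch: the names newid, worth1, and kept are chosen here for illustration and are not part of the question's target output, which overwrites id.

```r
library(tidyverse)

# Toy data from the question
df <- tibble(id = paste0("id_", 1:4),
             outcome = rep(c("simon", "garfunkel"), times = 2),
             worth = rep(c("awesome", "disposable"), times = 2))

kept <- df %>%
  # helper column holding the first four characters of worth
  mutate(worth1 = substr(worth, 1, 4)) %>%
  # remove = FALSE keeps id and worth1 alongside the new column
  unite("newid", id, worth1, remove = FALSE) %>%
  # drop the helper column once it has been used
  select(-worth1)

kept$newid
```

Here id, outcome, and worth are all retained, and newid holds the combined values.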
CodePudding user response:
My example was confusing: as @camille pointed out, it had some redundancy, in that the column I wanted to condition on followed the same pattern as the column I wanted to extract from, which removed the need for conditioning at all. All I can say is mea culpa. However, since people have already provided solutions based on the original, confusing dataset, I will leave the example as-is. Based on their answers, the following is what I was looking for:
df %>%
  mutate(newid = case_when(outcome == "simon" ~ paste(id, substr(worth, 1, 4), sep = "_"),
                           outcome == "garfunkel" ~ paste(id, substr(worth, 1, 4), sep = "_")))
# id outcome worth newid
# <chr> <chr> <chr> <chr>
# 1 id_1 simon awesome id_1_awes
# 2 id_2 garfunkel disposable id_2_disp
# 3 id_3 simon awesome id_3_awes
# 4 id_4 garfunkel disposable id_4_disp
This solution conditions on the outcome variable, extracts the first four characters of the worth variable, and combines them with the id variable. Thanks to the responders for helping me with this.
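Because both case_when() branches above produce the same expression, the condition has no visible effect on this toy data. As a sketch of a case where it would matter, and of the regex substitution mentioned in the question, the version below uses base R's sub() to capture the leading characters and applies the suffix only to "simon" rows. The rule for the other rows, the variable n_chars, and the helper column worth_prefix are assumptions made for illustration, not something the question specifies.

```r
library(tidyverse)

# Toy data from the question
df <- tibble(id = paste0("id_", 1:4),
             outcome = rep(c("simon", "garfunkel"), times = 2),
             worth = rep(c("awesome", "disposable"), times = 2))

n_chars <- 4  # hypothetical: how many leading characters of worth to keep

result <- df %>%
  mutate(
    # regex capture of the first n_chars characters; strings shorter
    # than n_chars do not match and are returned unchanged
    worth_prefix = sub(paste0("^(.{", n_chars, "}).*$"), "\\1", worth),
    newid = case_when(
      outcome == "simon" ~ paste(id, worth_prefix, sep = "_"),
      TRUE ~ id  # hypothetical rule: leave all other ids as they are
    )
  ) %>%
  select(-worth_prefix)

result$newid
# "id_1_awes" "id_2" "id_3_awes" "id_4"
```

The TRUE ~ id fallback is used instead of a .default argument so that the sketch also runs on older dplyr versions.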
