I have a dataframe with ICPM codes before and after recoding of an operation.
df1 <- tibble::tribble(~ops, ~opsalt,
"8-915, 5-847.32", "5-847.32, 5-852.f3, 8-915",
"8-915, 5-781.30, 8-919, 5-807.4, 5-800.c1, 5-79b.81", "5-79b.81, 5-800.c1, 5-805.y, 5-807.4, 8-919, 5-781.30, 8-915",
"5-786.1, 5-808.a4, 5-784.1u, 5-783.2d, 5-788.5e", "5-788.5e, 5-783.2d, 5-780.4d, 5-784.7d, 5-784.1u, 5-808.a4, 5-786.1",
"8-915, 5-784.0v, 5-788.5f, 5-788.40, 5-808.b0, 5-786.k, 5-788.60, 5-788.00, 5-786.0, 5-783.2d", "5-788.00, 5-788.60, 5-786.0, 5-786.k, 5-788.40, 5-808.b0, 5-788.5f, 5-781.ad, 5-784.0v, 8-915")
I want to calculate two columns which contains the differing codes between the two columns.
For the first row the difference between ops and opsalt would be character(0).
The difference between opsalt and ops would be 5-852.f3.
Tried:
df <– df %>% mutate(ops = strsplit(ops,",")) %>%
mutate(opsalt =strsplit(opsalt,","))
df <- df %>% rowwise() %>% mutate(neu_alt = list(setdiff(ops,opsalt))) %>% mutate(alt_neu = list(setdiff(opsalt,ops)))
This didn't work, because I want to compare parts of the respective strings and not the whole string.
CodePudding user response:
It should work if you use ", " in strsplit and df1 in your first mutate call.
library(dplyr)
df1 %>%
mutate(across(.fns = ~ strsplit(.x, ", "))) %>%
rowwise %>%
mutate(neu_alt = list(setdiff(ops, opsalt)),
alt_neu = list(setdiff(opsalt, ops)))
#> # A tibble: 4 x 4
#> # Rowwise:
#> ops opsalt neu_alt alt_neu
#> <list> <list> <list> <list>
#> 1 <chr [2]> <chr [3]> <chr [0]> <chr [1]>
#> 2 <chr [6]> <chr [7]> <chr [0]> <chr [1]>
#> 3 <chr [5]> <chr [7]> <chr [0]> <chr [2]>
#> 4 <chr [10]> <chr [10]> <chr [1]> <chr [1]>
Created on 2022-01-04 by the reprex package (v0.3.0)
CodePudding user response:
If you want to keep them as strings, you can try this method. If you intend to do similar ops repeatedly, then I suggest retaining the list-columns (instead of repeatedly strspliting them).
df1 %>%
mutate(
d = mapply(function(...) toString(setdiff(...)),
strsplit(ops, "[ ,] "), strsplit(opsalt, "[ ,] "))
)
# # A tibble: 4 x 3
# ops opsalt d
# <chr> <chr> <chr>
# 1 8-915, 5-847.32 5-847.32, 5-852.f3, 8-915 ""
# 2 8-915, 5-781.30, 8-919, 5-807.4, 5-800.c1, 5-79b.81 5-79b.81, 5-800.c1, 5-805.y, 5-807.4, 8-919, 5-781.30, 8-915 ""
# 3 5-786.1, 5-808.a4, 5-784.1u, 5-783.2d, 5-788.5e 5-788.5e, 5-783.2d, 5-780.4d, 5-784.7d, 5-784.1u, 5-808.a4, 5-786.1 ""
# 4 8-915, 5-784.0v, 5-788.5f, 5-788.40, 5-808.b0, 5-786.k, 5-788.60, 5-788.00, 5-786.0, 5-783.2d 5-788.00, 5-788.60, 5-786.0, 5-786.k, 5-788.40, 5-808.b0, 5-788.5f, 5-781.ad, 5-784.0v, 8-915 "5-783.2d"
(I recommend using list-columns, though, as demonstrated in TimTeaFan's answer.)
