Find differences in character column in R

Time:01-05

I have a dataframe with ICPM codes before and after recoding of an operation.

    df1 <- tibble::tribble(~ops, ~opsalt,
"8-915, 5-847.32",      "5-847.32, 5-852.f3, 8-915",
"8-915, 5-781.30, 8-919, 5-807.4, 5-800.c1, 5-79b.81", "5-79b.81, 5-800.c1, 5-805.y, 5-807.4, 8-919, 5-781.30, 8-915",
"5-786.1, 5-808.a4, 5-784.1u, 5-783.2d, 5-788.5e", "5-788.5e, 5-783.2d, 5-780.4d, 5-784.7d, 5-784.1u, 5-808.a4, 5-786.1",
"8-915, 5-784.0v, 5-788.5f, 5-788.40, 5-808.b0, 5-786.k, 5-788.60, 5-788.00, 5-786.0, 5-783.2d", "5-788.00, 5-788.60, 5-786.0, 5-786.k, 5-788.40, 5-808.b0, 5-788.5f, 5-781.ad, 5-784.0v, 8-915")

I want to compute two new columns containing the codes that differ between the two columns.
For the first row, the difference between ops and opsalt would be character(0).
The difference between opsalt and ops would be 5-852.f3.

Tried:

df <- df %>% mutate(ops = strsplit(ops, ",")) %>%
        mutate(opsalt = strsplit(opsalt, ","))
df <- df %>% rowwise() %>% mutate(neu_alt = list(setdiff(ops, opsalt))) %>% mutate(alt_neu = list(setdiff(opsalt, ops)))

This didn't work, because I want to compare parts of the respective strings and not the whole string.

CodePudding user response:

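It should work if you split on ", " (comma plus space) in strsplit and use df1 in your first mutate call. Splitting on "," alone leaves a leading space on every code after the first, so setdiff treats " 5-847.32" and "5-847.32" as different strings.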

library(dplyr)

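df1 %>%
  # split every column on ", " so each cell becomes a character vector of codes
  mutate(across(everything(), ~ strsplit(.x, ", "))) %>% 
  rowwise() %>% 
  # setdiff(x, y) keeps the codes in x that do not occur in y
  mutate(neu_alt = list(setdiff(ops, opsalt)),
         alt_neu = list(setdiff(opsalt, ops)))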

#> # A tibble: 4 x 4
#> # Rowwise: 
#>   ops        opsalt     neu_alt   alt_neu  
#>   <list>     <list>     <list>    <list>   
#> 1 <chr [2]>  <chr [3]>  <chr [0]> <chr [1]>
#> 2 <chr [6]>  <chr [7]>  <chr [0]> <chr [1]>
#> 3 <chr [5]>  <chr [7]>  <chr [0]> <chr [2]>
#> 4 <chr [10]> <chr [10]> <chr [1]> <chr [1]>

Created on 2022-01-04 by the reprex package (v0.3.0)
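
To see why the separator matters, here is a minimal base R sketch using only the first row of the question's data. The variable names (bad_ops, good_ops, and so on) are made up for illustration.

```r
# First row of the question's data
ops    <- "8-915, 5-847.32"
opsalt <- "5-847.32, 5-852.f3, 8-915"

# Splitting on "," alone keeps the space that follows each comma
bad_ops    <- strsplit(ops, ",")[[1]]     # c("8-915", " 5-847.32")
bad_opsalt <- strsplit(opsalt, ",")[[1]]  # c("5-847.32", " 5-852.f3", " 8-915")

# None of the padded codes match, so every code in ops is reported as different
bad_diff <- setdiff(bad_ops, bad_opsalt)  # c("8-915", " 5-847.32")

# Splitting on ", " (comma plus space) yields clean codes
good_ops    <- strsplit(ops, ", ", fixed = TRUE)[[1]]
good_opsalt <- strsplit(opsalt, ", ", fixed = TRUE)[[1]]

neu_alt <- setdiff(good_ops, good_opsalt)  # character(0)
alt_neu <- setdiff(good_opsalt, good_ops)  # "5-852.f3"
```

If the spacing after the commas is not consistent in the real data, wrapping the split result in trimws() should give the same clean codes.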

CodePudding user response:

If you want to keep them as strings, you can try this method, which stores the codes in ops that are missing from opsalt in a new column d. If you intend to run similar comparisons repeatedly, I suggest retaining the list-columns instead of calling strsplit on the strings each time.

df1 %>%
  mutate(
    d = mapply(function(...) toString(setdiff(...)),
               strsplit(ops, "[ ,] "), strsplit(opsalt, "[ ,] "))
  )
# # A tibble: 4 x 3
#   ops                                                                                           opsalt                                                                                        d         
#   <chr>                                                                                         <chr>                                                                                         <chr>     
# 1 8-915, 5-847.32                                                                               5-847.32, 5-852.f3, 8-915                                                                     ""        
# 2 8-915, 5-781.30, 8-919, 5-807.4, 5-800.c1, 5-79b.81                                           5-79b.81, 5-800.c1, 5-805.y, 5-807.4, 8-919, 5-781.30, 8-915                                  ""        
# 3 5-786.1, 5-808.a4, 5-784.1u, 5-783.2d, 5-788.5e                                               5-788.5e, 5-783.2d, 5-780.4d, 5-784.7d, 5-784.1u, 5-808.a4, 5-786.1                           ""        
# 4 8-915, 5-784.0v, 5-788.5f, 5-788.40, 5-808.b0, 5-786.k, 5-788.60, 5-788.00, 5-786.0, 5-783.2d 5-788.00, 5-788.60, 5-786.0, 5-786.k, 5-788.40, 5-808.b0, 5-788.5f, 5-781.ad, 5-784.0v, 8-915 "5-783.2d"

(I recommend using list-columns, though, as demonstrated in TimTeaFan's answer.)
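
If both directions are needed as strings, the same idea can be wrapped in a small helper. This is only a sketch in base R: diff_codes and demo are hypothetical names, and demo holds just the first two rows of the question's data.

```r
# Hypothetical helper: for each row, the codes in `a` that are missing from `b`,
# collapsed back into one comma-separated string
diff_codes <- function(a, b, sep = ", ") {
  mapply(
    function(x, y) toString(setdiff(x, y)),
    strsplit(a, sep, fixed = TRUE),
    strsplit(b, sep, fixed = TRUE),
    USE.NAMES = FALSE
  )
}

# First two rows of the question's data
demo <- data.frame(
  ops    = c("8-915, 5-847.32",
             "8-915, 5-781.30, 8-919, 5-807.4, 5-800.c1, 5-79b.81"),
  opsalt = c("5-847.32, 5-852.f3, 8-915",
             "5-79b.81, 5-800.c1, 5-805.y, 5-807.4, 8-919, 5-781.30, 8-915")
)

demo$neu_alt <- diff_codes(demo$ops, demo$opsalt)  # c("", "")
demo$alt_neu <- diff_codes(demo$opsalt, demo$ops)  # c("5-852.f3", "5-805.y")
```

An empty string here plays the role that character(0) plays in the list-column version.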
