I have a list of character vectors like this:
my_list <- list(c('a','b','c','d','e'),c('e','f','g'),c('h','i','j'))
names(my_list) <- c("group1","group2","group3")
I want a simple way to test my_list for letters that are duplicated across any of the 3 groups/vectors in the list. For instance, "e" appears in both group1 and group2, so that counts as a duplicate. Ideally it would just return a logical: TRUE if at least one letter appears in 2 or more groups, and FALSE if the letters in each group are unique to that group (which obviously isn't the case in my example).
Thanks so much!
CodePudding user response:
A binary output can be generated with
any(duplicated(unlist(my_list)))
[1] TRUE
As @sindri_baldur correctly pointed out in the comments, a value repeated within a single group would also be counted here. If that is not desired, apply unique to each group first:
any(duplicated(unlist(lapply(my_list, unique))))
[1] TRUE
Or, as another base R alternative (anyDuplicated returns the index of the first duplicate, or 0 if there is none):
anyDuplicated(unlist(lapply(my_list, unique))) > 0
[1] TRUE
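To make the difference between the two variants concrete, here is a minimal sketch that wraps the check in a function. The name has_cross_group_dups and the within_only list are made up for illustration and are not part of the question or the answer above.

```r
# Hypothetical helper (not from the answers): TRUE if any value
# occurs in two or more different groups of a list of vectors.
has_cross_group_dups <- function(x) {
  # unique() per group, so repeats inside one group are not counted
  anyDuplicated(unlist(lapply(x, unique), use.names = FALSE)) > 0
}

my_list <- list(group1 = c('a','b','c','d','e'),
                group2 = c('e','f','g'),
                group3 = c('h','i','j'))
res_question <- has_cross_group_dups(my_list)      # TRUE: "e" is in group1 and group2

# Illustrative list whose only repeat sits inside a single group
within_only <- list(group1 = c('a','a','b'), group2 = c('c','d'))
res_naive  <- any(duplicated(unlist(within_only))) # TRUE: counts the repeated "a"
res_helper <- has_cross_group_dups(within_only)    # FALSE: ignores it
```

Which behaviour is wanted depends on whether a repeat inside one group should count as a duplicate.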
CodePudding user response:
To list the duplicated values themselves, you could do:
subset(stack(my_list), duplicated(values))$values
[1] "e"
If you need to tell, for each group, whether it shares any value with another group (TRUE means it does), you could do:
result <- setNames(logical(length(my_list)), names(my_list))
result[unique(unlist(Filter(\(x) length(x) > 1,
                            unstack(rev(stack(my_list))))))] <- TRUE
result
group1 group2 group3
TRUE TRUE FALSE
or, with dplyr:
library(dplyr)
stack(my_list) %>%
  mutate(dups = duplicated(values) | duplicated(values, fromLast = TRUE)) %>%
  group_by(ind) %>%
  summarise(logic = any(dups))
# A tibble: 3 x 2
  ind    logic
  <fct>  <lgl>
1 group1 TRUE
2 group2 TRUE
3 group3 FALSE
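For comparison, here is a base R sketch of the same per-group flag without stack/unstack or dplyr. The names vals, shared and per_group are illustrative, and it assumes a repeat inside a single group should not count as a duplicate.

```r
my_list <- list(group1 = c('a','b','c','d','e'),
                group2 = c('e','f','g'),
                group3 = c('h','i','j'))

# Drop repeats inside each group, so only cross-group repeats remain
vals <- lapply(my_list, unique)
all_vals <- unlist(vals, use.names = FALSE)

# Values that occur in more than one group
shared <- unique(all_vals[duplicated(all_vals)])

# One logical per group: does it contain any shared value?
per_group <- vapply(vals, function(v) any(v %in% shared), logical(1))
per_group
```

With the example data, shared is "e", and per_group is TRUE for group1 and group2 and FALSE for group3.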
CodePudding user response:
We can stack the named list into a two-column data.frame, get the frequency counts with table, use colSums on the resulting logical matrix to count how many groups each value appears in, and return the names of the values that occur in more than one group:
names(which(colSums(table(stack(my_list)[2:1])> 0) > 1))
[1] "e"
Or slightly more compact (this counts total occurrences, so it assumes no value is repeated within a single group):
names(which(table(unlist(my_list)) > 1))
[1] "e"
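The two one-liners above can be unpacked step by step. This is only a sketch: the intermediate names (long, tab, n_groups) and the within_only list are made up for illustration.

```r
my_list <- list(group1 = c('a','b','c','d','e'),
                group2 = c('e','f','g'),
                group3 = c('h','i','j'))

long <- stack(my_list)        # columns: values, ind (the group name)
tab  <- table(long[2:1])      # rows = groups, columns = values
n_groups <- colSums(tab > 0)  # number of distinct groups containing each value
dups <- names(which(n_groups > 1))
dups

# The two variants differ when a value repeats inside one group
within_only <- list(group1 = c('a','a','b'), group2 = c('c','d'))
by_count  <- names(which(table(unlist(within_only)) > 1))                   # flags "a"
by_groups <- names(which(colSums(table(stack(within_only)[2:1]) > 0) > 1))  # flags nothing
```

The colSums version counts groups rather than occurrences, which is why it ignores the repeated "a".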
If we want a logical flag per group:
library(dplyr)
library(tidyr)
library(tibble)
enframe(my_list) %>%
  unnest(value) %>%
  group_by(value) %>%
  mutate(flag = any(n_distinct(name) > 1)) %>%
  group_by(name) %>%
  summarise(flag = any(flag))
-output
# A tibble: 3 × 2
  name   flag
  <chr>  <lgl>
1 group1 TRUE
2 group2 TRUE
3 group3 FALSE
CodePudding user response:
Another possible solution, based on tidyr::expand_grid and purrr::pmap_lgl:
library(tidyverse)
my_list <- list(c('a','b','c','d','e'),c('e','f','g'),c('h','i','j'))
names(my_list) <- c("group1","group2","group3")
expandg <- expand_grid(id1 = names(my_list), id2 = names(my_list))
pmap_lgl(expandg, ~ any(my_list[[.x]] %in% my_list[[.y]])) %>%
bind_cols(id1 = expandg[[1]], id2 = expandg[[2]], value = .) %>%
group_by(Group = id1) %>% summarise(value = any(value[id1 != id2]))
#> # A tibble: 3 × 2
#>   Group  value
#>   <chr>  <lgl>
#> 1 group1 TRUE
#> 2 group2 TRUE
#> 3 group3 FALSE
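The same pairwise idea can be sketched in base R with combn and intersect, which also shows which values each pair of groups has in common. The names pairs and overlaps and the "group1-group2" labels are illustrative choices, not part of the answer above.

```r
my_list <- list(group1 = c('a','b','c','d','e'),
                group2 = c('e','f','g'),
                group3 = c('h','i','j'))

# All unordered pairs of group names
pairs <- combn(names(my_list), 2, simplify = FALSE)

# Values common to each pair of groups
overlaps <- lapply(pairs, function(p) intersect(my_list[[p[1]]], my_list[[p[2]]]))
names(overlaps) <- vapply(pairs, paste, character(1), collapse = "-")

# Keep only the pairs that actually share something
overlaps <- Filter(length, overlaps)
overlaps

# Single logical answer to the original question
any_shared <- length(overlaps) > 0
```

With the example data, the only remaining entry is "group1-group2", containing "e", so any_shared is TRUE.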
