R: How to keep and filter out duplicates within rows

Time:01-07

Is it possible to keep only the rows whose value in a column is duplicated, and filter out the rows whose value appears just once?

Here is dummy data:

a <- data.frame(c('a1', 'a1', 'a1', 'a2', 'a3', 'a3'),
                  c(1, 2, 3, 1, 2, 3),
                  stringsAsFactors = FALSE)
a

colnames(a) <- c('id', 'number')
a
#   id number
# 1 a1      1
# 2 a1      2
# 3 a1      3
# 4 a2      1
# 5 a3      2
# 6 a3      3

# Expected result

#   id number
# 1 a1      1
# 2 a1      2
# 3 a1      3
# 5 a3      2
# 6 a3      3

As you can see, rows whose "id" value is not duplicated (here, a2) are removed.

Can the filtering threshold also be adjusted? For example: keep only the rows whose "id" value occurs 3 or more times.

Is this achievable? A dplyr approach would be helpful.

Thank you.

CodePudding user response:

subset(a, duplicated(id)|duplicated(id, fromLast = TRUE))

  id number
1 a1      1
2 a1      2
3 a1      3
5 a3      2
6 a3      3
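
The two duplicated() calls are combined because duplicated() on its own flags only the second and later occurrences of a value, while fromLast = TRUE flags every occurrence except the last one. A small sketch on the id values from the question (the names ids and keep are only for illustration):

```r
ids <- c("a1", "a1", "a1", "a2", "a3", "a3")

# Flags every occurrence after the first one
duplicated(ids)
# FALSE TRUE TRUE FALSE FALSE TRUE

# Flags every occurrence before the last one
duplicated(ids, fromLast = TRUE)
# TRUE TRUE FALSE FALSE TRUE FALSE

# Combined with |: TRUE for every value that appears more than once
keep <- duplicated(ids) | duplicated(ids, fromLast = TRUE)
keep
# TRUE TRUE TRUE FALSE TRUE TRUE
```

Only the a2 entry is FALSE in both vectors, which is why that row is the one dropped by subset().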

If you are using dplyr's filter():

library(dplyr)
filter(a, duplicated(id)|duplicated(id, fromLast = TRUE))

Or, counting the rows in each id group:

a %>%
  group_by(id) %>%
  filter(n() > 1)
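
For the second part of the question (keeping only ids that occur 3 or more times), the grouped version seems the easiest to adapt, since n() gives the group size directly. A minimal sketch, assuming the dummy data from the question; min_count is a hypothetical variable name introduced here, and the base R line using ave() is offered as one possible equivalent:

```r
library(dplyr)

a <- data.frame(id = c("a1", "a1", "a1", "a2", "a3", "a3"),
                number = c(1, 2, 3, 1, 2, 3),
                stringsAsFactors = FALSE)

# Hypothetical threshold: minimum number of rows an id needs to be kept
min_count <- 3

res <- a %>%
  group_by(id) %>%
  filter(n() >= min_count) %>%  # n() is the number of rows in the current group
  ungroup()

# Base R alternative: ave() returns each row's group size
res_base <- a[ave(seq_along(a$id), a$id, FUN = length) >= min_count, ]
```

With the dummy data only the three a1 rows remain, because a3 occurs twice and a2 once. Setting min_count to 2 should give the same rows as the n() > 1 filter above.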