How to compare and count the number of unique values across multiple columns

I am currently working with something similar to df. What I would like to do is produce something that looks like df2: the values in the specified columns are compared side by side, the number of occurrences of each combination is counted, and that count is placed in a new column of a new data frame.

For example, in df the combination 1, 5, 9 occurs 3 times.

df <- data.frame( col1 = c(1,2,3,4,1,2,3,4,1),
                  col2 = c(5,6,7,8,5,6,7,8,5),
                  col3 = c(9,10,11,12,9,10,11,13,9))

df2 <- data.frame( col1 = c(1,2,3,4,4),
                   col2 = c(5,6,7,8,8),
                   col3 = c(9,10,11,12,13),
                   count = c(3,2,2,1,1))
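The target count for a single combination can be checked directly in base R. This is only a sanity check of the example above (1, 5, 9 occurring 3 times), not a solution for all combinations; the name is_combo is illustrative.

```r
df <- data.frame(col1 = c(1, 2, 3, 4, 1, 2, 3, 4, 1),
                 col2 = c(5, 6, 7, 8, 5, 6, 7, 8, 5),
                 col3 = c(9, 10, 11, 12, 9, 10, 11, 13, 9))

# TRUE for rows where all three columns match the combination 1, 5, 9
is_combo <- df$col1 == 1 & df$col2 == 5 & df$col3 == 9

# Summing a logical vector counts the TRUE values
sum(is_combo)
#> [1] 3
```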

I tried using dplyr

df2 <- df %>%
  distinct(col1,col2, col3) %>%
  group_by(col3) %>%
  summarize("count" = n())

but had no success.

Answer:

library(dplyr)

df %>% 
  count(col1, col2, col3)  # one row per distinct combination; its frequency goes in column n

  col1 col2 col3 n
1    1    5    9 3
2    2    6   10 2
3    3    7   11 2
4    4    8   12 1
5    4    8   13 1
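count() stores the frequency in a column named n, while df2 in the question calls it count. A minimal sketch, assuming a recent dplyr version, uses the name argument of count() to set that column name directly:

```r
library(dplyr)

df <- data.frame(col1 = c(1, 2, 3, 4, 1, 2, 3, 4, 1),
                 col2 = c(5, 6, 7, 8, 5, 6, 7, 8, 5),
                 col3 = c(9, 10, 11, 12, 9, 10, 11, 13, 9))

# name = "count" replaces the default column name n
df2 <- df %>% count(col1, col2, col3, name = "count")
df2
```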

Answer:

If using plyr is fine for you, ddply() can do this:

library(plyr)
ddply(df, .(col1, col2, col3), nrow)  # split df by each col1/col2/col3 combination and count the rows in each piece

Output:

  col1 col2 col3 V1
1    1    5    9  3
2    2    6   10  2
3    3    7   11  2
4    4    8   12  1
5    4    8   13  1
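With nrow as the function, ddply() names the result column V1. A small sketch to match df2 renames it afterwards with base R. Attaching plyr after dplyr masks several dplyr functions (plyr warns about this), so this version calls plyr::ddply() without library(plyr) and passes the grouping columns as a character vector, which ddply() also accepts; res is an illustrative name.

```r
df <- data.frame(col1 = c(1, 2, 3, 4, 1, 2, 3, 4, 1),
                 col2 = c(5, 6, 7, 8, 5, 6, 7, 8, 5),
                 col3 = c(9, 10, 11, 12, 9, 10, 11, 13, 9))

# Call ddply() with the plyr:: prefix so plyr is not attached
res <- plyr::ddply(df, c("col1", "col2", "col3"), nrow)

# Rename the default V1 column to count
names(res)[names(res) == "V1"] <- "count"
res
```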

Answer:

The best way to do this with dplyr is to use count(), as suggested in Vinícius Félix's answer.

However, here is a fix that keeps the syntax you started with. You were thinking in the right direction.
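To see where the original attempt goes wrong, the intermediate result can be inspected. distinct() keeps only one row per combination, so the repeats you want to count are gone before counting starts, and group_by(col3) then groups on a single column in which every value is now unique. A short sketch, assuming dplyr is installed (df_distinct and per_col3 are illustrative names):

```r
library(dplyr)

df <- data.frame(col1 = c(1, 2, 3, 4, 1, 2, 3, 4, 1),
                 col2 = c(5, 6, 7, 8, 5, 6, 7, 8, 5),
                 col3 = c(9, 10, 11, 12, 9, 10, 11, 13, 9))

# distinct() collapses the 9 rows to one row per combination
df_distinct <- df %>% distinct(col1, col2, col3)
nrow(df_distinct)
#> [1] 5

# Each col3 value now appears exactly once, so n() is 1 for every group
per_col3 <- df_distinct %>%
  group_by(col3) %>%
  summarize(count = n())
per_col3
```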

Library

library(dplyr)

Fixed version of your code

df %>%
#  distinct(col1, col2, col3) %>%  # not needed: it removes the duplicate rows you want to count
  group_by(col1, col2, col3) %>%   # group by every column whose combination you want to count
  summarize(count = n()) %>%       # quotes around the new column name are not needed, but not wrong
  ungroup()                        # drop the leftover grouping (by col1, col2) so later steps are not applied per group

Output


#> # A tibble: 5 × 4
#>    col1  col2  col3 count
#>   <dbl> <dbl> <dbl> <int>
#> 1     1     5     9     3
#> 2     2     6    10     2
#> 3     3     7    11     2
#> 4     4     8    12     1
#> 5     4     8    13     1

Created on 2022-12-03 with reprex v2.0.2
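In dplyr 1.0.0 and later, summarise() also accepts a .groups argument, so the grouping can be dropped inside the same call instead of with a separate ungroup(). This also avoids the message summarise() prints about grouped output. A sketch under that version assumption (df_counts is an illustrative name):

```r
library(dplyr)

df <- data.frame(col1 = c(1, 2, 3, 4, 1, 2, 3, 4, 1),
                 col2 = c(5, 6, 7, 8, 5, 6, 7, 8, 5),
                 col3 = c(9, 10, 11, 12, 9, 10, 11, 13, 9))

df_counts <- df %>%
  group_by(col1, col2, col3) %>%
  summarise(count = n(), .groups = "drop")  # result is an ungrouped tibble
df_counts
```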