I have a data frame similar to df, and I would like to produce something that looks like df2: the values in the specified columns are compared row by row, the number of occurrences of each combination is counted, and the count is placed into a new column in a new data frame.
For example, in df the combination 1, 5, 9 occurs 3 times.
df <- data.frame( col1 = c(1,2,3,4,1,2,3,4,1),
col2 = c(5,6,7,8,5,6,7,8,5),
col3 = c(9,10,11,12,9,10,11,13,9))
df2 <- data.frame( col1 = c(1,2,3,4,4),
col2 = c(5,6,7,8,8),
col3 = c(9,10,11,12,13),
count = c(3,2,2,1,1))
I tried using dplyr
df2 <- df %>%
distinct(col1,col2, col3) %>%
group_by(col3) %>%
summarize("count" = n())
with no success
CodePudding user response:
library(dplyr)
df %>%
count(col1,col2,col3)
col1 col2 col3 n
1 1 5 9 3
2 2 6 10 2
3 3 7 11 2
4 4 8 12 1
5 4 8 13 1
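If the result column should be called count, as in the df2 from the question, count() accepts a name argument. The following is a minimal sketch, assuming a dplyr version recent enough to support name; it rebuilds df from the question so that it runs on its own.

```r
library(dplyr)

# Example data copied from the question
df <- data.frame(col1 = c(1, 2, 3, 4, 1, 2, 3, 4, 1),
                 col2 = c(5, 6, 7, 8, 5, 6, 7, 8, 5),
                 col3 = c(9, 10, 11, 12, 9, 10, 11, 13, 9))

# Count each distinct (col1, col2, col3) combination and store
# the result in a column called "count" instead of the default "n"
df2 <- df %>%
  count(col1, col2, col3, name = "count")

df2
```

Apart from the column name, the result should match the output shown above.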
CodePudding user response:
Is using plyr fine? If so, ddply() can count the rows in each group:
library(plyr)
ddply(df,.(col1,col2,col3),nrow)
Output:
col1 col2 col3 V1
1 1 5 9 3
2 2 6 10 2
3 3 7 11 2
4 4 8 12 1
5 4 8 13 1
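If installing an extra package is not an option, roughly the same result can be obtained in base R. This is only a sketch of one possible approach, not something the answer above proposed: it adds a helper column of ones (named count here purely for illustration) and sums it per combination with aggregate().

```r
# Example data copied from the question
df <- data.frame(col1 = c(1, 2, 3, 4, 1, 2, 3, 4, 1),
                 col2 = c(5, 6, 7, 8, 5, 6, 7, 8, 5),
                 col3 = c(9, 10, 11, 12, 9, 10, 11, 13, 9))

# Add a column of ones, then sum it within each
# (col1, col2, col3) combination to get the row count
counts <- aggregate(count ~ col1 + col2 + col3,
                    data = cbind(df, count = 1),
                    FUN = sum)

counts
```

The count column is numeric rather than integer here, which usually does not matter for downstream use.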
CodePudding user response:
The best way to do this with dplyr is count(), as suggested in Vinícius Félix's answer.
However, here is a fix using the syntax you started. You were thinking in the right direction.
Library
library(dplyr)
Solution to your code
df %>%
# distinct(col1, col2, col3) # remove this line: it drops the duplicate rows you want to count
group_by(col1, col2, col3) %>% # group by every column whose combination you want to count
summarize(count = n()) %>% # quotes around count are not needed, but are not wrong
ungroup() # drop the grouping at the end so later steps are not grouped by accident
Output
#> # A tibble: 5 × 4
#> col1 col2 col3 count
#> <dbl> <dbl> <dbl> <int>
#> 1 1 5 9 3
#> 2 2 6 10 2
#> 3 3 7 11 2
#> 4 4 8 12 1
#> 5 4 8 13 1
Created on 2022-12-03 with reprex v2.0.2
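As a variation on the group_by() approach, the grouping can be dropped inside summarize() itself instead of with a separate ungroup() call. This is a minimal sketch, assuming dplyr 1.0.0 or later, where summarize() has a .groups argument; df is rebuilt from the question so the snippet is self-contained.

```r
library(dplyr)

# Example data copied from the question
df <- data.frame(col1 = c(1, 2, 3, 4, 1, 2, 3, 4, 1),
                 col2 = c(5, 6, 7, 8, 5, 6, 7, 8, 5),
                 col3 = c(9, 10, 11, 12, 9, 10, 11, 13, 9))

# Count rows per combination; .groups = "drop" returns an
# ungrouped result, so no trailing ungroup() is needed
df2 <- df %>%
  group_by(col1, col2, col3) %>%
  summarize(count = n(), .groups = "drop")

df2
```

Setting .groups explicitly should also silence the message summarize() prints about how the output is grouped.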
