R function to count duplicates of two variables in data frame

I have a data frame like this:

Gender	Income	Years_Edu
m	3428	5
f	8976	6
m	2000	2
m	3428	5
f	8976	6
.	....	..

I would like to create a new table containing only the unique tuples of all three variables, with an additional column giving the number of times each tuple occurs.

Gender	Income	Years_Edu	Count
m	3428	5	2
f	8976	6	2
m	2000	2	1
.	....	..	..

Does somebody have a tip to achieve this?

Thanks for your help, and please let me know if you need more info.

CodePudding user response：

Well, here is a not very fancy (but functional) solution:

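library(dplyr)

Gender <- c("m","f","m","m","f")
Income <- c(3428,8976,2000,3428,8976)
Years_Edu <- c(5,6,2,5,6)
# data.frame() keeps Income and Years_Edu numeric; cbind() would coerce them to character
df <- data.frame(Gender, Income, Years_Edu)

# Build one key per row; the separator keeps distinct tuples from pasting to the same string
df$combo <- paste(df$Gender, df$Income, df$Years_Edu, sep = "_")

# Count the rows that share each key
df %>% group_by(combo) %>% summarise(n = n())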
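
One caveat with a pasted key like `combo`: if the values are glued together without a separator, two different tuples can end up as the same string and be counted as duplicates. Here is a small sketch of that, using made-up rows (they are not from the question's data) picked only to trigger the collision:

```r
# Hypothetical rows chosen to show the collision; not from the question's data.
df <- data.frame(
  Gender    = c("m", "m"),
  Income    = c(12, 1),
  Years_Edu = c(3, 23)
)

# Without a separator both rows are pasted to the same string, "m123".
key_plain <- paste0(df$Gender, df$Income, df$Years_Edu)

# With a separator the keys stay distinct: "m_12_3" and "m_1_23".
key_sep <- paste(df$Gender, df$Income, df$Years_Edu, sep = "_")

length(unique(key_plain))  # 1
length(unique(key_sep))    # 2
```

Grouping on the original columns instead of a pasted key avoids the problem altogether, and it also keeps Gender, Income and Years_Edu as separate columns in the result.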

CodePudding user response：

You can use group_by() and summarize() from the dplyr package:

library(dplyr)
Gender <- c("m","f","m","m","f")
Income <- c(3428,8976,2000,3428,8976)
Years_Edu <- c(5,6,2,5,6)
# data.frame() keeps Income and Years_Edu numeric; cbind() would coerce every column to character
your_data <- data.frame(Gender, Income, Years_Edu)
your_data %>%
  group_by(Gender, Income, Years_Edu) %>%
  summarize(Count = n())

# A tibble: 3 x 4
# Groups:   Gender, Income [3]
  Gender Income Years_Edu Count
  <chr>   <dbl>     <dbl> <int>
1 f        8976         6     2
2 m        2000         2     1
3 m        3428         5     2
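
If you only need the counts, dplyr's count() should do the same thing as group_by() followed by summarize(Count = n()) in one call, and it returns an ungrouped result. A minimal sketch, rebuilding the example data so it runs on its own (the name `counts` is just for illustration):

```r
library(dplyr)

# Same example values as in the question.
your_data <- data.frame(
  Gender    = c("m", "f", "m", "m", "f"),
  Income    = c(3428, 8976, 2000, 3428, 8976),
  Years_Edu = c(5, 6, 2, 5, 6)
)

# count() groups by the listed columns, tallies the rows per group,
# and ungroups again; `name` sets the name of the count column.
counts <- your_data %>%
  count(Gender, Income, Years_Edu, name = "Count")

counts
```

Each unique (Gender, Income, Years_Edu) tuple appears once in `counts`, with the number of matching rows in the Count column.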