R function to count duplicates of two variables in data frame

Time:01-19

I have a data frame like this:

Gender Income Years_Edu
m 3428 5
f 8976 6
m 2000 2
m 3428 5
f 8976 6
. .... ..

I would like to create a new table containing only the unique tuples of all three variables, with an additional column giving the number of times each tuple occurs.

Gender Income Years_Edu Count
m 3428 5 2
f 8976 6 2
m 2000 2 1
. .... .. ..

Does somebody have a tip to achieve this?

Thanks for your help, and please let me know if you need more info.

CodePudding user response:

Well, here is a not very fancy (but functional) solution: paste the three columns into a single key and count the rows per key.

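library(dplyr)  # provides %>%, group_by(), summarise() and n()

Gender <- c("m","f","m","m","f")
Income <- c(3428,8976,2000,3428,8976)
Years_Edu <- c(5,6,2,5,6)
df <- data.frame(Gender, Income, Years_Edu)

# One key per row; the separator keeps e.g. (12, 3) and (1, 23) from collapsing to "123"
df$combo <- paste(df$Gender, df$Income, df$Years_Edu, sep = "_")

df %>% group_by(combo) %>% summarise(n = n())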
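
A pasted key only identifies a row reliably when the fields are delimited. The sketch below uses made-up values (not taken from the question) to show how two different rows can end up with the same key when the columns are pasted together without a separator:

```r
# Hypothetical values chosen to collide; they are not from the question's data
income <- c(12, 1)
years  <- c(3, 23)

key_plain <- paste0(income, years)            # no separator between fields
key_sep   <- paste(income, years, sep = "_")  # fields delimited by "_"

length(unique(key_plain))  # number of distinct keys without a separator
length(unique(key_sep))    # number of distinct keys with a separator
```

Without the separator both rows share one key and would be counted as duplicates of each other; with it they stay distinct.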

CodePudding user response:

You can use group_by() and summarize() from the dplyr package:

library(dplyr)
Gender <- c("m","f","m","m","f")
Income <- c(3428,8976,2000,3428,8976)
Years_Edu <- c(5,6,2,5,6)
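# Note: cbind() builds a character matrix, so all three columns become <chr> (see the output below)
your_data <- as.data.frame(cbind(Gender,Income,Years_Edu))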
your_data %>%
  group_by(Gender, Income, Years_Edu) %>%
  summarize(Count = n())

# A tibble: 3 x 4
# Groups:   Gender, Income [3]
  Gender Income Years_Edu Count
  <chr>  <chr>  <chr>     <int>
1 f      8976   6             2
2 m      2000   2             1
3 m      3428   5             2
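
As a possible shorthand for the group_by() plus summarize() pattern, dplyr also offers count(), which groups by the listed columns and tallies the rows in one step. A minimal sketch, assuming the same sample data as in the question; here the data frame is built with data.frame() instead of cbind(), so Income and Years_Edu stay numeric, and the variable name counts is just illustrative:

```r
library(dplyr)

# Same sample rows as in the question; data.frame() keeps the numeric columns numeric
your_data <- data.frame(
  Gender    = c("m", "f", "m", "m", "f"),
  Income    = c(3428, 8976, 2000, 3428, 8976),
  Years_Edu = c(5, 6, 2, 5, 6)
)

# count() groups by the given columns and counts rows per group;
# name = "Count" sets the name of the resulting count column
counts <- your_data %>% count(Gender, Income, Years_Edu, name = "Count")
counts
```

The result should have one row per unique (Gender, Income, Years_Edu) combination, and unlike the grouped tibble above it is returned ungrouped.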