I have a data frame like this:
| Gender | Income | Years_Edu |
|---|---|---|
| m | 3428 | 5 |
| f | 8976 | 6 |
| m | 2000 | 2 |
| m | 3428 | 5 |
| f | 8976 | 6 |
| . | .... | .. |
I would like to create a new table that keeps only the unique tuples across all three variables, plus an additional column counting how often each tuple occurs.
| Gender | Income | Years_Edu | Count |
|---|---|---|---|
| m | 3428 | 5 | 2 |
| f | 8976 | 6 | 2 |
| m | 2000 | 2 | 1 |
| . | .... | .. | .. |
Does anybody have a tip on how to achieve this?
Thanks for your help, and please let me know if you need more info.
Answer:
Here is a not very fancy (but functional) solution: paste the three columns into a single key and count how often each key occurs.
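```r
library(dplyr)
Gender <- c("m", "f", "m", "m", "f")
Income <- c(3428, 8976, 2000, 3428, 8976)
Years_Edu <- c(5, 6, 2, 5, 6)
# data.frame() keeps Income and Years_Edu numeric (as.data.frame(cbind(...)) would turn them into character)
df <- data.frame(Gender, Income, Years_Edu)
# One key per row; the "_" separator prevents collisions such as ("m", 342, 85) vs ("m", 3428, 5)
df$combo <- paste(df$Gender, df$Income, df$Years_Edu, sep = "_")
df %>% group_by(combo) %>% summarise(n = n())
```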
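If you would rather skip both the helper key and dplyr, base R's `aggregate()` can return the three original columns plus the count directly. This is a minimal sketch using the sample data from the question; the `Count` column of 1s is simply summed within each unique `(Gender, Income, Years_Edu)` combination.

```r
# Sample data from the question; data.frame() keeps the numeric columns numeric
df <- data.frame(
  Gender    = c("m", "f", "m", "m", "f"),
  Income    = c(3428, 8976, 2000, 3428, 8976),
  Years_Edu = c(5, 6, 2, 5, 6)
)

# Add a column of 1s and sum it within each unique (Gender, Income, Years_Edu) tuple
counts <- aggregate(Count ~ Gender + Income + Years_Edu,
                    data = transform(df, Count = 1L),
                    FUN = sum)
counts
```

The row order of the result follows `aggregate()`'s own grouping order, so sort it afterwards (e.g. with `order()`) if you need a specific order.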
Answer:
You can use group_by() and summarize() from the dplyr package:
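```r
library(dplyr)
Gender <- c("m", "f", "m", "m", "f")
Income <- c(3428, 8976, 2000, 3428, 8976)
Years_Edu <- c(5, 6, 2, 5, 6)
# cbind() builds a character matrix, so every column ends up as <chr> in the result
your_data <- as.data.frame(cbind(Gender, Income, Years_Edu))
# One group per unique (Gender, Income, Years_Edu) tuple; n() counts the rows in each group
your_data %>%
  group_by(Gender, Income, Years_Edu) %>%
  summarize(Count = n())
```
```
# A tibble: 3 x 4
# Groups:   Gender, Income [3]
  Gender Income Years_Edu Count
  <chr>  <chr>  <chr>     <int>
1 f      8976   6             2
2 m      2000   2             1
3 m      3428   5             2
```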
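As a further alternative (not part of the dplyr answer), base R's `table()` can cross-tabulate the three columns, after which the combinations that never occur are dropped. This is a sketch under the assumption that the data is small: `table()` enumerates every possible combination of levels, so with many columns or many distinct values the intermediate table can get large.

```r
# Sample data from the question
df <- data.frame(
  Gender    = c("m", "f", "m", "m", "f"),
  Income    = c(3428, 8976, 2000, 3428, 8976),
  Years_Edu = c(5, 6, 2, 5, 6)
)

# Cross-tabulate every combination of the three columns (including ones with zero rows)
tab <- as.data.frame(table(df), responseName = "Count")
# Keep only the combinations that actually appear in the data
counts <- tab[tab$Count > 0, ]
rownames(counts) <- NULL
counts
```

Note that the grouping columns come back as factors here, so convert them with `as.numeric(as.character(...))` if you need numbers again.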
