Let's say the data looks like this:
A <- c("name1", "name2", "name3", "name1", "name1", "name4")
B <- c(10, 8, 7, 3, -1, -2)
C <- c(8, 3, -1, -10, -2, -2)
df <- data.frame(A, B, C)
df
A B C
1 name1 10 8
2 name2 8 3
3 name3 7 -1
4 name1 3 -10
5 name1 -1 -2
6 name4 -2 -2
Is there a smart way to collect only the rows whose value in the first column (A) occurs exactly three times into a new data frame? In this example that would be all rows with "name1", because that value appears three times. How can I do this on a very large dataset, and more generally, how can I detect and keep rows whose value occurs any given number of times?
CodePudding user response:
dplyr
library(dplyr)
df %>%
  group_by(A) %>%
  filter(n() == 3) %>%  # keep groups with exactly 3 rows
  ungroup()
base R
df[df$A %in% names(which(table(df$A) == 3)), ]
output
A B C
1 name1 10 8
2 name1 3 -10
3 name1 -1 -2
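For the "any other arbitrary number" part of the question, the base R approach can be wrapped in a small function. This is only a sketch: `keep_n_times` and its arguments `data`, `col`, and `n` are hypothetical names chosen for illustration, not part of any package.

```r
# Hypothetical helper: keep the rows whose value in column `col`
# occurs exactly `n` times in `data`.
keep_n_times <- function(data, col, n) {
  counts <- table(data[[col]])           # occurrences of each distinct value
  wanted <- names(counts)[counts == n]   # values seen exactly n times
  data[data[[col]] %in% wanted, , drop = FALSE]
}

# Same example data as in the question
df <- data.frame(
  A = c("name1", "name2", "name3", "name1", "name1", "name4"),
  B = c(10, 8, 7, 3, -1, -2),
  C = c(8, 3, -1, -10, -2, -2)
)

triples <- keep_n_times(df, "A", 3)  # rows where A occurs three times
singles <- keep_n_times(df, "A", 1)  # rows where A occurs only once
```

Note that base R subsetting keeps the original row names, so `triples` holds rows 1, 4, and 5 of `df` rather than being renumbered. Changing `counts == n` to `counts >= n` would instead keep values that occur at least `n` times.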
