I have a data set (n=500) in R that looks like this:
ID A C S
1 4 4 4
2 3 2 3
3 5 4 2
I'd like to create a new variable (I am calling this variable "same") that tells me which of my columns share the same value within each row (excluding my ID column). So, the result would look like this:
ID A C S Same
1 4 4 4 all
2 3 2 3 as
3 5 4 2 none
4 7 7 2 ac
Any help would be much appreciated! I am pretty lost! Thank you!
CodePudding user response:
We can loop over the rows with apply (MARGIN = 1) on the selected columns (df1[-1] drops the 'ID' column) and check the number of unique elements in each row. If it is 1, return 'all'; otherwise paste together the lowercased names of the elements that are duplicated in either direction. If a row has no duplicates, paste returns a blank "", which we then change to 'none'.
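df1$Same <- apply(df1[-1], 1, \(x) {
  # \(x) is shorthand for function(x) and needs R >= 4.1
  # a single unique value in the row means every column matches
  x1 <- if (length(unique(x)) == 1) 'all' else
    # flag ties from both directions so the first occurrence is included too
    paste(tolower(names(x))[duplicated(x) | duplicated(x, fromLast = TRUE)],
          collapse = "")
  # paste() gives "" when nothing is duplicated
  x1[x1 == ""] <- "none"
  x1
})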
-output
> df1
  ID A C S Same
1  1 4 4 4  all
2  2 3 2 3   as
3  3 5 4 2 none
4  4 7 7 2   ac
data
df1 <- structure(list(ID = 1:4, A = c(4L, 3L, 5L, 7L), C = c(4L, 2L,
4L, 7L), S = c(4L, 3L, 2L, 2L)), class = "data.frame", row.names = c(NA,
-4L))
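
To see why the answer combines two duplicated() calls, it may help to walk through a single row by hand. The following is only an illustrative sketch: the vectors x and y and the intermediate names fwd, bwd, tied, label and empty are made up for this example and are not part of the answer's code.

```r
# One row of the example data with the ID column already dropped:
# A = 3, C = 2, S = 3 (hypothetical stand-in for what apply() passes as x)
x <- c(A = 3L, C = 2L, S = 3L)

# duplicated() alone flags only the second and later occurrences of a value
fwd <- duplicated(x)

# fromLast = TRUE scans from the right, so it flags the earlier occurrences
bwd <- duplicated(x, fromLast = TRUE)

# Combining both marks every member of a tied group
tied <- fwd | bwd

# Lowercase the names of the tied columns and glue them together
label <- paste(tolower(names(x))[tied], collapse = "")
print(label)

# A row with no ties selects no names, so paste() returns an empty string,
# which the answer then replaces with "none"
y <- c(A = 5L, C = 4L, S = 2L)
empty <- paste(tolower(names(y))[duplicated(y) | duplicated(y, fromLast = TRUE)],
               collapse = "")
print(empty)
```

With three columns a row can contain at most one tied group, so the label is unambiguous. If the data had four or more columns, two separate tied pairs in one row (for example 4 4 7 7) would be merged into a single string, so the approach would likely need adjusting for that case.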
