I need to flag an id when they have different grade values in the grade columns. Here how my sample dataset looks like
df <- data.frame(id = c(11,22,33,44,55),
grade.1 = c(3,4,5,6,7),
grade.2 = c(3,4,5,NA,7),
grade.3 = c(4,4,6,5,7),
grade.4 = c(NA,NA,NA, 5, 7 ))
df$Grade <- paste0(df$grade.1, df$grade.2, df$grade.3, df$grade.4)
> df
id grade.1 grade.2 grade.3 grade.4 Grade
1 11 3 3 4 NA 334NA
2 22 4 4 4 NA 444NA
3 33 5 5 6 NA 556NA
4 44 6 NA 5 5 6NA55
5 55 7 7 7 7 7777
When an id has different grade values in grade.1 grade.2 grade.3 and grade.4, that row needs to be flagged. Having NA in that column does not affect the flagging.
In other words, if the Grade column at the end has any differential numbers, that id needs to be flagged.
My desired output should look like this:
> df
id grade.1 grade.2 grade.3 grade.4 flag
1 11 3 3 4 NA flagged
2 22 4 4 4 NA Not_flagged
3 33 5 5 6 NA flagged
4 44 6 NA 5 5 flagged
5 55 7 7 7 7 Not_flagged
Any ideas? Thanks!
CodePudding user response:
Here is a base R approach.
df$flag <- c("not_flagged", "flagged")[
apply(df[-1L], 1L, \(x) length( (ux <- unique(x))[!is.na(ux)] ) > 1L) 1L
]
Output
> df
id grade.1 grade.2 grade.3 grade.4 flag
1 11 3 3 4 NA flagged
2 22 4 4 4 NA not_flagged
3 33 5 5 6 NA flagged
4 44 6 NA 5 5 flagged
5 55 7 7 7 7 not_flagged
CodePudding user response:
A possible solution:
library(tidyverse)
df <- data.frame(id = c(11,22,33,44,55),
grade.1 = c(3,4,5,6,7),
grade.2 = c(3,4,5,NA,7),
grade.3 = c(4,4,6,5,7),
grade.4 = c(NA,NA,NA, 5, 7 ))
df %>%
rowwise %>%
mutate(flag = if_else(length(unique(na.omit(c_across(2:5)))) == 1,
"not-flagged", "flagged")) %>% ungroup
#> # A tibble: 5 × 6
#> id grade.1 grade.2 grade.3 grade.4 flag
#> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 11 3 3 4 NA flagged
#> 2 22 4 4 4 NA not-flagged
#> 3 33 5 5 6 NA flagged
#> 4 44 6 NA 5 5 flagged
#> 5 55 7 7 7 7 not-flagged
Using data.table::uniqueN, that counts the number of unique elements in a vector (and that allows for NA removal):
library(data.table)
library(dplyr)
df %>%
rowwise %>%
mutate(flag = if_else(uniqueN(c_across(2:5), na.rm = T) == 1,
"not-flagged", "flagged")) %>% ungroup
CodePudding user response:
A base R solution using rle omitting NA values.
df$flag <- apply(df[,2:5], 1, function(x)
ifelse(length(rle(x[!is.na(x)])$lengths)==1, "not_flagged", "flagged"))
df
id grade.1 grade.2 grade.3 grade.4 flag
1 11 3 3 4 NA flagged
2 22 4 4 4 NA not_flagged
3 33 5 5 6 NA flagged
4 44 6 NA 5 5 flagged
5 55 7 7 7 7 not_flagged
Data
df <- structure(list(id = c(11, 22, 33, 44, 55), grade.1 = c(3, 4,
5, 6, 7), grade.2 = c(3, 4, 5, NA, 7), grade.3 = c(4, 4, 6, 5,
7), grade.4 = c(NA, NA, NA, 5, 7)), class = "data.frame", row.names = c(NA,
-5L))
CodePudding user response:
n_distinct from dyplr is very helpful: Here a version using a combination of pivot_longer and pivot_wider:
library(dplyr)
library(tidyr)
df %>%
pivot_longer(
-c(id, Grade),
names_to = "name",
values_to = "value"
) %>%
group_by(id) %>%
mutate(flag = ifelse(n_distinct(value, na.rm = TRUE)==1, "Not flagged", "Flagged")) %>%
pivot_wider(
names_from = name,
values_from = value
)
id Grade flag grade.1 grade.2 grade.3 grade.4
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 11 334NA Flagged 3 3 4 NA
2 22 444NA Not flagged 4 4 4 NA
3 33 556NA Flagged 5 5 6 NA
4 44 6NA55 Flagged 6 NA 5 5
5 55 7777 Not flagged 7 7 7 7
