Home > Net >  Flagging an id when having similar columns different values in R
Flagging an id when having similar columns different values in R

Time:01-12

I need to flag an id when they have different grade values in the grade columns. Here how my sample dataset looks like

df <- data.frame(id = c(11,22,33,44,55),
                 grade.1 = c(3,4,5,6,7),
                 grade.2 = c(3,4,5,NA,7),
                 grade.3 = c(4,4,6,5,7),
                 grade.4 = c(NA,NA,NA, 5, 7 ))

df$Grade <- paste0(df$grade.1, df$grade.2, df$grade.3, df$grade.4)

> df
  id grade.1 grade.2 grade.3 grade.4 Grade
1 11       3       3       4      NA 334NA
2 22       4       4       4      NA 444NA
3 33       5       5       6      NA 556NA
4 44       6      NA       5       5 6NA55
5 55       7       7       7       7  7777

When an id has different grade values in grade.1 grade.2 grade.3 and grade.4, that row needs to be flagged. Having NA in that column does not affect the flagging.

In other words, if the Grade column at the end has any differential numbers, that id needs to be flagged.

My desired output should look like this:

> df
  id grade.1 grade.2 grade.3 grade.4        flag
1 11       3       3       4      NA     flagged
2 22       4       4       4      NA Not_flagged
3 33       5       5       6      NA     flagged
4 44       6      NA       5       5     flagged
5 55       7       7       7       7 Not_flagged

Any ideas? Thanks!

CodePudding user response:

Here is a base R approach.

df$flag <- c("not_flagged", "flagged")[
  apply(df[-1L], 1L, \(x) length( (ux <- unique(x))[!is.na(ux)] ) > 1L)   1L
]

Output

> df
  id grade.1 grade.2 grade.3 grade.4        flag
1 11       3       3       4      NA     flagged
2 22       4       4       4      NA not_flagged
3 33       5       5       6      NA     flagged
4 44       6      NA       5       5     flagged
5 55       7       7       7       7 not_flagged

CodePudding user response:

A possible solution:

library(tidyverse)

df <- data.frame(id = c(11,22,33,44,55),
                 grade.1 = c(3,4,5,6,7),
                 grade.2 = c(3,4,5,NA,7),
                 grade.3 = c(4,4,6,5,7),
                 grade.4 = c(NA,NA,NA, 5, 7 ))

df %>% 
  rowwise %>% 
  mutate(flag = if_else(length(unique(na.omit(c_across(2:5))))  == 1,
                        "not-flagged", "flagged")) %>% ungroup

#> # A tibble: 5 × 6
#>      id grade.1 grade.2 grade.3 grade.4 flag       
#>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <chr>      
#> 1    11       3       3       4      NA flagged    
#> 2    22       4       4       4      NA not-flagged
#> 3    33       5       5       6      NA flagged    
#> 4    44       6      NA       5       5 flagged    
#> 5    55       7       7       7       7 not-flagged

Using data.table::uniqueN, that counts the number of unique elements in a vector (and that allows for NA removal):

library(data.table)
library(dplyr)

df %>% 
  rowwise %>% 
  mutate(flag = if_else(uniqueN(c_across(2:5), na.rm = T)  == 1,
                        "not-flagged", "flagged")) %>% ungroup

CodePudding user response:

A base R solution using rle omitting NA values.

df$flag <- apply(df[,2:5], 1, function(x) 
  ifelse(length(rle(x[!is.na(x)])$lengths)==1, "not_flagged", "flagged"))

df
  id grade.1 grade.2 grade.3 grade.4        flag
1 11       3       3       4      NA     flagged
2 22       4       4       4      NA not_flagged
3 33       5       5       6      NA     flagged
4 44       6      NA       5       5     flagged
5 55       7       7       7       7 not_flagged

Data

df <- structure(list(id = c(11, 22, 33, 44, 55), grade.1 = c(3, 4, 
5, 6, 7), grade.2 = c(3, 4, 5, NA, 7), grade.3 = c(4, 4, 6, 5, 
7), grade.4 = c(NA, NA, NA, 5, 7)), class = "data.frame", row.names = c(NA, 
-5L))

CodePudding user response:

n_distinct from dyplr is very helpful: Here a version using a combination of pivot_longer and pivot_wider:

library(dplyr)
library(tidyr)

df %>% 
  pivot_longer(
    -c(id, Grade),
    names_to = "name",
    values_to = "value"
  ) %>% 
  group_by(id) %>% 
  mutate(flag = ifelse(n_distinct(value, na.rm = TRUE)==1, "Not flagged", "Flagged")) %>% 
  pivot_wider(
    names_from = name,
    values_from = value
  )
     id Grade flag        grade.1 grade.2 grade.3 grade.4
  <dbl> <chr> <chr>         <dbl>   <dbl>   <dbl>   <dbl>
1    11 334NA Flagged           3       3       4      NA
2    22 444NA Not flagged       4       4       4      NA
3    33 556NA Flagged           5       5       6      NA
4    44 6NA55 Flagged           6      NA       5       5
5    55 7777  Not flagged       7       7       7       7
  •  Tags:  
  • Related