Home > Software design >  For each row, identify the proportion of columns that have the same value in R
For each row, identify the proportion of columns that have the same value in R

Time:01-08

I have a dataset of survey responses similar to this:

toy <- data.frame(v1 = c(1,2,3), v2 = c(1,6,3), v3 = c(1,2,4), v4 = c(1,7,3))

toy
  v1 v2 v3 v4
1  1  1  1  1
2  2  6  2  7
3  3  3  4  3

I want to detect "straightlining" by finding the most common value for each row and calculating the proportion of columns with that value.

Two examples:

  • if the value of every column in a row is 5, then the new variable should return 1
  • If the value of 60% of the columns in a row is 3 and 40% of the columns is 4, then the variable should return .6

Desired output:

  v1 v2 v3 v4 straightline_pct
1  1  1  1  1   1
2  2  6  2  7 .50
3  3  3  4  3 .75

CodePudding user response:

One base approach:

toy <- data.frame(v1 = c(1,2,3), v2 = c(1,6,3), v3 = c(1,2,4), v4 = c(1,7,3))

toy$straightline_pct = apply(as.matrix(toy),
                             1L,
                             function (x) max(prop.table(table(x)))
                             )

toy
#>   v1 v2 v3 v4 straightline_pct
#> 1  1  1  1  1             1.00
#> 2  2  6  2  7             0.50
#> 3  3  3  4  3             0.75

CodePudding user response:

Slight variation with just table

toy$straightline_pct <- apply(toy, 1, function(x) max(table(x))/length(x) )
toy
  v1 v2 v3 v4 straightline_pct
1  1  1  1  1             1.00
2  2  6  2  7             0.50
3  3  3  4  3             0.75

CodePudding user response:

A possible solution:

library(tidyverse)

toy <- data.frame(v1 = c(1,2,3), v2 = c(1,6,3), v3 = c(1,2,4), v4 = c(1,7,3))

toy %>% 
  rowwise %>% 
  mutate(perc = table(c_across(everything())) %>% 
      {(ncol(toy) - length(.)   1) / ncol(toy)}) %>% ungroup

#> # A tibble: 3 × 5
#>      v1    v2    v3    v4  perc
#>   <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1     1     1     1  1   
#> 2     2     6     2     7  0.5 
#> 3     3     3     4     3  0.75

An alternative, based on ave function:

toy %>%
  rowwise %>% 
  mutate(perc = c_across(1:4) %>%
    {max(ave(., ., FUN=length)) / ncol(toy)}) %>% ungroup

CodePudding user response:

I like @Paul Smith's and @Cole's answers better, but for completeness here's a more verbose approach:

library(tidyverse)
toy %>%
  bind_cols(toy %>%
      mutate(row = row_number()) %>%
      pivot_longer(-row) %>%
      count(row, value) %>%
      group_by(row) %>%
      mutate(straightline_pct = n / sum(n)) %>%
      slice_max(straightline_pct) %>%
      ungroup() %>%
      select(straightline_pct)
  )

CodePudding user response:

Here is a simple and verbose solution that is largely similar to other answers already:

library(tidyverse)
toy %>%
  rowwise() %>%
  mutate(
    straightline_pct = max(table(c_across(everything()))) / ncol(.)
  ) %>%
  ungroup()
# A tibble: 3 x 5
     v1    v2    v3    v4 straightline_pct
  <dbl> <dbl> <dbl> <dbl>            <dbl>
1     1     1     1     1             1   
2     2     6     2     7             0.5 
3     3     3     4     3             0.75
  •  Tags:  
  • Related