I have a dataset of survey responses similar to this:
toy <- data.frame(v1 = c(1,2,3), v2 = c(1,6,3), v3 = c(1,2,4), v4 = c(1,7,3))
toy
v1 v2 v3 v4
1 1 1 1 1
2 2 6 2 7
3 3 3 4 3
I want to detect "straightlining" by finding the most common value for each row and calculating the proportion of columns with that value.
Two examples:
- if the value of every column in a row is 5, then the new variable should return 1
- If the value of 60% of the columns in a row is 3 and 40% of the columns is 4, then the variable should return .6
Desired output:
v1 v2 v3 v4 straightline_pct
1 1 1 1 1 1
2 2 6 2 7 .50
3 3 3 4 3 .75
CodePudding user response:
One base approach:
toy <- data.frame(v1 = c(1,2,3), v2 = c(1,6,3), v3 = c(1,2,4), v4 = c(1,7,3))
toy$straightline_pct = apply(as.matrix(toy),
1L,
function (x) max(prop.table(table(x)))
)
toy
#> v1 v2 v3 v4 straightline_pct
#> 1 1 1 1 1 1.00
#> 2 2 6 2 7 0.50
#> 3 3 3 4 3 0.75
CodePudding user response:
Slight variation with just table
toy$straightline_pct <- apply(toy, 1, function(x) max(table(x))/length(x) )
toy
v1 v2 v3 v4 straightline_pct
1 1 1 1 1 1.00
2 2 6 2 7 0.50
3 3 3 4 3 0.75
CodePudding user response:
A possible solution:
library(tidyverse)
toy <- data.frame(v1 = c(1,2,3), v2 = c(1,6,3), v3 = c(1,2,4), v4 = c(1,7,3))
toy %>%
rowwise %>%
mutate(perc = table(c_across(everything())) %>%
{(ncol(toy) - length(.) 1) / ncol(toy)}) %>% ungroup
#> # A tibble: 3 × 5
#> v1 v2 v3 v4 perc
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 1 1 1
#> 2 2 6 2 7 0.5
#> 3 3 3 4 3 0.75
An alternative, based on ave function:
toy %>%
rowwise %>%
mutate(perc = c_across(1:4) %>%
{max(ave(., ., FUN=length)) / ncol(toy)}) %>% ungroup
CodePudding user response:
I like @Paul Smith's and @Cole's answers better, but for completeness here's a more verbose approach:
library(tidyverse)
toy %>%
bind_cols(toy %>%
mutate(row = row_number()) %>%
pivot_longer(-row) %>%
count(row, value) %>%
group_by(row) %>%
mutate(straightline_pct = n / sum(n)) %>%
slice_max(straightline_pct) %>%
ungroup() %>%
select(straightline_pct)
)
CodePudding user response:
Here is a simple and verbose solution that is largely similar to other answers already:
library(tidyverse)
toy %>%
rowwise() %>%
mutate(
straightline_pct = max(table(c_across(everything()))) / ncol(.)
) %>%
ungroup()
# A tibble: 3 x 5
v1 v2 v3 v4 straightline_pct
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 1 1 1
2 2 6 2 7 0.5
3 3 3 4 3 0.75
