Subset in R with specific values for specific columns identified by their index number-CodePudding

If I have a data frame like this:

df = data.frame(A = sample(1:5, 10, replace=T), B = sample(1:5, 10, replace=T), C = sample(1:5, 10, replace=T), D = sample(1:5, 10, replace=T), E = sample(1:5, 10, replace=T))

Giving me this:

   A B C D E
1  1 5 1 4 3
2  2 3 5 4 3
3  4 2 2 4 4
4  2 1 2 5 2
5  3 3 4 4 5
6  3 2 3 1 5
7  1 5 4 2 3
8  1 3 5 5 1
9  3 1 1 3 5
10 5 3 1 2 4

How do I get a subset that includes all the rows where the values for certain columns (B and D, say) are equal to 1, with the columns identified by their index numbers (2 and 4) rather than their names? In this case:

   A B C D E
4  2 1 2 5 2
6  3 2 3 1 5
9  3 1 1 3 5

CodePudding user response：

df[rowSums(df[c(2,4)] == 1) > 0,]
#   A B C D E
# 4 2 1 2 5 2
# 6 3 2 3 1 5
# 9 3 1 1 3 5

You said to compare values by column index, so df[c(2,4)] or (or df[,c(2,4)]).
df[c(2,4)] == 1 returns a matrix of logicals, whether the cell's value is equal to 1.
rowSums(.) > 0 finds those rows with at least one 1.
df[rowSums(.)>0,] selects just those rows.

Data

df <- structure(list(A = c(1L, 2L, 4L, 2L, 3L, 3L, 1L, 1L, 3L, 5L), B = c(5L, 3L, 2L, 1L, 3L, 2L, 5L, 3L, 1L, 3L), C = c(1L, 5L, 2L, 2L, 4L, 3L, 4L, 5L, 1L, 1L), D = c(4L, 4L, 4L, 5L, 4L, 1L, 2L, 5L, 3L, 2L), E = c(3L, 3L, 4L, 2L, 5L, 5L, 3L, 1L, 5L, 4L)), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))

CodePudding user response：

tidyverse

df <-
  structure(
    list(
      A = c(1L, 2L, 4L, 2L, 3L, 3L, 1L, 1L, 3L, 5L),
      B = c(5L, 3L, 2L, 1L, 3L, 2L, 5L, 3L, 1L, 3L),
      C = c(1L, 5L, 2L, 2L, 4L, 3L, 4L, 5L, 1L, 1L),
      D = c(4L, 4L, 4L, 5L, 4L, 1L, 2L, 5L, 3L, 2L),
      E = c(3L, 3L, 4L, 2L, 5L, 5L, 3L, 1L, 5L, 4L)
    ),
    class = "data.frame",
    row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10")
  )

library(tidyverse)
df %>% 
  filter(B == 1 | D == 1)
#>   A B C D E
#> 4 2 1 2 5 2
#> 6 3 2 3 1 5
#> 9 3 1 1 3 5

^{Created on 2022-01-23 by the reprex package (v2.0.1)}

data.table

library(data.table)

setDT(df)[B == 1 | D == 1, ]
#>    A B C D E
#> 1: 2 1 2 5 2
#> 2: 3 2 3 1 5
#> 3: 3 1 1 3 5

^{Created on 2022-01-23 by the reprex package (v2.0.1)}