How to filter, remove or subset rows "certain range of values" between variables?

Time:01-13

Is it possible to filter, remove, or subset rows based on the range of values across variables?

Here is dummy data:

a <- data.frame(c('b1', 'b2', 'b3'),
                c(0.2, 1.5, 0.5),
                c(0.4, 1.0, 0.3),
                c(0.5, 0.5, 0.1),
                c(-0.5, -2.5, -0.2),
                c(-0.3, -3.0, -0.4),
                c(-0.5, -1.7, -0.4),
                stringsAsFactors = FALSE)

colnames(a) <- c('id', 'var1', 'var2', 'var3', 'var4', 'var5', 'var6')
rownames(a) <- a$id

a_subset <- a[, 2:7]
a_subset

#    var1 var2 var3 var4 var5 var6
# b1  0.2  0.4  0.5 -0.5 -0.3 -0.5
# b2  1.5  1.0  0.5 -2.5 -3.0 -1.7
# b3  0.5  0.3  0.1 -0.2 -0.4 -0.4


#'[ In row b1, the values across the variables lie between -0.5 and 0.5, so the total range (maximum minus minimum) is 1.0.]

#'[Expected output]

#'[For example, if we want to keep only rows whose range across the variables is at most 1, we get the result below. Row b2 is removed because its range (maximum minus minimum) is 1.5 - (-3.0) = 4.5.]


#    var1 var2 var3 var4 var5 var6
# b1  0.2  0.4  0.5 -0.5 -0.3 -0.5
# b3  0.5  0.3  0.1 -0.2 -0.4 -0.4
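The quantity being filtered on is the row maximum minus the row minimum. A minimal check in base R, which rebuilds `a_subset` from the dummy data above with explicit row names, makes the per-row ranges visible before any filtering:

```r
# Rebuild the numeric part of the dummy data (rows b1, b2, b3)
a_subset <- data.frame(
  var1 = c(0.2, 1.5, 0.5),
  var2 = c(0.4, 1.0, 0.3),
  var3 = c(0.5, 0.5, 0.1),
  var4 = c(-0.5, -2.5, -0.2),
  var5 = c(-0.3, -3.0, -0.4),
  var6 = c(-0.5, -1.7, -0.4),
  row.names = c("b1", "b2", "b3")
)

# Per-row maximum and minimum across all six variables
row_max <- apply(a_subset, 1, max)
row_min <- apply(a_subset, 1, min)

# Range = max - min; b2 stands out because 1.5 - (-3.0) = 4.5
row_ranges <- row_max - row_min
row_ranges
```

Rows b1 and b3 come out at 1.0 and 0.9, which is why they survive a threshold of 1 in the expected output.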

So, is it possible to filter, subset, or remove rows based on a specific range of values across variables? Any approach would be helpful. Thank you.

Answer:

base R

# Use a name other than `range` so the base function is not masked
row_range <- apply(a_subset, 1, function(x) diff(range(x)))  # max - min for each row
a_subset[row_range <= 1, ]

   var1 var2 var3 var4 var5 var6
b1  0.2  0.4  0.5 -0.5 -0.3 -0.5
b3  0.5  0.3  0.1 -0.2 -0.4 -0.4
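A vectorized base R variant (a swapped-in alternative, not part of the original answer) avoids the row-by-row `apply()` by using `pmax()` and `pmin()`, which compare their arguments element-wise. This sketch assumes every column of `a_subset` is numeric:

```r
a_subset <- data.frame(
  var1 = c(0.2, 1.5, 0.5),
  var2 = c(0.4, 1.0, 0.3),
  var3 = c(0.5, 0.5, 0.1),
  var4 = c(-0.5, -2.5, -0.2),
  var5 = c(-0.3, -3.0, -0.4),
  var6 = c(-0.5, -1.7, -0.4),
  row.names = c("b1", "b2", "b3")
)

# Passing all columns to pmax()/pmin() via do.call() gives the
# row-wise maximum and minimum in a single vectorized pass
row_range <- do.call(pmax, a_subset) - do.call(pmin, a_subset)

# Keep rows whose range is at most 1 (b1 = 1.0 and b3 = 0.9; b2 = 4.5 is dropped)
kept <- a_subset[row_range <= 1, ]
kept
```

On a three-row example the difference is negligible, but on wide or long data frames this avoids converting to a matrix and calling an R function once per row.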

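tidyverse

With tidyr (plus dplyr for grouping and filtering, and tibble for the row names), it is easier to work with the data in long (tidy) format. Note that the row names end up in a `rowname` column: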

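library(dplyr)   # group_by(), filter(), ungroup(), %>%
library(tidyr)   # pivot_longer(), pivot_wider()
library(tibble)  # rownames_to_column()

a_subset %>% 
  rownames_to_column() %>%                    # keep b1/b2/b3 as a column
  pivot_longer(cols = -rowname) %>%           # one row per (rowname, variable)
  group_by(rowname) %>% 
  filter(diff(range(value)) <= 1) %>%         # keep groups with max - min <= 1
  ungroup() %>% 
  pivot_wider()                               # back to one row per id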
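Both answers work on `a_subset`, which contains only numeric columns. Calling `apply()` on the full `a` would coerce everything, including `id`, to a character matrix, so `range()` would compare strings. A minimal sketch of a reusable helper, `filter_by_row_range()` (a hypothetical name, not from any package), that computes the range over the numeric columns only while keeping the others in the result:

```r
# Hypothetical helper: keep rows whose range (max - min) across the numeric
# columns of `df` is at most `max_range`. Non-numeric columns such as `id`
# are ignored for the calculation but kept in the returned data frame.
filter_by_row_range <- function(df, max_range, tol = sqrt(.Machine$double.eps)) {
  num_cols <- vapply(df, is.numeric, logical(1))
  if (!any(num_cols)) stop("`df` has no numeric columns")
  nums <- df[num_cols]
  row_range <- do.call(pmax, nums) - do.call(pmin, nums)
  # `tol` absorbs floating-point error for ranges sitting right at the threshold;
  # which() drops rows whose range is NA instead of returning an all-NA row
  df[which(row_range <= max_range + tol), , drop = FALSE]
}

a <- data.frame(
  id   = c("b1", "b2", "b3"),
  var1 = c(0.2, 1.5, 0.5),
  var2 = c(0.4, 1.0, 0.3),
  var3 = c(0.5, 0.5, 0.1),
  var4 = c(-0.5, -2.5, -0.2),
  var5 = c(-0.3, -3.0, -0.4),
  var6 = c(-0.5, -1.7, -0.4),
  stringsAsFactors = FALSE
)
rownames(a) <- a$id

res <- filter_by_row_range(a, max_range = 1)
res
```

The default tolerance is an assumption; set `tol = 0` if the threshold must be applied exactly.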