I'm trying to fix some issues with data I have. The dataset is made up of a list of dataframes, each representing an individual. I have created a variable of interest for each individual (travel rate km/d). Owing to many different issues this travel rate can be inflated, and I need to work out why.
When the variable of interest is above a certain level I need to manually look at the row on which it sits, as well as a few rows before that (e.g. 3 rows), to work out why it's so high. I don't think I can diagnose all the problems automatically (at least not easily), as there are too many things that could influence the variable.
Given the above, I have resigned myself to looking at the suspect rows for every individual. Using the iris dataset as an example, I can simply do this:
library(tidyverse)
# creating similar list to what I have
list1 = iris %>% group_split(Species)
# returns each row which matches specific condition I'm interested in
lapply(list1, function(x) {x %>% filter(Petal.Length == 4.8)})
However, the above only returns the row on which the error occurs. To work out why things are going wrong, the filter would ideally need to return the row on which the condition is matched (where Petal.Length is 4.8 in the above example) but also a few rows before that. This would allow me to diagnose the issue.
What is the easiest way to achieve this, given that filter is only returning the specific row on which the condition is matched?
Thank you.
CodePudding user response:
You can try -
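# rows where the condition matches, plus the 3 rows before each match
n <- 0:3
lapply(list1, function(x) {
  inds <- which(x$Petal.Length == 4.8)
  if (length(inds)) {
    # c() flattens the matrix from sapply() so unique() drops repeated indices, not matrix rows
    idx <- sort(unique(c(sapply(inds, `-`, n))))
    # drop indices below 1 (a match in the first 3 rows would otherwise give 0 or negative indices)
    x[idx[idx > 0], ]
  }
})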
which would return -
#[[1]]
#NULL
#[[2]]
# A tibble: 8 x 5
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# <dbl> <dbl> <dbl> <dbl> <fct>
#1 5.8 2.7 4.1 1 versicolor
#2 6.2 2.2 4.5 1.5 versicolor
#3 5.6 2.5 3.9 1.1 versicolor
#4 5.9 3.2 4.8 1.8 versicolor
#5 6.1 2.8 4.7 1.2 versicolor
#6 6.4 2.9 4.3 1.3 versicolor
#7 6.6 3 4.4 1.4 versicolor
#8 6.8 2.8 4.8 1.4 versicolor
#[[3]]
# A tibble: 8 x 5
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# <dbl> <dbl> <dbl> <dbl> <fct>
#1 6.3 2.7 4.9 1.8 virginica
#2 6.7 3.3 5.7 2.1 virginica
#3 7.2 3.2 6 1.8 virginica
#4 6.2 2.8 4.8 1.8 virginica
#5 7.7 3 6.1 2.3 virginica
#6 6.3 3.4 5.6 2.4 virginica
#7 6.4 3.1 5.5 1.8 virginica
#8 6 3 4.8 1.8 virginica
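Since the real use case is a value "above a certain level" rather than an exact match, the same idea can be wrapped in a small helper that takes any logical condition. This is only a rough sketch in base R: rows_with_context, its n_before argument, the added flagged column and the 6.5 cut-off are all made up for illustration, and split() stands in for group_split() so that nothing beyond base R is needed.

```r
# Hypothetical helper (not from any package): keep every row where `hit` is TRUE
# plus the `n_before` rows that precede it.
rows_with_context <- function(df, hit, n_before = 3) {
  inds <- which(hit)  # positions of the flagged rows (NA conditions are ignored)
  # each flagged position minus 0..n_before, as one flat vector
  idx <- rep(inds, each = n_before + 1) - 0:n_before
  # drop positions before the first row, then remove duplicates and order them
  idx <- sort(unique(idx[idx >= 1]))
  out <- df[idx, , drop = FALSE]
  out$flagged <- idx %in% inds  # TRUE on the rows that triggered the condition
  out
}

# base R stand-in for the list of per-individual data frames
list1 <- split(iris, iris$Species)

# threshold condition; 6.5 is an arbitrary cut-off chosen for the example
suspect <- lapply(list1, function(x) rows_with_context(x, x$Petal.Length > 6.5))
```

Individuals with no flagged rows come back as zero-row data frames rather than NULL, so the result can still be passed to something like do.call(rbind, suspect) if one combined table is easier to inspect.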
