Is there a way to use the lead function to figure out the first row that meets a condition?-CodePudding

Hi I have a dataframe as such,

df= structure(list(a = c(1, 3, 4, 6, 3, 2, 5, 1), b = c(1, 3, 4, 
                                                    2, 6, 7, 2, 6), c = c(6, 3, 6, 5, 3, 6, 5, 3), d = c(6, 2, 4, 
                                                                                                         5, 3, 7, 2, 6), e = c(1, 2, 4, 5, 6, 7, 6, 3), f = c(2, 3, 4, 
                                                                                                                                                              2, 2, 7, 5, 2)), .Names = c("a", "b", "c", "d", "e", "f"), row.names = c(NA, 
                                                                                                                                                                                                                                       8L), class = "data.frame")
df$total = apply ( df, 1,sum )
df$row = seq ( 1, nrow ( df ))

so the dataframe looks like this.

    > df
  a b c d e f total row
1 1 1 6 6 1 2    17   1
2 3 3 3 2 2 3    16   2
3 4 4 6 4 4 4    26   3
4 6 2 5 5 5 2    25   4
5 3 6 3 3 6 2    23   5
6 2 7 6 7 7 7    36   6
7 5 2 5 2 6 5    25   7
8 1 6 3 6 3 2    21   8

what I want to do is figure the first leading row where the total is greater than the current. For example for row 1 the total is 17 and the nearest leading row >= 17 would be row 3.

I could loop through each row but it gets really messy. Is this possible? thanks in advance.

CodePudding user response：

We can do this in 2 steps with dplyr. First we set grouping to rowwise, which applies the operation on each row (basically it makes it work like we were doing an apply loop through the rows), then we find all the rows where total is larger than that row's total. Then we drop those that come before the current row and pick the first (which is the next one):

library(dplyr)

df %>%
    rowwise() %>%
    mutate(nxt = list(which(.$total > total)),
           nxt = nxt[nxt > row][1])

# A tibble: 8 × 9
# Rowwise: 
      a     b     c     d     e     f total   row   nxt
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int>
1     1     1     6     6     1     2    17     1     3
2     3     3     3     2     2     3    16     2     3
3     4     4     6     4     4     4    26     3     6
4     6     2     5     5     5     2    25     4     6
5     3     6     3     3     6     2    23     5     6
6     2     7     6     7     7     7    36     6    NA
7     5     2     5     2     6     5    25     7    NA
8     1     6     3     6     3     2    21     8    NA