Consider this table:
| City | degree_day |
|---|---|
| Asheville | 0 |
| Asheville | 15 |
| Asheville | 30 |
| Asheville | 40 |
| Asheville | 75 |
| Asheville | 90 |
| clayton | 0 |
| clayton | 20 |
| clayton | 30 |
| clayton | 80 |
| clayton | 100 |
| clayton | 120 |
I would like to remove all rows after the threshold has been reached. Lets say the threshold is 60. If I do the following code it will only give me values less than or equal to the threshold:
dat %>% group_by(City) %>% filter(degree_day <= 60)
| City | degree_day |
|---|---|
| Asheville | 0 |
| Asheville | 15 |
| Asheville | 30 |
| Asheville | 40 |
| clayton | 0 |
| clayton | 20 |
| clayton | 30 |
Whereas my desired output includes the first row after the filter threshold:
| City | degree_day |
|---|---|
| Asheville | 0 |
| Asheville | 15 |
| Asheville | 30 |
| Asheville | 40 |
| Asheville | 75 |
| clayton | 0 |
| clayton | 20 |
| clayton | 30 |
| clayton | 80 |
How can I achieve this result?
CodePudding user response:
My solution may not be very clever, see if someone else can have better idea.
I'll filter again in the other direction (i.e. degree_day > 60), and slice() the minimum row of that, which will be the next record after threshold. Then combine the two together.
library(tidyverse)
rbind(dat %>% filter(degree_day <= 60),
dat %>% filter(degree_day > 60) %>% group_by(City) %>% slice_min(degree_day)) %>%
arrange(City)
# A tibble: 9 x 2
City degree_day
<chr> <int>
1 Asheville 0
2 Asheville 15
3 Asheville 30
4 Asheville 40
5 Asheville 75
6 clayton 0
7 clayton 20
8 clayton 30
9 Clayton 80
CodePudding user response:
A solution via defining temporary subsets of dfand finally binding them back again:
df1 <- df %>% group_by(City) %>% filter(degree_day <= 60)
df2 <- df %>% group_by(City) %>% filter(degree_day > 60) %>% slice_head()
df3 <- bind_rows(df1, df2) %>% arrange(City)
# A tibble: 10 × 2
# Groups: City [2]
City degree_day
<chr> <dbl>
1 A 0
2 A 12
3 A 25
4 A 30
5 A 75
6 B 0
7 B 20
8 B 27
9 B 58
10 B 100
Data:
df <- data.frame(
City = c(rep("A", 7), rep("B", 7)),
degree_day = c(0,12,25,30,75,90,100,
0,20,27,58,100,120,150)
)
