I have a data.frame of linear intervals, where each interval also has a numeric index:
df <- data.frame(id = c("id1","id20","id7","id12","id15"),
                 start = c(36, 41, 216, 234, 300),
                 end = c(21, 112, 263, 269, 340),
                 index = c(11, 12, 28, 29, 33))
df is sorted by index in ascending order.
I want to merge each run of rows whose indices are consecutive into a single row, such that id is the run's ids concatenated with ";", start is the minimum start of the run, end is the maximum end of the run, and index is the maximum index of the run.
For the example above, the resulting merged data.frame would be:
merged.df <- data.frame(id = c("id1;id20","id7;id12","id15"),
                        start = c(36, 216, 300),
                        end = c(112, 269, 340),
                        index = c(12, 29, 33))
Any ideas?
Answer:
You could use cumsum(c(TRUE, diff(index) != 1)) to build a group id: it increments whenever the gap to the previous index is not 1, so every run of consecutive indices shares the same id.
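To see what the grouping expression does, here it is evaluated step by step on the index column from the question. This is a standalone sketch; the variable names gaps, breaks and grp are just illustrative.

```r
index <- c(11, 12, 28, 29, 33)

# differences between successive indices: 1 16 1 4
gaps <- diff(index)

# TRUE marks the first row of each run: the first row, plus any row whose gap is not 1
breaks <- c(TRUE, gaps != 1)   # TRUE FALSE TRUE FALSE TRUE

# the running count of run starts gives every row in a run the same group id
grp <- cumsum(breaks)          # 1 1 2 2 3
print(grp)
```

Rows that share a grp value are exactly the rows that get collapsed into one output row.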
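library(dplyr)

df %>%
  # a new group starts wherever the gap to the previous index is not 1
  group_by(grp = cumsum(c(TRUE, diff(index) != 1))) %>%
  summarise(id = paste(id, collapse = ";"),
            start = min(start),
            end = max(end),
            index = last(index)) %>%  # last() equals max() because df is sorted by index
  select(-grp)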
# # A tibble: 3 × 4
# id start end index
# <chr> <dbl> <dbl> <dbl>
# 1 id1;id20 36 112 12
# 2 id7;id12 216 269 29
# 3 id15 300 340 33
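If you would rather avoid the dplyr dependency, the same grouping vector can drive a base R split/lapply/rbind. This is a sketch that is not part of the original answer: it rebuilds df from the question and, like the dplyr version, assumes df is already sorted by index (the names grp and merged are illustrative).

```r
df <- data.frame(id = c("id1","id20","id7","id12","id15"),
                 start = c(36, 41, 216, 234, 300),
                 end = c(21, 112, 263, 269, 340),
                 index = c(11, 12, 28, 29, 33))

# one group id per run of consecutive indices
grp <- cumsum(c(TRUE, diff(df$index) != 1))

# collapse each run into a single row, then stack the rows
merged <- do.call(rbind, lapply(split(df, grp), function(g) {
  data.frame(id = paste(g$id, collapse = ";"),
             start = min(g$start),
             end = max(g$end),
             index = max(g$index))
}))
rownames(merged) <- NULL  # drop the "1", "2", "3" names inherited from split()
print(merged)
```

split() orders the groups by the levels of grp, and since cumsum() never decreases those levels follow the original row order, so the merged rows come out in the same order as the input.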
