R - Excluding repeating, chronological events in row counts-CodePudding

I would like to count distinct events that have a time element to them. When a certain event repeats chronologically, I only want to give credit to this event one time. In this data, the events are mobility events, so passive mobility events are assumed (right or wrong, not important) to be a continuous action and not distinct. This would only matter within a specific event_date For example:

Suppose I have this df

df <- data.frame(id = c("1", "1", "1", "1", "2", "2", "2", "2"),
                 event = c("sitting", "sitting", "sitting", "sitting", "sitting", "walking", "sitting", "sitting"),
                 timestamp = c('2021-12-19 11:30:00', '2021-12-19 12:30:00', '2021-12-19 13:30:00', '2021-12-20 14:30:00', '2021-12-20 12:00:00', '2021-12-20 12:30:00', '2021-12-20 13:30:00', '2021-12-20 14:30:00'))

df$timestamp <- as.POSIXct(df$timestamp)
df$event_date <- as.Date(df$timestamp, format = "%Y-%m-%d")

> df
  id   event           timestamp event_date
1  1 sitting 2021-12-19 11:30:00 2021-12-19
2  1 sitting 2021-12-19 12:30:00 2021-12-19
3  1 sitting 2021-12-19 13:30:00 2021-12-19
4  1 sitting 2021-12-19 14:30:00 2021-12-19
5  2 sitting 2021-12-20 12:00:00 2021-12-20
6  2 walking 2021-12-20 12:30:00 2021-12-20
7  2 sitting 2021-12-20 13:30:00 2021-12-20
8  2 sitting 2021-12-20 14:30:00 2021-12-20

I would like to achieve an output that looks like this:

> df
  id event_date mobility_events
1  1 2021-12-19               1
2  1 2021-12-20               1
3  2 2021-12-20               3

id 1 only gets credit for 1 mobility event the first event_date because it is sitting for 3 consecutive timestamps of the same activity on that event_date. id 2 gets credit for 3 events because the sitting event is interrupted by a different event; however, does not get credit for the last sitting event because it is chronological and uninterrupted by a different event.

I hope this is clear.

CodePudding user response：

You can use dplyr to simply group_by your id and date and then count the number of times event != lag(event).

library(dplyr)

df %>%
  group_by(id, event_date) %>%
  summarize(mobility_events = 1   sum(event != lag(event), na.rm = T),
            .groups = "drop")
#> # A tibble: 3 × 3
#>   id    event_date mobility_events
#>   <chr> <date>               <dbl>
#> 1 1     2021-12-19               1
#> 2 1     2021-12-20               1
#> 3 2     2021-12-20               3

CodePudding user response：

data

df <- data.frame(id = c("1", "1", "1", "1", "2", "2", "2", "2"),
                 event = c("sitting", "sitting", "sitting", "sitting", "sitting", "walking", "sitting", "sitting"),
                 timestamp = c('2021-12-19 11:30:00', '2021-12-19 12:30:00', '2021-12-19 13:30:00', '2021-12-20 14:30:00', '2021-12-20 12:00:00', '2021-12-20 12:30:00', '2021-12-20 13:30:00', '2021-12-20 14:30:00'))

df$timestamp <- as.POSIXct(df$timestamp)
df$event_date <- as.Date(df$timestamp, format = "%Y-%m-%d")

solution using data.table

library(data.table)

setDT(df)

df[, .(mobility_events = max(rleid(event))), by = .(id, event_date)]

output

   id event_date mobility_events
1:  1 2021-12-19               1
2:  1 2021-12-20               1
3:  2 2021-12-20               3