Home > Back-end >  Determining Consecutive Days Using Pandas
Determining Consecutive Days Using Pandas

Time:01-31

I have a Pandas DataFrame that looks something like this:

                                                       activity_id                            start                              end type  ...                                        site  heart_rate  grp_idx        date
user_id                                                                                                                                    ...
72xyy89c74cc57178e02f103187ad579  dcb12345678b5c8e84cf2931b1a553cb 2015-09-12 00:40:33.171000 00:00 2015-09-12 00:53:33.171000 00:00  run  ...  data\dcb12345678b5c8e84cf2931b1a553cb.json         NaN  2015-37  2015-09-12

The dtypes are:

activity_id                 object
start          datetime64[ns, UTC]
end            datetime64[ns, UTC]
type                        object
distance                   float64
steps                      float64
speed                      float64
pace                       float64
calories                   float64
ascent                     float64
descent                    float64
site                        object
heart_rate                 float64
grp_idx                     object
date                        object

I need to determine if there are x (e.g. 4) consecutive days in a row and find the number of times that has occurred. For example:

2015-09-12
2015-09-13
2015-09-14
2015-09-15
2015-09-16
2015-09-17
2015-09-18
2015-09-19

Would count as two.

I have tried using groupby, e.g.:

s = repeat_runner_df.groupby('user_id').start.diff().dt.days.fillna(1).cumsum()
repeat_runner_df.groupby(['user_id', s]).filter(lambda x: len(x) < 3)
print(repeat_runner_df)

but that has gotten me nowhere and neither has my Google skills. Any help would be greatly appreciated.

CodePudding user response:

Try this:

count = df.groupby(df.start.diff().ne(pd.Timedelta(days=1)).cumsum()).apply(len).ge(4).sum()

Output:

>>> count
3

CodePudding user response:

I think this is what youre looking for:

df.loc[df.groupby(['user_id',df.groupby('user_id')['start'].transform(lambda x: x.diff().dt.days.ne(1).cumsum())])['user_id'].transform('count').ge(4)]
  •  Tags:  
  • Related