Home > Back-end >  Rolling Average number of visitors by Location ID and Date using dplyr in R
Rolling Average number of visitors by Location ID and Date using dplyr in R

Time:01-09

I have a DataFrame df which has three columns:

  1. store_ID
  2. Date
  3. Number_of_Customers

For each store ID, there is one row for every date in a consecutive date range, and the number of customers visiting the store on that date.

For each store_ID I want to calculate the 5 day rolling average number of customers visiting that store.

I have tried things like

 df<- df%>%group_by(store_ID)%>%
    mutate(rollavg=rollmean(Number_of_Customers, 5, na.pad = TRUE))

This seems like it should work but does not seem to give me the correct values in the rollavg column.

What am I missing?

CodePudding user response:

I'm assuming you're using zoo::rollmean(). The problem is that you are calculating a moving average - that is, the average is centered on each date. You want it to be right aligned. This is controlled by the parameter align, whose default value in this function is "center".

You could read more about this by running ?rollapply, which is the underlying function used by rollmean().

library(tidyverse)
library(zoo)
df = df %>% 
  arrange(store_ID, Date) %>% 
  group_by(store_ID) %>%
  mutate(rollavg = rollmean(Number_of_Customers, 5, align = "right", fill = NA))

You may have already done this, but you should ensure the data is arranged by Date. I've added it above for completeness. Also note that na.pad is deprecated and should be replaced by fill = NA in this case.

  •  Tags:  
  • Related