I have a data.frame df that has monthly data:
| Company | Store | Brand | Month | Sales | Budget | Quantity | Year |
|---|---|---|---|---|---|---|---|
| A | Store A | Brand A | Jun | $300 | $300 | 3000 | 2022 |
| A | Store A | Brand A | Jul | $300 | $300 | 3000 | 2022 |
| A | Store A | Brand A | Aug | $300 | $300 | 3000 | 2022 |
| A | Store A | Brand A | Sep | $300 | $300 | 3000 | 2022 |
I want there to be the mean value for every day, for example (Jun has 30 days so, Sales $300 / 30 days = $10 per day):
| Company | Store | Brand | Month | Sales | Budget | Quantity | Date |
|---|---|---|---|---|---|---|---|
| A | Store A | Brand A | Jun | $10 | $10 | 100 | 01-06-2022 |
| A | Store A | Brand A | Jun | $10 | $10 | 100 | 02-06-2022 |
| A | Store A | Brand A | Jun | $10 | $10 | 100 | 03-06-2022 |
| A | Store A | Brand A | Jun | $10 | $10 | 100 | 04-06-2022 |
| A | Store A | Brand A | Jun | $10 | $10 | 100 | 05-06-2022 |
| A | Store A | Brand A | Jun | $10 | $10 | 100 | 06-06-2022 |
| A | Store A | Brand A | Jun | $10 | $10 | 100 | 07-06-2022 |
I dont have any idea what function o code could use for that.
Thanks!
CodePudding user response:
A few things up front:
I'm inferring your locale is set up for Spanish (based on
"Ago"for what I'm assuming is August). To run this code, I'm first setting up my local locale so that it'll parse correctly. You likely don't need this, but others (in other languages) might need this or similar in order to test this code.prevlocale <- Sys.getlocale("LC_TIME") Sys.setlocale("LC_TIME", "Spanish") # [1] "Spanish_Spain.1252" format(as.Date("2022-08-01"), format = "%b") # [1] "ago" ### when done and you want to return to your local locale Sys.setlocale("LC_TIME", prevlocale)Your number columns are not numbers. I'll change them to
numericsince otherwise math ops won't work, you have the option to reformat them into$-strings as needed.We cannot do this in easy one-step logic for all rows since each month has different numbers of days. To move forward, the first challenge is to determine how many days are in each month. There are several ways to approach this (including the
lubridatepackage), I'll offer a base-R solution (usingas.POSIXlt) to figure it out and return the correct vector of dates.yrmon2days <- function(yr, mon) { stopifnot(length(yr) == 1L, length(mon) == 1L) day2 <- day1 <- as.POSIXlt(as.Date(paste(yr, mon, "01", sep = "-"), format = "%Y-%b-%d")) day2$mon <- day2$mon 1L seq(day1, day2-1, by = "day") } yrmon2days(2022, "Feb") # [1] "2022-02-01 UTC" "2022-02-02 UTC" "2022-02-03 UTC" "2022-02-04 UTC" "2022-02-05 UTC" "2022-02-06 UTC" "2022-02-07 UTC" # [8] "2022-02-08 UTC" "2022-02-09 UTC" "2022-02-10 UTC" "2022-02-11 UTC" "2022-02-12 UTC" "2022-02-13 UTC" "2022-02-14 UTC" # [15] "2022-02-15 UTC" "2022-02-16 UTC" "2022-02-17 UTC" "2022-02-18 UTC" "2022-02-19 UTC" "2022-02-20 UTC" "2022-02-21 UTC" # [22] "2022-02-22 UTC" "2022-02-23 UTC" "2022-02-24 UTC" "2022-02-25 UTC" "2022-02-26 UTC" "2022-02-27 UTC" "2022-02-28 UTC"The current is currently not vectorizable; it could be made so, but there are other complications with the surrounding data that made this step a little too far at the moment.
I tried going down the road of using
dplyr::group_byand grouping in general, but while it made sense, I didn't want to assume one row per year/month. With that precaution, it became clear that we needed a row-wise operation, not something that might (though not with this data) return other than 1 row in each group.
dplyr
library(dplyr)
dat %>%
mutate(across(c(Sales, Budget), ~ as.numeric(gsub("\\$", "", .)))) %>%
rowwise() %>%
summarize(
Date = yrmon2days(Year, Month),
Company, Store, Brand, Year, Month,
across(c(where(is.numeric), -Year), ~ . / length(Date))
)
# # A tibble: 122 x 9
# Date Company Store Brand Year Month Sales Budget Quantity
# <dttm> <chr> <chr> <chr> <int> <chr> <dbl> <dbl> <dbl>
# 1 2022-06-01 00:00:00 A Store A Brand A 2022 Jun 10 10 100
# 2 2022-06-02 00:00:00 A Store A Brand A 2022 Jun 10 10 100
# 3 2022-06-03 00:00:00 A Store A Brand A 2022 Jun 10 10 100
# 4 2022-06-04 00:00:00 A Store A Brand A 2022 Jun 10 10 100
# 5 2022-06-05 00:00:00 A Store A Brand A 2022 Jun 10 10 100
# 6 2022-06-06 00:00:00 A Store A Brand A 2022 Jun 10 10 100
# 7 2022-06-07 00:00:00 A Store A Brand A 2022 Jun 10 10 100
# 8 2022-06-08 00:00:00 A Store A Brand A 2022 Jun 10 10 100
# 9 2022-06-09 00:00:00 A Store A Brand A 2022 Jun 10 10 100
# 10 2022-06-10 00:00:00 A Store A Brand A 2022 Jun 10 10 100
# # ... with 112 more rows
base R
dat[c("Sales","Budget")] <- lapply(dat[c("Sales","Budget")], function(z) as.numeric(gsub("\\$", "", z,)))
isnum <- sapply(dat, is.numeric)
isnum[which(colnames(dat) == "Year")] <- FALSE
out <- do.call(rbind, lapply(seq_len(nrow(dat)), function(rn) {
Date <- yrmon2days(dat$Year[rn], dat$Month[rn])
Nums <- lapply(dat[rn,isnum], `/`, length(Date))
suppressWarnings( # "row names were found from a short variable and have been discarded"
cbind(dat[rn,!isnum], Nums, data.frame(Date = Date))
)
}))
head(out)
# Company Store Brand Month Year Sales Budget Quantity Date
# 1 A Store A Brand A Jun 2022 10 10 100 2022-06-01
# 2 A Store A Brand A Jun 2022 10 10 100 2022-06-02
# 3 A Store A Brand A Jun 2022 10 10 100 2022-06-03
# 4 A Store A Brand A Jun 2022 10 10 100 2022-06-04
# 5 A Store A Brand A Jun 2022 10 10 100 2022-06-05
# 6 A Store A Brand A Jun 2022 10 10 100 2022-06-06
Data
dat <- structure(list(Company = c("A", "A", "A", "A"), Store = c("Store A", "Store A", "Store A", "Store A"), Brand = c("Brand A", "Brand A", "Brand A", "Brand A"), Month = c("Jun", "Jul", "Ago", "Sep"), Sales = c("$300", "$300", "$300", "$300"), Budget = c("$300", "$300", "$300", "$300"), Quantity = c(3000L, 3000L, 3000L, 3000L), Year = c(2022L, 2022L, 2022L, 2022L)), class = "data.frame", row.names = c(NA, -4L))
