Home > Enterprise >  Monthly to Daily Value - R
Monthly to Daily Value - R

Time:01-14

I have a data.frame df that has monthly data:

Company Store Brand Month Sales Budget Quantity Year
A Store A Brand A Jun $300 $300 3000 2022
A Store A Brand A Jul $300 $300 3000 2022
A Store A Brand A Aug $300 $300 3000 2022
A Store A Brand A Sep $300 $300 3000 2022

I want there to be the mean value for every day, for example (Jun has 30 days so, Sales $300 / 30 days = $10 per day):

Company Store Brand Month Sales Budget Quantity Date
A Store A Brand A Jun $10 $10 100 01-06-2022
A Store A Brand A Jun $10 $10 100 02-06-2022
A Store A Brand A Jun $10 $10 100 03-06-2022
A Store A Brand A Jun $10 $10 100 04-06-2022
A Store A Brand A Jun $10 $10 100 05-06-2022
A Store A Brand A Jun $10 $10 100 06-06-2022
A Store A Brand A Jun $10 $10 100 07-06-2022

I dont have any idea what function o code could use for that.

Thanks!

CodePudding user response:

A few things up front:

  • I'm inferring your locale is set up for Spanish (based on "Ago" for what I'm assuming is August). To run this code, I'm first setting up my local locale so that it'll parse correctly. You likely don't need this, but others (in other languages) might need this or similar in order to test this code.

    prevlocale <- Sys.getlocale("LC_TIME")
    Sys.setlocale("LC_TIME", "Spanish")
    # [1] "Spanish_Spain.1252"
    format(as.Date("2022-08-01"), format = "%b")
    # [1] "ago"
    ### when done and you want to return to your local locale
    Sys.setlocale("LC_TIME", prevlocale)
    
  • Your number columns are not numbers. I'll change them to numeric since otherwise math ops won't work, you have the option to reformat them into $-strings as needed.

  • We cannot do this in easy one-step logic for all rows since each month has different numbers of days. To move forward, the first challenge is to determine how many days are in each month. There are several ways to approach this (including the lubridate package), I'll offer a base-R solution (using as.POSIXlt) to figure it out and return the correct vector of dates.

    yrmon2days <- function(yr, mon) {
      stopifnot(length(yr) == 1L, length(mon) == 1L)
      day2 <- day1 <- as.POSIXlt(as.Date(paste(yr, mon, "01", sep = "-"), format = "%Y-%b-%d"))
      day2$mon <- day2$mon   1L
      seq(day1, day2-1, by = "day")
    }
    yrmon2days(2022, "Feb")
    #  [1] "2022-02-01 UTC" "2022-02-02 UTC" "2022-02-03 UTC" "2022-02-04 UTC" "2022-02-05 UTC" "2022-02-06 UTC" "2022-02-07 UTC"
    #  [8] "2022-02-08 UTC" "2022-02-09 UTC" "2022-02-10 UTC" "2022-02-11 UTC" "2022-02-12 UTC" "2022-02-13 UTC" "2022-02-14 UTC"
    # [15] "2022-02-15 UTC" "2022-02-16 UTC" "2022-02-17 UTC" "2022-02-18 UTC" "2022-02-19 UTC" "2022-02-20 UTC" "2022-02-21 UTC"
    # [22] "2022-02-22 UTC" "2022-02-23 UTC" "2022-02-24 UTC" "2022-02-25 UTC" "2022-02-26 UTC" "2022-02-27 UTC" "2022-02-28 UTC"
    

    The current is currently not vectorizable; it could be made so, but there are other complications with the surrounding data that made this step a little too far at the moment.

  • I tried going down the road of using dplyr::group_by and grouping in general, but while it made sense, I didn't want to assume one row per year/month. With that precaution, it became clear that we needed a row-wise operation, not something that might (though not with this data) return other than 1 row in each group.

dplyr

library(dplyr)
dat %>%
  mutate(across(c(Sales, Budget), ~ as.numeric(gsub("\\$", "", .)))) %>%
  rowwise() %>%
  summarize(
    Date = yrmon2days(Year, Month),
    Company, Store, Brand, Year, Month,
    across(c(where(is.numeric), -Year), ~ . / length(Date))
  )
# # A tibble: 122 x 9
#    Date                Company Store   Brand    Year Month Sales Budget Quantity
#    <dttm>              <chr>   <chr>   <chr>   <int> <chr> <dbl>  <dbl>    <dbl>
#  1 2022-06-01 00:00:00 A       Store A Brand A  2022 Jun      10     10      100
#  2 2022-06-02 00:00:00 A       Store A Brand A  2022 Jun      10     10      100
#  3 2022-06-03 00:00:00 A       Store A Brand A  2022 Jun      10     10      100
#  4 2022-06-04 00:00:00 A       Store A Brand A  2022 Jun      10     10      100
#  5 2022-06-05 00:00:00 A       Store A Brand A  2022 Jun      10     10      100
#  6 2022-06-06 00:00:00 A       Store A Brand A  2022 Jun      10     10      100
#  7 2022-06-07 00:00:00 A       Store A Brand A  2022 Jun      10     10      100
#  8 2022-06-08 00:00:00 A       Store A Brand A  2022 Jun      10     10      100
#  9 2022-06-09 00:00:00 A       Store A Brand A  2022 Jun      10     10      100
# 10 2022-06-10 00:00:00 A       Store A Brand A  2022 Jun      10     10      100
# # ... with 112 more rows

base R

dat[c("Sales","Budget")] <- lapply(dat[c("Sales","Budget")], function(z) as.numeric(gsub("\\$", "", z,)))
isnum <- sapply(dat, is.numeric)
isnum[which(colnames(dat) == "Year")] <- FALSE
out <- do.call(rbind, lapply(seq_len(nrow(dat)), function(rn) {
  Date <- yrmon2days(dat$Year[rn], dat$Month[rn])
  Nums <- lapply(dat[rn,isnum], `/`, length(Date))
  suppressWarnings( # "row names were found from a short variable and have been discarded"
    cbind(dat[rn,!isnum], Nums, data.frame(Date = Date))
  )
}))
head(out)
#   Company   Store   Brand Month Year Sales Budget Quantity       Date
# 1       A Store A Brand A   Jun 2022    10     10      100 2022-06-01
# 2       A Store A Brand A   Jun 2022    10     10      100 2022-06-02
# 3       A Store A Brand A   Jun 2022    10     10      100 2022-06-03
# 4       A Store A Brand A   Jun 2022    10     10      100 2022-06-04
# 5       A Store A Brand A   Jun 2022    10     10      100 2022-06-05
# 6       A Store A Brand A   Jun 2022    10     10      100 2022-06-06

Data

dat <- structure(list(Company = c("A", "A", "A", "A"), Store = c("Store A", "Store A", "Store A", "Store A"), Brand = c("Brand A", "Brand A", "Brand A", "Brand A"), Month = c("Jun", "Jul", "Ago", "Sep"), Sales = c("$300", "$300", "$300", "$300"), Budget = c("$300", "$300", "$300", "$300"), Quantity = c(3000L, 3000L, 3000L, 3000L), Year = c(2022L, 2022L, 2022L, 2022L)), class = "data.frame", row.names = c(NA, -4L))
  •  Tags:  
  • Related