Home > Blockchain >  Count number of values per time period
Count number of values per time period

Time:01-28

I have a dataframe like:

Month id_product
2020_05 0_0
2020_06 0_22
2020_06 0_22
2021_01 2_52
2023_02 0_0
2023_02 0_12

I would like to get the amount of each product sold each month, like:

Month 0_0 0_12 0_22 2_52
2020_05 1 0 0 0
2020_06 0 0 2 0
2021_01 0 0 0 0
2023_02 1 1 0 0

So I can find the best/worst sellers per month, quarters, years, etc.

I have tried several methods (aggregate, etc.), but I did not find the way to do it with the 70 000 transactions and 3 000 id products of my database. Maybe I am not taking the issue by the right side. I am looking for the right strategy.

CodePudding user response:

in Base R:

a <- table(df)
b <- cbind(Month = rownames(a), as.data.frame.matrix(a))
rownames(b) <- NULL
b

    Month 0_0 0_12 0_22 2_52
1 2020_05   1    0    0    0
2 2020_06   0    0    2    0
3 2021_01   0    0    0    1
4 2023_02   1    1    0    0

CodePudding user response:

df <- structure(list(Month = c("2020_05", "2020_06", "2020_06", "2021_01", 
"2023_02", "2023_02"), id_product = c("0_0", "0_22", "0_22", 
"2_52", "0_0", "0_12")), row.names = c(NA, -6L), class = "data.frame")

library(data.table)
setDT(df)

dcast(df, Month ~ id_product, fun.aggregate = length)
#> Key: <Month>
#>      Month   0_0  0_12  0_22  2_52
#>     <char> <int> <int> <int> <int>
#> 1: 2020_05     1     0     0     0
#> 2: 2020_06     0     0     2     0
#> 3: 2021_01     0     0     0     1
#> 4: 2023_02     1     1     0     0

Created on 2022-01-27 by the reprex package (v2.0.1)

CodePudding user response:

library(data.table)

dummy data

df <- data.table(month = c('2020_01', '2020_01', '2020_01', '2021_02', '2021_02', '2021_02')
             , id_product = c(1,2,2,1,1,2)
             ); df

     month id_product
1: 2020_01          1
2: 2020_01          2
3: 2020_01          2
4: 2021_02          1
5: 2021_02          1
6: 2021_02          2

summary - each product's quantity sold per month

x <- df[, .(q = .N), .(month, id_product)][]

     month id_product q
1: 2020_01          1 1
2: 2020_01          2 2
3: 2021_02          1 2
4: 2021_02          2 1

max q sold - best selling product per month

x[x[, .I[q == max(q)], .(month)]$V1]

     month id_product q
1: 2020_01          2 2
2: 2021_02          1 2
  •  Tags:  
  • Related