Home > Software design >  SQL Query with Total and Monthly breakdowns
SQL Query with Total and Monthly breakdowns

Time:02-04

How do I write a proper SQL query in order to display the below output. The revenue as a percentage rounded up to the fourth decimal. Ex. If Sales person has generated $24K in total (2021)

I am using pgAdmin 4 (v4.12) and each time I try, I get either total of Monthly but not Yearly and vice versa.

This is my attempted query:

SELECT Person_id
     , MONTH(Date_of_Sales) AS month
     , YEAR(Date_of_Sales) AS year 
     , ROUND((Total_Revenue)/SUM(Total_Revenue),3) AS pct_rev
     , SUM(Total_Revenue) AS monthly_revenue
     , SUM(Total_Revenue) AS yearly_revenue 
WHERE Date_of_Sales >= '2021-01-01' and Date_of_Sales <='2021-12-31' 
GROUP BY Month(Date_of_Sales) 
ORDER BY Month(Date_of_Sales);

Expected Output

Person_id MONTH % Revenue Monthly Revenue Yearly Revenue
65464654 2021-01 0.1667 2000 12000
65464654 2021-02 0.2500 3000 12000
65464654 2021-03 0.1667 2000 12000
65464655 2021-01 0.0208 500 24000
65464655 2021-03 0.1667 4000 24000

NOTE: % Revenue is the percentage of the Monthly Revenue with respect to the Yearly Revenue for that Sales Person

  • should be rounded to 4 decimal places!

Schema

CREATE TABLE Sales (
  Person_id BIGINT,
  Date_of_Sales timestamp,
  Total_Revenue int,
  "Type of Sale" VARCHAR(50)
);

CodePudding user response:

There are a few issues with analysis like this, even if we constrain the total results to only 1 single year, we will still need to execute the query in 2 stages. There are all sorts of techniques to do this including the use of nested or inline queries, CTEs (Common Table Expressions), Lateral Joins (AKA CROSS APPLY) and we can even use Window Function. But we must first understand the distinct resultsets that are required to produce your Monthly Revenue and Yearly Revenue

You learn to skip the first principals and just jump into the solution later, I find the using CTEs to declare the individual components makes this example easier to maintain and understand, especially given that we need to re-use the aggregate expression in the % Revenue column.

WITH MonthlyRevenue AS
(
  SELECT Person_id
       , DATE_TRUNC('Month', Date_of_Sales) AS MONTH
       , SUM (Total_Revenue) AS "Monthly Revenue"
  FROM Sales
  GROUP BY Person_Id, DATE_TRUNC('Month', Date_of_Sales) 
),
YearlyRevenue AS
(
  SELECT Person_id
       , DATE_TRUNC('Year', Date_of_Sales) AS Year
       , SUM (Total_Revenue) AS "Yearly Revenue"
  FROM Sales
  GROUP BY Person_Id, DATE_TRUNC('Year', Date_of_Sales) 
)
SELECT m.Person_id
     , to_char(m.MONTH,'YYYY-MM') as MONTH
     , ROUND(CAST(m."Monthly Revenue" as numeric) / CAST(y."Yearly Revenue" as numeric),4) AS "% Revenue"
     , m."Monthly Revenue"
     , y."Yearly Revenue"
FROM MonthlyRevenue m
INNER JOIN YearlyRevenue y ON  m.Person_id = y.Person_id 
                           AND DATE_TRUNC('Year', m.MONTH) = y.Year
ORDER BY m.Person_id, m.MONTH

as the YearlyRevenue Result Set can be expressed as an aggregate of the MonthlyRevenue query, we can use a Window Function instead of joining the two recordsets, this would have a different performance profile to the previous query.

WITH MonthlyRevenue AS
(
  SELECT Person_id
       , DATE_TRUNC('Month', Date_of_Sales) AS MONTH
       , SUM (Total_Revenue) AS "Monthly Revenue"
  FROM Sales
  GROUP BY Person_Id, DATE_TRUNC('Month', Date_of_Sales) 
)
SELECT m.Person_id
     , to_char(m.MONTH,'YYYY-MM') as MONTH
     , ROUND(CAST(m."Monthly Revenue" as numeric) / CAST(SUM(m."Monthly Revenue")
       OVER (PARTITION BY m.Person_id, DATE_TRUNC('Year', m.MONTH)) as numeric),4) AS "% Revenue"
     , m."Monthly Revenue"
     , SUM(m."Monthly Revenue")
       OVER (PARTITION BY m.Person_id, DATE_TRUNC('Year', m.MONTH)) as "Yearly Revenue"
FROM MonthlyRevenue m
WHERE m.MONTH BETWEEN '2021-01-01' AND '2021-12-31'
ORDER BY m.Person_id, m.MONTH;

See this fiddle prepared with a fictitious dataset.


There are many other ways to re-write the same query, usually we want a query that provides the optimal performance. But performance isn't a function of your query syntax alone, it will depend on the indexing, fragmentation and the size and distribution of the data. What works well in one schema may easily be out-performed by a different query in a similar but different schema. There is considerable cost in maintenance of code and algorithms, so readability and how easy it is for you to understand and maintain are other factors to include, which is why I showcase for you today multiple options.

  •  Tags:  
  • Related