How do I write a proper SQL query in order to display the below output. The revenue as a percentage rounded up to the fourth decimal. Ex. If Sales person has generated $24K in total (2021)
I am using pgAdmin 4 (v4.12) and each time I try, I get either total of Monthly but not Yearly and vice versa.
This is my attempted query:
SELECT Person_id
, MONTH(Date_of_Sales) AS month
, YEAR(Date_of_Sales) AS year
, ROUND((Total_Revenue)/SUM(Total_Revenue),3) AS pct_rev
, SUM(Total_Revenue) AS monthly_revenue
, SUM(Total_Revenue) AS yearly_revenue
WHERE Date_of_Sales >= '2021-01-01' and Date_of_Sales <='2021-12-31'
GROUP BY Month(Date_of_Sales)
ORDER BY Month(Date_of_Sales);
Expected Output
| Person_id | MONTH | % Revenue | Monthly Revenue | Yearly Revenue |
|---|---|---|---|---|
| 65464654 | 2021-01 | 0.1667 | 2000 | 12000 |
| 65464654 | 2021-02 | 0.2500 | 3000 | 12000 |
| 65464654 | 2021-03 | 0.1667 | 2000 | 12000 |
| 65464655 | 2021-01 | 0.0208 | 500 | 24000 |
| 65464655 | 2021-03 | 0.1667 | 4000 | 24000 |
NOTE:
% Revenueis the percentage of theMonthly Revenuewith respect to theYearly Revenuefor that Sales Person
- should be rounded to 4 decimal places!
Schema
CREATE TABLE Sales (
Person_id BIGINT,
Date_of_Sales timestamp,
Total_Revenue int,
"Type of Sale" VARCHAR(50)
);
CodePudding user response:
There are a few issues with analysis like this, even if we constrain the total results to only 1 single year, we will still need to execute the query in 2 stages. There are all sorts of techniques to do this including the use of nested or inline queries, CTEs (Common Table Expressions), Lateral Joins (AKA CROSS APPLY) and we can even use Window Function. But we must first understand the distinct resultsets that are required to produce your Monthly Revenue and Yearly Revenue
You learn to skip the first principals and just jump into the solution later, I find the using CTEs to declare the individual components makes this example easier to maintain and understand, especially given that we need to re-use the aggregate expression in the
% Revenuecolumn.
WITH MonthlyRevenue AS
(
SELECT Person_id
, DATE_TRUNC('Month', Date_of_Sales) AS MONTH
, SUM (Total_Revenue) AS "Monthly Revenue"
FROM Sales
GROUP BY Person_Id, DATE_TRUNC('Month', Date_of_Sales)
),
YearlyRevenue AS
(
SELECT Person_id
, DATE_TRUNC('Year', Date_of_Sales) AS Year
, SUM (Total_Revenue) AS "Yearly Revenue"
FROM Sales
GROUP BY Person_Id, DATE_TRUNC('Year', Date_of_Sales)
)
SELECT m.Person_id
, to_char(m.MONTH,'YYYY-MM') as MONTH
, ROUND(CAST(m."Monthly Revenue" as numeric) / CAST(y."Yearly Revenue" as numeric),4) AS "% Revenue"
, m."Monthly Revenue"
, y."Yearly Revenue"
FROM MonthlyRevenue m
INNER JOIN YearlyRevenue y ON m.Person_id = y.Person_id
AND DATE_TRUNC('Year', m.MONTH) = y.Year
ORDER BY m.Person_id, m.MONTH
as the YearlyRevenue Result Set can be expressed as an aggregate of the MonthlyRevenue query, we can use a Window Function instead of joining the two recordsets, this would have a different performance profile to the previous query.
WITH MonthlyRevenue AS
(
SELECT Person_id
, DATE_TRUNC('Month', Date_of_Sales) AS MONTH
, SUM (Total_Revenue) AS "Monthly Revenue"
FROM Sales
GROUP BY Person_Id, DATE_TRUNC('Month', Date_of_Sales)
)
SELECT m.Person_id
, to_char(m.MONTH,'YYYY-MM') as MONTH
, ROUND(CAST(m."Monthly Revenue" as numeric) / CAST(SUM(m."Monthly Revenue")
OVER (PARTITION BY m.Person_id, DATE_TRUNC('Year', m.MONTH)) as numeric),4) AS "% Revenue"
, m."Monthly Revenue"
, SUM(m."Monthly Revenue")
OVER (PARTITION BY m.Person_id, DATE_TRUNC('Year', m.MONTH)) as "Yearly Revenue"
FROM MonthlyRevenue m
WHERE m.MONTH BETWEEN '2021-01-01' AND '2021-12-31'
ORDER BY m.Person_id, m.MONTH;
See this fiddle prepared with a fictitious dataset.
There are many other ways to re-write the same query, usually we want a query that provides the optimal performance. But performance isn't a function of your query syntax alone, it will depend on the indexing, fragmentation and the size and distribution of the data. What works well in one schema may easily be out-performed by a different query in a similar but different schema. There is considerable cost in maintenance of code and algorithms, so readability and how easy it is for you to understand and maintain are other factors to include, which is why I showcase for you today multiple options.
