I'm trying to understand, how window function works internally.
ID,Amt
A,1
B,2
C,3
D,4
E,5
If I run this, will give sum of all amount in total column against every record.
Select ID, SUM (AMT) OVER () total from table
but when I run this, it will give me cumulative sum
Select ID, SUM (AMT) OVER (order by ID) total from table
Trying to understand what is happening when its OVER() and OVER(order by ID)
What I've understood is when no partition is defined in OVER, it considers everything as single partition. But not able to understand when we add order by Id within over(), how come it starts doing cumulative sum ?
Can anyone share what's happening behind the scenes for this ?
CodePudding user response:
That is an interesting case, based on the documentation here is the explanation and example.
If PARTITION BY is not specified, the function treats all rows of the query result set as a single partition. Function will be applied on all rows in the partition if you don't specify ORDER BY clause.
So if you specifiey ORDER BY then
If it is specified, and a ROWS/RANGE is not specified, then default RANGE UNBOUNDED PRECEDING AND CURRENT ROW is used as default for window frame by the functions that can accept optional ROWS/RANGE specification (for example min or max).
So technically these two commands are the same:
SELECT ID, SUM(AMT) OVER (ORDER BY ID) total FROM table
SELECT ID, SUM(AMT) OVER (ORDER BY ID RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) total FROM table
More about you can read in the documentation:https://docs.microsoft.com/en-us/sql/t-sql/queries/select-over-clause-transact-sql?view=sql-server-ver15
CodePudding user response:
This is not related to Oracle itself, but it's part of the SQL Standard and behaves the same way in many databases including Oracle, DB2, PostgreSQL, SQL Server, MySQL, MariaDB, H2, etc.
By definition, when you include the ORDER BY clause the engine will produce "running values" (cumulative aggregation) inside each partition; without the ORDER BY clause it produces the same, single value that aggregates the whole partition.
Now, the partition itself is mainly defined by the PARTITION BY clause. In its absence, the whole result set is considered as a single partition.
Finally, as a more advanced topic, the partition can be further tweaked using a "frame" clause (ROWS and RANGE) and by a "frame exclusion" clause (EXCLUDE).
