I need to transform the following input table to the output table where output table will have ranges instead of per day data.
Input:
Asin day is_instock
--------------------
A1 1 0
A1 2 0
A1 3 1
A1 4 1
A1 5 0
A2 3 0
A2 4 0
Output:
asin start_day end_day is_instock
---------------------------------
A1 1 2 0
A1 3 4 1
A1 5 5 0
A2 3 4 0
CodePudding user response:
This is what is referred to as the "gaps and islands" problem. There's a fair amount of articles and references you can find if you use that search term.
Solution below:
/*Data setup*/
DROP TABLE IF EXISTS #Stock
CREATE TABLE #Stock ([Asin] Char(2),[day] int,is_instock bit)
INSERT INTO #Stock
VALUES
('A1',1,0)
,('A1',2,0)
,('A1',3,1)
,('A1',4,1)
,('A1',5,0)
,('A2',3,0)
,('A2',4,0);
/*Solution*/
WITH cte_Prev AS (
SELECT *
/*Compare previous day's stock status with current row's status. Every time it changes, return 1*/
,StockStatusChange = CASE WHEN is_instock = LAG(is_instock) OVER (PARTITION BY [Asin] ORDER BY [day]) THEN 0 ELSE 1 END
FROM #Stock
)
,cte_Groups AS (
/*Cumulative sum so everytime stock status changes, add 1 from StockStatusChange to begin the next group*/
SELECT GroupID = SUM(StockStatusChange) OVER (PARTITION BY [Asin] ORDER BY [day])
,*
FROM cte_Prev
)
SELECT [Asin]
,start_day = MIN([day])
,end_day = MAX([day])
,is_instock
FROM cte_Groups
GROUP BY [Asin],GroupID,is_instock
CodePudding user response:
You are looking for an operator described in the temporal data literature, and "best known" as PACK.
This operator was not made part of the SQL standard (SQL:2011) that introduced the temporal features of the literature into the language, so there's extremely little chance you're going to find anything to support you in any SQL product/dialect.
Boils down to : you'll have to write out the algorithm to do the PACKing yourself.
