SQL Self Join to create Header and Line level columns-CodePudding

I've a table with the following data.

Item number	Date	Reference	DocNumber	Qty	Site
PLU-1000	4-JAN	Header	JRN - 523	2	SFT
RIN-000138	4-JAN	Line	JRN - 523	2	SFT
RIN-000096	4-JAN	Line	JRN - 523	6	SFT
RIN-000247	4-JAN	Line	JRN - 523	3	SFT
PLU-1001	4-JAN	Header	JRN - 523	5	SFT
RIN-000789	4-JAN	Line	JRN - 523	15	SFT
RIN-000001	4-JAN	Line	JRN - 523	10	SFT
RIN-000247	4-JAN	Line	JRN - 523	5	SFT
RIN-000031	4-JAN	Line	JRN - 523	2.5	SFT

I'm trying to create a table in the following format assuming items are ordered and each 'RIN' item is related to former 'PLU' item:

HeaderItem	Line Item	Date	Reference	DocNumber	Qty	Site
PLU-1000	RIN-000138	4-JAN	Line	JRN - 523	2	SFT
PLU-1000	RIN-000096	4-JAN	Line	JRN - 523	6	SFT
PLU-1000	RIN-000247	4-JAN	Line	JRN - 523	3	SFT
PLU-1001	RIN-000789	4-JAN	Line	JRN - 523	15	SFT
PLU-1001	RIN-000001	4-JAN	Line	JRN - 523	10	SFT
PLU-1001	RIN-000247	4-JAN	Line	JRN - 523	5	SFT
PLU-1001	RIN-000031	4-JAN	Line	JRN - 523	2.5	SFT

I have to remove the original header row and add the item number in front of line items repeatedly until the next 'PLU' item comes.

CodePudding user response：

Use this code :

WITH PartitionedDataPoints2 AS
    (
    select *
           ,c=COUNT(ItemNumber) OVER (ORDER BY rn)
    from    
         (SELECT t2.ItemNumber
            ,t1.Date
            ,t1.ItemNumber as SItemNumber
            ,t1.Reference
            ,t1.DocNumber
            ,t1.Qty
            ,t1.Site
            ,ROW_NUMBER() over (order by (select 1)) as rn
       FROM [Test].[dbo].[Sheet1]  t1
       left join (select ItemNumber from [Test].[dbo].[Sheet1] where Reference='Header') t2 
       on t1.ItemNumber= t2.ItemNumber)t1
    ),
        GroupedDataPoints AS
    (
        SELECT c, s=MAX(ItemNumber)
        FROM PartitionedDataPoints2
        GROUP BY c
    )

    SELECT  
         a.s  as HeaderItem
        ,b.Date  
        ,SItemNumber as LineItem    
        ,Reference
        ,DocNumber
        ,Qty
        ,Site
    FROM GroupedDataPoints a
    JOIN PartitionedDataPoints2 b ON a.c = b.c and Reference='Line'

Tip : [Test].[dbo].[Sheet1] is my source table

Output:

HeaderItem  LineItem    Date    Reference   DocNumber   Qty Site
PLU-1000    RIN-000138  2022-01-04  Line    JRN - 523   2   SFT
PLU-1000    RIN-000096  2022-01-04  Line    JRN - 523   6   SFT
PLU-1000    RIN-000247  2022-01-04  Line    JRN - 523   3   SFT
PLU-1001    RIN-000789  2022-01-04  Line    JRN - 523   15  SFT
PLU-1001    RIN-000001  2022-01-04  Line    JRN - 523   10  SFT
PLU-1001    RIN-000247  2022-01-04  Line    JRN - 523   5   SFT
PLU-1001    RIN-000031  2022-01-04  Line    JRN - 523   2.5 SFT

Description : first created a table that returns the following values:

ItemNumber  Date    SItemNumber Reference   DocNumber   Qty Site    rn
PLU-1000    2022-01-04  PLU-1000    Header  JRN - 523   2   SFT 1
NULL    2022-01-04  RIN-000138  Line    JRN - 523   2   SFT 2
NULL    2022-01-04  RIN-000096  Line    JRN - 523   6   SFT 3
NULL    2022-01-04  RIN-000247  Line    JRN - 523   3   SFT 4
PLU-1001    2022-01-04  PLU-1001    Header  JRN - 523   5   SFT 5
NULL    2022-01-04  RIN-000789  Line    JRN - 523   15  SFT 6
NULL    2022-01-04  RIN-000001  Line    JRN - 523   10  SFT 7
NULL    2022-01-04  RIN-000247  Line    JRN - 523   5   SFT 8
NULL    2022-01-04  RIN-000031  Line    JRN - 523   2.5 SFT 9

Then I filled in the null values using the corresponding codes and displayed.

CodePudding user response：

This type of analysis is often referred to as Gap and Island analysis. This solution is similar to that of @Saeed EmamYari, but takes advantage of the natural key Lot ID you have identified in the comments about the real data set. We need a real value in the dataset that can be used to maintain the sort order, we can't use a virtual ROW_NUMBER based off a faked column value as this can lead to unpredictable results.

The only other major difference in this solution is that a CASE statement is used to create the HeaderItemNumber column, instead of a self-join, this should result in a cleaner execution plan.

WITH BoundaryData as (
    SELECT CASE Reference WHEN 'Header' THEN [Item number] END as HeaderItemNumber, *
    FROM Items 
)
, GroupedData as (
    SELECT COUNT(HeaderItemNumber) OVER (ORDER BY [Lot ID]) as HeaderGroup, *
    FROM BoundaryData
)
, HeaderedData as (
    SELECT MAX(HeaderItemNumber) OVER (PARTITION BY HeaderGroup) as HeaderItem, *
    FROM GroupedData
)
SELECT HeaderItem, [Item number], Date, Reference, DocNumber, Qty, Site
FROM HeaderedData
WHERE Reference <> 'Header'
ORDER BY [Lot ID]

I've captured this in a fiddle: http://sqlfiddle.com/#!18/d792dd/3
NOTE: In the fiddle I have generated Lot ID as an identity column, the value is not relevant, only the sequence.