Home > Mobile >  SQL Self Join to create Header and Line level columns
SQL Self Join to create Header and Line level columns

Time:01-12

I've a table with the following data.

Item number Date Reference DocNumber Qty Site
PLU-1000 4-JAN Header JRN - 523 2 SFT
RIN-000138 4-JAN Line JRN - 523 2 SFT
RIN-000096 4-JAN Line JRN - 523 6 SFT
RIN-000247 4-JAN Line JRN - 523 3 SFT
PLU-1001 4-JAN Header JRN - 523 5 SFT
RIN-000789 4-JAN Line JRN - 523 15 SFT
RIN-000001 4-JAN Line JRN - 523 10 SFT
RIN-000247 4-JAN Line JRN - 523 5 SFT
RIN-000031 4-JAN Line JRN - 523 2.5 SFT

I'm trying to create a table in the following format assuming items are ordered and each 'RIN' item is related to former 'PLU' item:

HeaderItem Line Item Date Reference DocNumber Qty Site
PLU-1000 RIN-000138 4-JAN Line JRN - 523 2 SFT
PLU-1000 RIN-000096 4-JAN Line JRN - 523 6 SFT
PLU-1000 RIN-000247 4-JAN Line JRN - 523 3 SFT
PLU-1001 RIN-000789 4-JAN Line JRN - 523 15 SFT
PLU-1001 RIN-000001 4-JAN Line JRN - 523 10 SFT
PLU-1001 RIN-000247 4-JAN Line JRN - 523 5 SFT
PLU-1001 RIN-000031 4-JAN Line JRN - 523 2.5 SFT

I have to remove the original header row and add the item number in front of line items repeatedly until the next 'PLU' item comes.

CodePudding user response:

Use this code :

WITH PartitionedDataPoints2 AS
    (
    select *
           ,c=COUNT(ItemNumber) OVER (ORDER BY rn)
    from    
         (SELECT t2.ItemNumber
            ,t1.Date
            ,t1.ItemNumber as SItemNumber
            ,t1.Reference
            ,t1.DocNumber
            ,t1.Qty
            ,t1.Site
            ,ROW_NUMBER() over (order by (select 1)) as rn
       FROM [Test].[dbo].[Sheet1]  t1
       left join (select ItemNumber from [Test].[dbo].[Sheet1] where Reference='Header') t2 
       on t1.ItemNumber= t2.ItemNumber)t1
    ),
        GroupedDataPoints AS
    (
        SELECT c, s=MAX(ItemNumber)
        FROM PartitionedDataPoints2
        GROUP BY c
    )

    SELECT  
         a.s  as HeaderItem
        ,b.Date  
        ,SItemNumber as LineItem    
        ,Reference
        ,DocNumber
        ,Qty
        ,Site
    FROM GroupedDataPoints a
    JOIN PartitionedDataPoints2 b ON a.c = b.c and Reference='Line'

Tip : [Test].[dbo].[Sheet1] is my source table

Output:

HeaderItem  LineItem    Date    Reference   DocNumber   Qty Site
PLU-1000    RIN-000138  2022-01-04  Line    JRN - 523   2   SFT
PLU-1000    RIN-000096  2022-01-04  Line    JRN - 523   6   SFT
PLU-1000    RIN-000247  2022-01-04  Line    JRN - 523   3   SFT
PLU-1001    RIN-000789  2022-01-04  Line    JRN - 523   15  SFT
PLU-1001    RIN-000001  2022-01-04  Line    JRN - 523   10  SFT
PLU-1001    RIN-000247  2022-01-04  Line    JRN - 523   5   SFT
PLU-1001    RIN-000031  2022-01-04  Line    JRN - 523   2.5 SFT

Description : first created a table that returns the following values:

ItemNumber  Date    SItemNumber Reference   DocNumber   Qty Site    rn
PLU-1000    2022-01-04  PLU-1000    Header  JRN - 523   2   SFT 1
NULL    2022-01-04  RIN-000138  Line    JRN - 523   2   SFT 2
NULL    2022-01-04  RIN-000096  Line    JRN - 523   6   SFT 3
NULL    2022-01-04  RIN-000247  Line    JRN - 523   3   SFT 4
PLU-1001    2022-01-04  PLU-1001    Header  JRN - 523   5   SFT 5
NULL    2022-01-04  RIN-000789  Line    JRN - 523   15  SFT 6
NULL    2022-01-04  RIN-000001  Line    JRN - 523   10  SFT 7
NULL    2022-01-04  RIN-000247  Line    JRN - 523   5   SFT 8
NULL    2022-01-04  RIN-000031  Line    JRN - 523   2.5 SFT 9

Then I filled in the null values using the corresponding codes and displayed.

CodePudding user response:

This type of analysis is often referred to as Gap and Island analysis. This solution is similar to that of @Saeed EmamYari, but takes advantage of the natural key Lot ID you have identified in the comments about the real data set. We need a real value in the dataset that can be used to maintain the sort order, we can't use a virtual ROW_NUMBER based off a faked column value as this can lead to unpredictable results.

The only other major difference in this solution is that a CASE statement is used to create the HeaderItemNumber column, instead of a self-join, this should result in a cleaner execution plan.

WITH BoundaryData as (
    SELECT CASE Reference WHEN 'Header' THEN [Item number] END as HeaderItemNumber, *
    FROM Items 
)
, GroupedData as (
    SELECT COUNT(HeaderItemNumber) OVER (ORDER BY [Lot ID]) as HeaderGroup, *
    FROM BoundaryData
)
, HeaderedData as (
    SELECT MAX(HeaderItemNumber) OVER (PARTITION BY HeaderGroup) as HeaderItem, *
    FROM GroupedData
)
SELECT HeaderItem, [Item number], Date, Reference, DocNumber, Qty, Site
FROM HeaderedData
WHERE Reference <> 'Header'
ORDER BY [Lot ID]

I've captured this in a fiddle: http://sqlfiddle.com/#!18/d792dd/3
NOTE: In the fiddle I have generated Lot ID as an identity column, the value is not relevant, only the sequence.

  •  Tags:  
  • Related