I've a table with the following data.
| Item number | Date | Reference | DocNumber | Qty | Site |
|---|---|---|---|---|---|
| PLU-1000 | 4-JAN | Header | JRN - 523 | 2 | SFT |
| RIN-000138 | 4-JAN | Line | JRN - 523 | 2 | SFT |
| RIN-000096 | 4-JAN | Line | JRN - 523 | 6 | SFT |
| RIN-000247 | 4-JAN | Line | JRN - 523 | 3 | SFT |
| PLU-1001 | 4-JAN | Header | JRN - 523 | 5 | SFT |
| RIN-000789 | 4-JAN | Line | JRN - 523 | 15 | SFT |
| RIN-000001 | 4-JAN | Line | JRN - 523 | 10 | SFT |
| RIN-000247 | 4-JAN | Line | JRN - 523 | 5 | SFT |
| RIN-000031 | 4-JAN | Line | JRN - 523 | 2.5 | SFT |
I'm trying to create a table in the following format assuming items are ordered and each 'RIN' item is related to former 'PLU' item:
| HeaderItem | Line Item | Date | Reference | DocNumber | Qty | Site |
|---|---|---|---|---|---|---|
| PLU-1000 | RIN-000138 | 4-JAN | Line | JRN - 523 | 2 | SFT |
| PLU-1000 | RIN-000096 | 4-JAN | Line | JRN - 523 | 6 | SFT |
| PLU-1000 | RIN-000247 | 4-JAN | Line | JRN - 523 | 3 | SFT |
| PLU-1001 | RIN-000789 | 4-JAN | Line | JRN - 523 | 15 | SFT |
| PLU-1001 | RIN-000001 | 4-JAN | Line | JRN - 523 | 10 | SFT |
| PLU-1001 | RIN-000247 | 4-JAN | Line | JRN - 523 | 5 | SFT |
| PLU-1001 | RIN-000031 | 4-JAN | Line | JRN - 523 | 2.5 | SFT |
I have to remove the original header row and add the item number in front of line items repeatedly until the next 'PLU' item comes.
CodePudding user response:
Use this code :
WITH PartitionedDataPoints2 AS
(
select *
,c=COUNT(ItemNumber) OVER (ORDER BY rn)
from
(SELECT t2.ItemNumber
,t1.Date
,t1.ItemNumber as SItemNumber
,t1.Reference
,t1.DocNumber
,t1.Qty
,t1.Site
,ROW_NUMBER() over (order by (select 1)) as rn
FROM [Test].[dbo].[Sheet1] t1
left join (select ItemNumber from [Test].[dbo].[Sheet1] where Reference='Header') t2
on t1.ItemNumber= t2.ItemNumber)t1
),
GroupedDataPoints AS
(
SELECT c, s=MAX(ItemNumber)
FROM PartitionedDataPoints2
GROUP BY c
)
SELECT
a.s as HeaderItem
,b.Date
,SItemNumber as LineItem
,Reference
,DocNumber
,Qty
,Site
FROM GroupedDataPoints a
JOIN PartitionedDataPoints2 b ON a.c = b.c and Reference='Line'
Tip : [Test].[dbo].[Sheet1] is my source table
Output:
HeaderItem LineItem Date Reference DocNumber Qty Site
PLU-1000 RIN-000138 2022-01-04 Line JRN - 523 2 SFT
PLU-1000 RIN-000096 2022-01-04 Line JRN - 523 6 SFT
PLU-1000 RIN-000247 2022-01-04 Line JRN - 523 3 SFT
PLU-1001 RIN-000789 2022-01-04 Line JRN - 523 15 SFT
PLU-1001 RIN-000001 2022-01-04 Line JRN - 523 10 SFT
PLU-1001 RIN-000247 2022-01-04 Line JRN - 523 5 SFT
PLU-1001 RIN-000031 2022-01-04 Line JRN - 523 2.5 SFT
Description : first created a table that returns the following values:
ItemNumber Date SItemNumber Reference DocNumber Qty Site rn
PLU-1000 2022-01-04 PLU-1000 Header JRN - 523 2 SFT 1
NULL 2022-01-04 RIN-000138 Line JRN - 523 2 SFT 2
NULL 2022-01-04 RIN-000096 Line JRN - 523 6 SFT 3
NULL 2022-01-04 RIN-000247 Line JRN - 523 3 SFT 4
PLU-1001 2022-01-04 PLU-1001 Header JRN - 523 5 SFT 5
NULL 2022-01-04 RIN-000789 Line JRN - 523 15 SFT 6
NULL 2022-01-04 RIN-000001 Line JRN - 523 10 SFT 7
NULL 2022-01-04 RIN-000247 Line JRN - 523 5 SFT 8
NULL 2022-01-04 RIN-000031 Line JRN - 523 2.5 SFT 9
Then I filled in the null values using the corresponding codes and displayed.
CodePudding user response:
This type of analysis is often referred to as Gap and Island analysis. This solution is similar to that of @Saeed EmamYari, but takes advantage of the natural key Lot ID you have identified in the comments about the real data set. We need a real value in the dataset that can be used to maintain the sort order, we can't use a virtual ROW_NUMBER based off a faked column value as this can lead to unpredictable results.
The only other major difference in this solution is that a CASE statement is used to create the HeaderItemNumber column, instead of a self-join, this should result in a cleaner execution plan.
WITH BoundaryData as (
SELECT CASE Reference WHEN 'Header' THEN [Item number] END as HeaderItemNumber, *
FROM Items
)
, GroupedData as (
SELECT COUNT(HeaderItemNumber) OVER (ORDER BY [Lot ID]) as HeaderGroup, *
FROM BoundaryData
)
, HeaderedData as (
SELECT MAX(HeaderItemNumber) OVER (PARTITION BY HeaderGroup) as HeaderItem, *
FROM GroupedData
)
SELECT HeaderItem, [Item number], Date, Reference, DocNumber, Qty, Site
FROM HeaderedData
WHERE Reference <> 'Header'
ORDER BY [Lot ID]
I've captured this in a fiddle: http://sqlfiddle.com/#!18/d792dd/3
NOTE: In the fiddle I have generated Lot ID as an identity column, the value is not relevant, only the sequence.
