Home > Enterprise >  How to Return Records Equal to a Specific Percentage of an Aggregate in Transact-SQL?
How to Return Records Equal to a Specific Percentage of an Aggregate in Transact-SQL?

Time:02-04

My requirement is to provide a random sample of claims that comprise 2.5% of the total amount paid and also comprise 2.5% of total claims for a given population. The goal is to deliver records in a report that meet both criteria. My staging table is defined as follows:

[RecordId] UniqueIdentifier NOT NULL PRIMARY KEY DEFAULT NEWID()
,ClaimNO varchar(50)
,Company_ID varchar(10)
,HPCode varchar(10)
,FinancialResponsibility varchar(30)
,ProviderType varchar(50)
,DateOfService date
,DatePaid date
,ClaimType varchar(50)
,TotalBilled numeric(11,2)
,TotalPaid numeric(11,2)
,ProcessorType varchar(100)

I've already built the logic to return 2.5% of the total number of claims but need guidance in how best to ensure both criterion are met.

Here's what I've tried thus far:

with cteTotals as (
Select Count(*) as TotalClaims, sum(TotalPaid) as TotalPaid, sum(TotalPaid) * .025 as PaidSampleAmount
from [Z_Monthly_Quality_Review]
),

ctePopulation as (
    Select *
    from [Z_Monthly_Quality_Review]
),

cteSampleRows as (  
    select TOP 2.5 PERCENT NEWID() RandomID, RecordID, ClaimNo, HPCode, FinancialResponsibility, ProviderType, ProcessorType, 
    Format(DateOfService, 'MM/dd/yyyy') as DateOfService, Format(DatePaid, 'MM/dd/yyyy') as DatePaid, ClaimType, TotalBilled, TotalPaid  
    from [Z_Monthly_Quality_Review]  
    order by NEWID()
    ),

cteSamplePaid as (
    Select Top 2.5 PERCENT NEWID() RandomID, RecordID, ClaimNo, HPCode, FinancialResponsibility, ProviderType, ProcessorType,
    Format(DateOfService, 'MM/dd/yyyy') as DateOfService, Format(DatePaid, 'MM/dd/yyyy') as DatePaid, ClaimType, TotalBilled, TotalPaid  
    from [Z_Monthly_Quality_Review] mqr
    inner join ctePopulation cte on mqr.ClaimNo = cte.ClaimNO
    order by NEWID()
)

Since both criterion must be satisfied, how should I structure both CTEs to ensure this? In my cteSamplePaid, how do I ensure that the sum of total paid equals 2.5% of the total population? Would this be accomplished with a Having clause? The end result will be displayed to my business users via SQL Server Reporting Services. Ideally, I would want to provide them with 1 sample that meets both criteria. If that's not possible, how do I randomly sample claims from both criterion?

CodePudding user response:

Don't think there is a guaranteed way it will add up to 2.5% of the total. There's no guarantee results and the performance would be very poor as it you would essentially have to brute force every possible combination of rows. A way to get very close to your goal would be to use return rows that add up to an acceptable margin of error.

Since no sample data was provided, I just used AdventureWorks2017 (downloaded from here)

USE AdventureWorks2017 
GO

DROP TABLE IF EXISTS #SalesData
SELECT SalesOrderID AS ID,TotalDue
INTO #SalesData
FROM Sales.SalesOrderHeader

Declare @DesiredPercentage Numeric(10,3) = .025 /*Desired sum percentage of total rows*/
        ,@AcceptableMargin Numeric(10,3) = .01 /*Random row total can be plus or minus this percentage of the desired sum*/
DECLARE @DesiredSum Numeric(16,2) =  @DesiredPercentage *(SELECT SUM(TotalDue) FROM #SalesData)

/*For loop*/
DECLARE @RowNum INT
    ,@LoopCounter INT = 1
    
WHILE (1=1)
BEGIN
    DROP TABLE IF EXISTS #RandomData
    SELECT RowNum = ROW_NUMBER() OVER (ORDER BY B.RandID),A.*,RunningTotal = SUM(TotalDue) OVER (ORDER BY B.RandID)
    INTO #RandomData
    FROM #SalesData AS A
    CROSS APPLY (SELECT RandID = NEWID()) AS B
    WHERE TotalDue < @DesiredSum /*If single row bigger than desired sum, then filter it out*/
    ORDER BY B.RandID

    SELECT Top(1) @RowNum = RowNum
    FROM #RandomData AS A
    CROSS APPLY (SELECT DeltaFromDesiredSum = ABS(RunningTotal-@DesiredSum)) AS B
    WHERE RunningTotal BETWEEN @DesiredSum *(1-@AcceptableMargin) AND @DesiredSum *(1 @AcceptableMargin)
    ORDER BY DeltaFromDesiredSum

    IF (@RowNum IS NOT NULL)
        BREAK;

    IF (@LoopCounter >=100) /*Prevents infinite loops*/
        THROW 59194,'Result unable to be generated in 100 tries. Recommend expanding acceptable margin',1;

    SET @LoopCounter  =1;
END

SELECT *
FROM #RandomData
WHERE RowNum <= @RowNum

SELECT RandomRowTotal = SUM(TotalDue)
    ,DesiredSum = @DesiredSum
    ,PercentageFromDesiredSum = Concat(Cast(Round(100*(1-SUM(TotalDue)/@DesiredSum),2) as Float),'%')
FROM #RandomData
WHERE RowNum <= @RowNum
  •  Tags:  
  • Related