SELECT DISTINCT to return at most one row-CodePudding

Given the following db structure:

Regions

id	name
1	EU
2	US
3	SEA

Customers:

id	name	region
1	peter	1
2	henry	1
3	john	2

There is also a PL/pgSQL function in place, defined as sendShipment() which takes (among other things) a sender and a receiver customer ID.

There is a business constraint around this which requires us to verify that both sender and receiver sit in the same region - and we need to do this as part of sendShipment(). So from within this function, we need to query the customer table for both the sender and receiver ID and verify that both their region ID is identical. We will also need to ID itself for further processing down the line.

So maybe something like this: SELECT DISTINCT region FROM customers WHERE id IN (?, ?)

The problem with this is that the result will be either an array (if the customers are not within the same region) or a single value.

Is there are more elegant way of solving this constraint? I was thinking of SELECT INTO and use a temporary table, or I could SELECT COUNT(DISTINCT region) and then do another SELECT for the actual value if the count is less than 2, but I'd like to avoid the performance hit if possible.

CodePudding user response：

There is also a PL/pgSQL function in place, defined as sendShipment() which takes (among other things) a sender and a receiver customer ID.

There is a business constraint around this which requires us to verify that both sender and receiver sit in the same region - and we need to do this as part of sendShipment(). So from within this function, we need to query the customer table for both the sender and receiver ID and verify that both their region ID is identical. We will also need to ID itself for further processing down the line.

This query should work:

WITH q AS (
    SELECT
        COUNT( * ) AS CountCustomers,
        COUNT( DISTINCT c.Region ) AS CountDistinctRegions,
--      MIN( c.Region ) AS MinRegion
        FIRST_VALUE( c.Region ) OVER ( ORDER BY c.Region ) AS MinRegion
    FROM
        Customers AS c
    WHERE
        c.CustomerId = $senderCustomerId
        OR
        c.CustomerId = $receiverCustomerId
)
SELECT
    CASE WHEN q.CountCustomers = 2 AND q.CountDistinctRegions = 2 THEN 'OK' ELSE 'BAD' END AS "Status",
    CASE WHEN q.CountDistinctRegions = 2 THEN q.MinRegion END AS SingleRegion
FROM
    q

The above query will always return a single row with 2 columns: Status and SingleRegion.
SQL doesn't have a "SINGLE( col )" aggregate function (i.e. a function that is NULL unless the aggregation group has a single row), but we can abuse MIN (or MAX) with a CASE WHEN COUNT() in a CTE or derived-table as an equivalent operation.
- ~~Alternatively, windowing-functions could be used, but annoyingly they don't work in GROUP BY queries despite being so similar, argh~~.
  - Once again, this is the ISO SQL committee's fault, not Postgre's.
As your Region column is UUID you cannot use it with MIN, but I understand it should work with FIRST_VALUE( c.Region ) OVER ( ORDER BY c.Region ) AS MinRegion.
As for the columns:
- The Status column is either 'OK' or 'BAD' based on those business-constraints you mentioned. You might want to change it to a bit column instead of a textual one, though.
- The SingleRegion column will be NOT NULL (with a valid region) if CountDistinctRegions = 2 regardless of CountCustomers, but feel free to change that, just-in-case you still want that info.