I have 2 tables which I would like to query and display data differences:
CREATE TABLE order_splits_config (
id INT,
pair_id INT
);
CREATE TABLE active_pairs (
id INT,
pair VARCHAR(30),
exchange_active boolean,
exchange_id INT
);
INSERT INTO order_splits_config(id, pair_id)
VALUES (1, 83);
INSERT INTO order_splits_config(id, pair_id)
VALUES (2, 58);
INSERT INTO order_splits_config(id, pair_id)
VALUES (34, 34);
INSERT INTO active_pairs(id, pair, exchange_active, exchange_id)
VALUES (1, 'US/EN', true, 2);
INSERT INTO active_pairs(id, pair, exchange_active, exchange_id)
VALUES (2, 'GB/UK', true, 3);
INSERT INTO active_pairs(id, pair, exchange_active, exchange_id)
VALUES (2, 'FR/EU', true, 4);
I use this query to query the differences:
SELECT b.id, b.pair, b.exchange_id
FROM order_splits_config a
FULL OUTER JOIN active_pairs b
ON a.pair_id = b.id
WHERE a.pair_id IS NULL
OR b.id IS NULL
AND b.exchange_active = 'true';
This prints a lot of lines like this for example (data is just for example):
#,pair,id,exchange_id
1, US/EN,332,1
2, GB/UK,112,1
3, GB/UK,113,1
4, FR/EU,221,5
5, FR/EU,183,2
...
How can I use DISTINCT in order to get the result from the query unique by pair?
FIDDLE: https://www.db-fiddle.com/f/4D6VfqysPCWhQnh8zaFBps/2
CodePudding user response:
You only select columns from table active_pairs to begin with.
After joining on a.pair_id = b.id, the condition a.pair_id IS NULL is only true if there is no matching row in table order_splits_config AS a, which is more efficiently formulated as NOT EXISTS instead of the FULL JOIN.
As for b.id IS NULL AND b.exchange_active = 'true': again, b.id can only be NULL for rows of a where no matching b is found - in which case b.exchange_active can never be true. So the clause effectively eliminates all rows of a that were just added by the FULL JOIN and which would show up in the result as all NULL values since there are only columns from b. In short: don't FULL JOIN to begin with.
If active_pairs.pair is defined UNIQUE, it boils down to:
SELECT id, pair, exchange_id
FROM active_pairs b
WHERE NOT EXISTS (SELECT FROM order_splits_config a WHERE a.pair_id = b.id);
If active_pairs.pair is not defined UNIQUE, the simple solution is with DISTINCT ON:
SELECT DISTINCT ON (pair)
id, pair, exchange_id
FROM active_pairs b
WHERE NOT EXISTS (SELECT FROM order_splits_config a WHERE a.pair_id = b.id);
db<>fiddle here
From each set of duplicates on pair, you get an arbitrary pick.
For a deterministic pick, define what to pick and add an ORDER BY clause accordingly. Example: to get the one with the smallest id, add:
...
ORDER BY pair, id;
Also adds a sort order to the so far unsorted result.
If order_splits_config isn't trivially small, have an index on (pair_id) to make this fast.
There may be much faster solutions for a big active_pairs table, depending on undisclosed information, mostly the cardinality of column pair (how many duplicate values).
See:
