So I have some data as follows:
ID status date
001 happy 01-01-2021
001 sad 01-02-2021
002 angry 01-03-2021
003 sad 01-04-2021
004 happy 01-05-2021
003 happy 01-05-2021
004 happy 01-06-2021
And all I want to do is have a table with unique ID and the status on their most recent date.
Final Output:
ID status date
001 sad 01-02-2021
002 angry 01-03-2021
003 happy 01-05-2021
004 happy 01-06-2021
I know how to do this with a row_number PARTITION but this is very computationally taxing. Any other method of which I can accomplish the above?
CodePudding user response:
The cheap way, is to tell the DB want you want and let it do the work. A partition and then sort is not the expensive.
SELECT a,b,c,e
FROM table
QUALIFY ROW_NUMBER() OVER(PARTITION BY a,b ORDER BY c) = 1;
so to compare "speed" of the current two solutions, let make a table with 10M row:
CREATE TABLE small_data AS
SELECT
SEQ8() as seq,
mod(seq, 20) as part,
random() as rnd
FROM TABLE(generator(ROWCOUNT => 10000000));
running my code
SELECT seq, part, rnd
FROM small_data
QUALIFY ROW_NUMBER() OVER(PARTITION BY part ORDER BY rnd) = 1;
605 ms to return.
running the self join of Tim's code
SELECT t1.*
FROM small_data t1
WHERE NOT EXISTS (SELECT 1 FROM small_data t2
WHERE t2.part = t1.part AND t2.rnd > t1.rnd);
I aborted after 3m 30s
CodePudding user response:
You could try using exists logic as an alternative to ROW_NUMBER:
SELECT t1.*
FROM yourTable t1
WHERE NOT EXISTS (SELECT 1 FROM yourTable t2
WHERE t2.ID = t1.ID AND t2.date > t1.date);
