I have two tables, on with main data, and another shorter table with additional data.
I would like to join the rows from the shorter table to some of the rows of the main table, at random. For example:
main table:
| id | data |
|---|---|
| 1 | apple |
| 2 | banana |
| 3 | cherry |
| 4 | date |
| 5 | elderberry |
| 6 | fig |
secondary table:
| id | data |
|---|---|
| 1 | accordion |
| 2 | banjo |
Desired Result:
| main | secondary |
|---|---|
| … ? | accordion |
| … ? | banjo |
I can think of one way to do it, using a lot of pre-processing with CTEs:
WITH
cte1 AS (SELECT data FROM main ORDER BY random() LIMIT 2),
cte2 AS (SELECT row_number() OVER() AS row, data FROM cte1),
cte3 AS (SELECT row_number() OVER () AS row, data FROM secondary)
SELECT *
FROM cte2 JOIN cte3 ON cte2.row=cte3.row;
It works, but is there a more straightforward way of joining two tables at random?
I have attached a fiddle: https://dbfiddle.uk/?rdbms=postgres_13&fiddle=21af08976112c7ac7c18329fa3699b8c&hide=2
CodePudding user response:
A CTE is basically just a re-usable template for a subquery.
So this can be golfcoded to using 2 subqueries.
SELECT m.rn, m.data main_data, s.data secondary_data
FROM (SELECT data, ROW_NUMBER() OVER (ORDER BY random()) rn FROM main) m
JOIN (SELECT data, ROW_NUMBER() OVER (ORDER BY random()) rn FROM secondary) s USING (rn)
CodePudding user response:
I could rewrite it to this:
SELECT *
FROM (SELECT row_number() OVER (ORDER BY random()) as id,
data
FROM main
ORDER BY RANDOM()) m1
JOIN secondary s on s.id = m1.id
Update: LIMIT is not needed after looking at @LukStorm's version
I assumed that you know which table is shorter so there is only one column with generated id's
