Joining two tables at random-CodePudding

I have two tables, on with main data, and another shorter table with additional data.

I would like to join the rows from the shorter table to some of the rows of the main table, at random. For example:

main table:

id	data
1	apple
2	banana
3	cherry
4	date
5	elderberry
6	fig

secondary table:

id	data
1	accordion
2	banjo

Desired Result:

main	secondary
… ?	accordion
… ?	banjo

I can think of one way to do it, using a lot of pre-processing with CTEs:

WITH
    cte1 AS (SELECT data FROM main ORDER BY random() LIMIT 2),
    cte2 AS (SELECT row_number() OVER() AS row, data FROM cte1),
    cte3 AS (SELECT row_number() OVER () AS row, data FROM secondary)
SELECT *
FROM cte2 JOIN cte3 ON cte2.row=cte3.row;

It works, but is there a more straightforward way of joining two tables at random?

I have attached a fiddle: https://dbfiddle.uk/?rdbms=postgres_13&fiddle=21af08976112c7ac7c18329fa3699b8c&hide=2

CodePudding user response：

A CTE is basically just a re-usable template for a subquery.
So this can be golfcoded to using 2 subqueries.

SELECT m.rn, m.data main_data, s.data secondary_data
FROM (SELECT data, ROW_NUMBER() OVER (ORDER BY random()) rn FROM main) m
JOIN (SELECT data, ROW_NUMBER() OVER (ORDER BY random()) rn FROM secondary) s USING (rn)

CodePudding user response：

I could rewrite it to this:

SELECT *
FROM (SELECT row_number() OVER (ORDER BY random()) as id,
             data
      FROM main
      ORDER BY RANDOM()) m1
JOIN secondary s on s.id = m1.id

dbfiddle

Update: LIMIT is not needed after looking at @LukStorm's version

I assumed that you know which table is shorter so there is only one column with generated id's