Home > Software design >  Joining two tables at random
Joining two tables at random

Time:02-05

I have two tables, on with main data, and another shorter table with additional data.

I would like to join the rows from the shorter table to some of the rows of the main table, at random. For example:

main table:

id data
1 apple
2 banana
3 cherry
4 date
5 elderberry
6 fig

secondary table:

id data
1 accordion
2 banjo

Desired Result:

main secondary
… ? accordion
… ? banjo

I can think of one way to do it, using a lot of pre-processing with CTEs:

WITH
    cte1 AS (SELECT data FROM main ORDER BY random() LIMIT 2),
    cte2 AS (SELECT row_number() OVER() AS row, data FROM cte1),
    cte3 AS (SELECT row_number() OVER () AS row, data FROM secondary)
SELECT *
FROM cte2 JOIN cte3 ON cte2.row=cte3.row;

It works, but is there a more straightforward way of joining two tables at random?

I have attached a fiddle: https://dbfiddle.uk/?rdbms=postgres_13&fiddle=21af08976112c7ac7c18329fa3699b8c&hide=2

CodePudding user response:

A CTE is basically just a re-usable template for a subquery.
So this can be golfcoded to using 2 subqueries.

SELECT m.rn, m.data main_data, s.data secondary_data
FROM (SELECT data, ROW_NUMBER() OVER (ORDER BY random()) rn FROM main) m
JOIN (SELECT data, ROW_NUMBER() OVER (ORDER BY random()) rn FROM secondary) s USING (rn)

CodePudding user response:

I could rewrite it to this:

SELECT *
FROM (SELECT row_number() OVER (ORDER BY random()) as id,
             data
      FROM main
      ORDER BY RANDOM()) m1
JOIN secondary s on s.id = m1.id 

dbfiddle

Update: LIMIT is not needed after looking at @LukStorm's version

I assumed that you know which table is shorter so there is only one column with generated id's

  •  Tags:  
  • Related