I have two tables that are mapped together with a mapping table; the tables are as follows:
- data
- metadata
- matadata_map
metadata_map is mapping data together with metadata.
In the metadata_map table, I need to map some IDs together. I need to find out which data ids have X, Y .. N metadata ids.
The metadata_map table could look like the following:
--------------------
| data_id| meta_id |
--------------------
| 1 | 2 |
| 1 | 3 |
| 2 | 1 |
| 2 | 2 |
| 3 | 1 |
| 3 | 2 |
| 3 | 3 |
| 3 | 4 |
--------------------
In this example, I will try to find out which data_ids has both meta_ids 2 and 3.
When looking up in the metadata_map table, I should get the result: (1, and 3)
----------
| data |
----------
| 1 |
| 3 |
----------
since both data_id 1 and 3 has meta_id 2 and 3.
My question is, how can I best match them together in PostgreSQL to get the result? This also requires that it's highly performant because the table has, at the moment of typing, about ~700mil entries.
CodePudding user response:
Alternative - using an array:
select data_id
from metadata_map
group by data_id
having array_agg(meta_id) @> array[2,3];
This query retrieves data_ids that have both meta_id 2 and meta_id 3 but may have others too.
having array_agg(meta_id order by meta_id) = array[2,3];
would retrieve data_ids that have exactly meta_id 2 and meta_id 3 and no others.
The pattern can be reused with other array operators too.
CodePudding user response:
You can use grouping
select data_id
from metadata_map
where meta_id in (2,3)
group by data_id
having count(*) = 2
The query assumes metadata_map contains no dups. Alternatively use count(distinct meta_id).
