In Postgres, I am trying to find a way to enforce a special kind of constraint across two columns.
I have a table with this schema (the relevant bits):
CREATE TABLE resources (
id SERIAL PRIMARY KEY,
org_id INT NOT NULL,
source_id INT NOT NULL,
data JSON NOT NULL -- to keep it simple, it could be anything
);
Some example data:
id | org_id | source_id | data
---------------------------------------
1 | 11 | 1001 | {...}
2 | 22 | 2002 | {...}
3 | 33 | 3003 | {...}
4 | 11 | 1001 | {...}
5 | 11 | 2002 | {...} -- This should be invalid.
Assuming the data value of all rows is different, both rows 1 and 4 are valid. They're just different resources linked to the same org_id and source_id.
However, I never want a source_id to be used with different org_ids. In the example above, row 5 should be invalid because source_id = 2002 was already used once with org_id = 22, and it should not be possible to use it again with org_id = 11.
I know how to achieve this in a query, but it either requires locking, or it requires a multi-step process in application code (which wouldn't be the end of the world). I thought it could be done with an EXCLUDE constraint, but I'm not sure.
CodePudding user response:
This might do the trick:
CREATE TABLE resources (
id SERIAL PRIMARY KEY,
org_id INT NOT NULL,
source_id INT NOT NULL,
data JSON NOT NULL,
EXCLUDE USING gist
(org_id WITH =,
source_id WITH <>)
);
It works correctly with your example: dbfiddle
Explanation of <> operator is here
Example usage of = and <> operators is here
CodePudding user response:
You current database model is in 2NF. To promote it 3NF you need to break the table in two tables as explained in Third normal form -- Wikipedia.
To promote it to 3NF you can do:
CREATE TABLE organization (
org_id INT PRIMARY KEY NOT NULL,
source_id INT NOT NULL
);
CREATE TABLE resources (
id SERIAL PRIMARY KEY,
org_id INT NOT NULL REFERENCES organization (org_id)
data JSON NOT NULL
);
This way the table resources won't include values for the column source_id anymore, and there won't be any possibility of producing inconsistent data whatsoever. Now, source_id will safely live in the table organization instead, where only a single value per organization is possible.
As you see, this is a structural solution that is also quite clean.
