My
DELETE FROM FOO
WHERE [FOO_KEY] NOT IN
(
SELECT [FOO_KEY] FROM BAR
)
query is running shockingly slow. I know that BAR is a very big table, so I'm tempted to write
DELETE FROM FOO
WHERE [FOO_KEY] NOT IN
(
SELECT DISTINCT [FOO_KEY] FROM BAR
)
but I remember being told that:
- When
NULLs aren't a problem (and they're not here) there's hardly any difference betweenINandEXISTS. - When using
EXISTS, you don't need to useSELECT DISTINCTand there is no performance reason to do so.
This leaves me with good reason to believe that it is absolutely guaranteed that adding DISTINCT here will not make a difference. Is that correct?
CodePudding user response:
From a functional point of view, the queries with or without DISTINCT are identical (they would delete the same set of rows).
From a performance point of view, I am certain that SQL Server will always produce the same execution plan for both queries (but I cannot prove this).
For other database engines, this may be different. See:
