Home > Software design >  SQL efficient way to match ANY in a large table
SQL efficient way to match ANY in a large table

Time:01-19

I am joining a small table and a very large table and want to return a distinct item if ANY items match. The table is so large that it takes hours for something that I think should take seconds.

The problem is that I am "iterating" over every single entry in the second table. I want to be able to "break" once a condition is met and return that value instead of continuing over every single account.

In the code below, I am finding every single row for each name that I am joining, even though I am only returning the DISTINCT example.name and don't care about every row. How can I return DISTINCT.name after finding the first instance of new_ex.data = ... after performing the INNER JOIN?

SELECT DISTINCT example.name
FROM (
    SELECT DISTINCT ex.user AS name
    FROM exampleTable ex
    WHERE ex.timestamp >= '2022-01-01'
    AND ex.group = 'test'
    AND new_ex.data = '123'
) AS example_users
INNER JOIN exampleTable new_ex on example_users.name = new_ex.user
AND new_ex.timestamp >= '2022-01-01'
AND (
    OR new_ex.data = 'abc'
    OR new_ex.data = 'def'
    OR new_ex.data = 'ghi'
    -- ~10 more of these OR statements
)

CodePudding user response:

Without seeing the data it's hard to be sure this can't be simplified further, but I think you can at least boil this down to

select distinct ex.user as name
from exampleTable ex
where ex.timestamp >= '2022-01-01'
and ex.group = 'test'
AND new_ex.data = '123'
and exists (
    select 1
    from exampleTable new_ex 
    where new_ex.user=ex.name
        and new_ex.data = '123'
        and new_ex.timestamp >= '2022-01-01'
        and new_ex.data in ('abc','def','ghi'...)
)

CodePudding user response:

Use below query, using multiple OR will cause performance issue. Instead use IN.

select DISTINCT ex.user from exampleTable ex
INNER JOIN exampleTable new_ex on example_users.user = new_ex.user
where ex.timestamp >= '2022-01-01'
AND ex.group = 'test'
AND new_ex.timestamp >= '2022-01-01'
AND new_ex.data in ('abc', 'def', 'ghi'); -- include all your values

You can also use below query,

select DISTINCT ex.user from exampleTable ex
INNER JOIN (select distinct user, timestamp, data from exampleTable) new_ex on example_users.user = new_ex.user
where ex.timestamp >= '2022-01-01'
AND ex.group = 'test'
AND new_ex.timestamp >= '2022-01-01'
AND new_ex.data in ('abc', 'def', 'ghi'); -- include all your values
  •  Tags:  
  • Related