Home > Mobile >  Identify Distinct of two columns and not present in another column -
Identify Distinct of two columns and not present in another column -

Time:01-04

Background (and the requirement): There is a table "T_CLASS" with multiple columns of which two are student names ex. Student1 and Student2. There's another table "T_STUDENT" where I want to check against T_CLASS, to get the distinct students from columns T_CLASS which doesn't exist in T_STUDENT table.

Also I would like to mention, the table contains circa 600M records each.

Sample query (and my attempt):

;with t_class(id, student1, student2) as (
    select 1, 'Tom', 'Rahul' union all
    select 2, 'Rahul', 'Nick' union all
    select 3, 'David', 'Mark' union all
    select 4, 'Rahul', 'Mark' union all
    select 5, 'Rick', 'David'
)
, t_student (c_student) as (
        select 'David' union all
        select 'Nick' union all
        select 'Mark' union all
        select 'Rick' 
)
-- Below is what I've tried --
select student1
from t_class crt
where not exists
    (
        select 1 from t_student djt
        where lower(trim(crt.student1)) = lower(trim(djt.c_student))
    )
union
select student2
from t_class crt
where not exists
    (
        select 1 from t_student djt
        where lower(trim(crt.student2)) = lower(trim(djt.c_student))
    )

Expected o/p:

Rahul
Tom

Note: I don't want any specific query as solution, but I want to understand it conceptually.

But is this a good technique or there can be any other optimal approach? I know I've to try out the ways and check execution plan, but can't think of any.

Please advise. Thanks in advance. :)

ps. Got this exec plan generated from SQL Server 2016 (however this query is actually on AWS Redshift) -

enter image description here


Edit 2 - Meanwhile I've tried another attempt ...

;with t_class(id, student1, student2) as (
    select 1, 'Tom', 'Rahul' union all
    select 2, 'Rahul', 'Nick' union all
    select 3, 'David', 'Mark' union all
    select 4, 'Rahul', 'Mark' union all
    select 5, 'Rick', 'David'
)
, t_student (c_student) as (
        select 'David' union all
        select 'Nick' union all
        select 'Mark' union all
        select 'Rick' 
)
select * from
(
select student1 studs from t_class
union 
select student2 from t_class
) x
where not exists (select 1 from t_student ts where ts.c_student = x.studs)

Is this any better? Execution plan: enter image description here

CodePudding user response:

SELECT Z.STUDENT_NAME
(
 SELECT C.STUDENT1 AS STUDENT_NAME
    FROM T_CLASS AS C
  UNION 
 SELECT X.STUDENT2
   FROM T_CLASS AS X 
)AS Z
EXCEPT 
SELECT T.c_student
FROM T_STUDENT AS T

I hope, you can try this approach. It can be simplified, but I don't exactly remember the precedence of UNION/EXCEPT/INTERSECT-operators

CodePudding user response:

in postgresSQL you can do this to get students that are in Students and do not exist in Class.

SELECT name
        FROM Students S
        WHERE NOT EXISTS ( SELECT *
                           FROM Class C  
                           WHERE S.sid = C.sid)
  •  Tags:  
  • Related