Background (and the requirement): There is a table "T_CLASS" with multiple columns of which two are student names ex. Student1 and Student2. There's another table "T_STUDENT" where I want to check against T_CLASS, to get the distinct students from columns T_CLASS which doesn't exist in T_STUDENT table.
Also I would like to mention, the table contains circa 600M records each.
Sample query (and my attempt):
;with t_class(id, student1, student2) as (
select 1, 'Tom', 'Rahul' union all
select 2, 'Rahul', 'Nick' union all
select 3, 'David', 'Mark' union all
select 4, 'Rahul', 'Mark' union all
select 5, 'Rick', 'David'
)
, t_student (c_student) as (
select 'David' union all
select 'Nick' union all
select 'Mark' union all
select 'Rick'
)
-- Below is what I've tried --
select student1
from t_class crt
where not exists
(
select 1 from t_student djt
where lower(trim(crt.student1)) = lower(trim(djt.c_student))
)
union
select student2
from t_class crt
where not exists
(
select 1 from t_student djt
where lower(trim(crt.student2)) = lower(trim(djt.c_student))
)
Expected o/p:
Rahul
Tom
Note: I don't want any specific query as solution, but I want to understand it conceptually.
But is this a good technique or there can be any other optimal approach? I know I've to try out the ways and check execution plan, but can't think of any.
Please advise. Thanks in advance. :)
ps. Got this exec plan generated from SQL Server 2016 (however this query is actually on AWS Redshift) -
Edit 2 - Meanwhile I've tried another attempt ...
;with t_class(id, student1, student2) as (
select 1, 'Tom', 'Rahul' union all
select 2, 'Rahul', 'Nick' union all
select 3, 'David', 'Mark' union all
select 4, 'Rahul', 'Mark' union all
select 5, 'Rick', 'David'
)
, t_student (c_student) as (
select 'David' union all
select 'Nick' union all
select 'Mark' union all
select 'Rick'
)
select * from
(
select student1 studs from t_class
union
select student2 from t_class
) x
where not exists (select 1 from t_student ts where ts.c_student = x.studs)
Is this any better?
Execution plan:

CodePudding user response:
SELECT Z.STUDENT_NAME
(
SELECT C.STUDENT1 AS STUDENT_NAME
FROM T_CLASS AS C
UNION
SELECT X.STUDENT2
FROM T_CLASS AS X
)AS Z
EXCEPT
SELECT T.c_student
FROM T_STUDENT AS T
I hope, you can try this approach. It can be simplified, but I don't exactly remember the precedence of UNION/EXCEPT/INTERSECT-operators
CodePudding user response:
in postgresSQL you can do this to get students that are in Students and do not exist in Class.
SELECT name
FROM Students S
WHERE NOT EXISTS ( SELECT *
FROM Class C
WHERE S.sid = C.sid)

