I have three tables that contains the Employee details, their Leave details and their Activity.
Table A
| EmployeeID | EmployeeName | Age | Title |
|---|---|---|---|
| C001 | Samantha Yu | 32 | Auditor |
| C002 | Gary Lin | 34 | Manager |
| C003 | Tom Kenendy | 59 | HR Admin |
Table B
| EmployeeID | LeaveStart | LeaveEnd | LeaveType |
|---|---|---|---|
| C001 | 12-07-2021 | 21-07-2021 | Medical |
| C002 | 01-07-2021 | 03-07-2021 | Marriage |
| C002 | 21-08-2021 | 22-08-2021 | Annual |
| C002 | 15-10-2021 | 15-10-2021 | Annual |
| C003 | 09-12-2021 | 11-12-2021 | Marriage |
| C003 | 11-12-2021 | 13-12-2021 | Annual |
Table C
| EmployeeID | LatestActivity | ActivityDate |
|---|---|---|
| C001 | Promoted | 21-07-2021 |
| C002 | Promoted | 07-03-2021 |
| C002 | Restructure | 15-09-2021 |
| C002 | Change Department | 24-11-2021 |
| C003 | Demoted | 11-11-2021 |
| C003 | Change Department | 15-11-2021 |
What would be the SQL Query that meets the following requirements for the output:
- EACH employee only with the LATEST LeaveDateStart,LeaveDateEnd and Date.
- Exclude "Demoted" from the output column.
For example, the output for above would be:
| EmployeeID | EmployeeName | Title | LeaveStart | LeaveEnd | LeaveType | Date | LatestActivity |
|---|---|---|---|---|---|---|---|
| C001 | Samantha Yu | Auditor | 12-07-2021 | 15-10-2021 | Annual | 21-07-2021 | Promoted |
| C002 | Gary Lin | Manager | 15-10-2021 | 15-10-2021 | Annual | 24-11-2021 | Change Department |
| C003 | Tom Kenendy | HR Admin | 11-12-2021 | 13-12-2021 | Annual | 15-11-2021 | Change Department |
As you can see above, Gary has multiple records for his leaves and activities, but in the final output we only need the LATEST LeaveStart, LATEST LeaveEnd, LATEST Activity along with the ActivityDATE.
This is the query that I tried but it still returns duplicate record for each user for distinct LeaveType and LatestActivity.
SELECT DISTINCT
a.EMPLOYEEID,
a.EMPLOYEENAME,
a.TITLE,
MAX(b.LeaveStart) OVER(PARTITION BY a.EMPLOYEEID) as LeaveStart,
MAX(b.LeaveEnd) OVER(PARTITION BY a.EMPLOYEEID) as LeaveEnd,
MAX(c.Date) OVER(PARTITION BY a.EMPLOYEEID) as ActivtyDate,
b.LeaveType,
c.LatestActivity
FROM a
JOIN b
ON a.EMPLOYEEID = b.EMPLOYEEID
JOIN c
ON a.EMPLOYEEID = c.EMPLOYEEID
| EmployeeID | EmployeeName | Title | LeaveStart | LeaveEnd | LeaveType | Date | LatestActivity |
|---|---|---|---|---|---|---|---|
| C001 | Samantha Yu | Auditor | 12-07-2021 | 15-10-2021 | Annual | 21-07-2021 | Promoted |
| C002 | Gary Lin | Manager | 15-10-2021 | 15-10-2021 | Annual | 24-11-2021 | Change Department |
| C002 | Gary Lin | Manager | 21-08-2021 | 22-08-2021 | Annual | 15-09-2021 | Restructure |
| C002 | Gary Lin | Manager | 01-07-2021 | 03-07-2021 | Marriage | 15-09-2021 | Restructure |
EDIT: My mistake in the table naming
CodePudding user response:
you can use a query like below where you use row_number to get the latest inside an inner query for each table and NOT exist for filtering out
select
A.EmployeeID, A.EmployeeName, A.Title,
B.LeaveStart, B.LeaveEnd, B.LeaveType,
C.Date, C.LatestActivity
from TableA A
left join (
select *,
rn = row_number() over (partition by EmployeeID order by LeaveStart desc)
from TableB
) B
on B.EmployeeID=A.EmployeeID and B.rn=1
left join (
select *,
rn = row_number() over (partition by EmployeeID order by ActivityDate desc)
from TableC
) C
on C.EmployeeID=A.EmployeeID and C.rn=1
where not exists
(select 1 from TableC CC where CC.EmployeeID=A.EmployeeID and CC.LatestActivity='Demoted')
CodePudding user response:
You can use a LATERAL OUTER JOIN query for this (in SQL Server it is called OUTER APPLY). Rough outline of the query:
SELECT *
FROM employee
OUTER APPLY (
SELECT TOP 1 *
FROM leaves
WHERE EmployeeID = employee.EmployeeID
ORDER BY LeaveStart DESC
) AS oa1
OUTER APPLY (
SELECT TOP 1 *
FROM activity
WHERE EmployeeID = employee.EmployeeID AND LatestActivity <> 'Demoted'
ORDER BY ActivityDate DESC
) AS oa2
