I am trying to join to 2 tables as seen in this example. The code I am using returns all combination of the join (too many rows for this exercise) . I only want it to return the same number of rows of as my table a and just to show if it found a match from table b. E.g. If table a has 2 rows of same date and amount, table b should also have 2 rows of same post_date and trans_amount for a 'perfect' match.
can someone help please?
Code I am using:
proc sql;
create table test as
select a.*,
b.post_date, b.trans_amnt
from ( select *
from table2
) as a
left join ( select post_date,
trans_amnt
from table1
) as b on a.date = b.post_date
and a.amount = b.trans_amnt
; quit;
CodePudding user response:
Your problem statement is
I want all the rows from table A and only matching rows from table B
Correct?
A left [outer] join
select t1.*, t2.*
from table_1 t1
left join table_2 t2 on <some join criteria>
will return at least one row for every row in table_1. Based on the join criteria, any row in table_1 with no matching row in table_2 will have all the t2.* columns in the result set as null. If it has matching rows in table_2, again based on the join criteria. That row will be repeated once for every such match, with the t2.* columns containing the matching row from table_2.
Without knowing what the primary key of each table is (and/or any alternate keys), and what the foreign keys are for each table, it is impossible to tell you what the cardinality of the result set will be.
On the face of it, your join criteria seems almost assuredly incorrect: Do date and transaction amount really a part of a foreign key or primary key? date amount is almost certainly not unique. Multiple transactions could easily occur for the same amount on the same day.
If there are transactions and post dates involved, it seems likely that there are customer IDs, transaction IDs, and the like that correlate the 2 tables.
I would, at the very least simplify you select thus:
select a.*,
b.post_date,
b.trans_amnt
from table2 a
left join table1 b on a.date = b.post_date
and a.amount = b.trans_amnt
CodePudding user response:
You could just try this, so that only distinct pairs of (post_date, trans_amnt) are in the right table, to avoid multiple matches.
Now the nulls are preserved.
create table test as
select a.*
, b.post_date, b.trans_amnt -- if you want to see the missing cases clearly
from table2 AS a
left join (
select DISTINCT post_date, trans_amnt from table1
) as b
on a.date = b.post_date
and a.amount = b.trans_amnt
;
Note: SELECT DISTINCT in your original SQL would also do the trick. But you can simplify the SQL to the above.
