Home > Enterprise >  SQL left join without duplicate rows
SQL left join without duplicate rows

Time:02-01

I am trying to join to 2 tables as seen in this example. The code I am using returns all combination of the join (too many rows for this exercise) . I only want it to return the same number of rows of as my table a and just to show if it found a match from table b. E.g. If table a has 2 rows of same date and amount, table b should also have 2 rows of same post_date and trans_amount for a 'perfect' match.

can someone help please?

Code I am using:

proc sql;
    create table test as
        select a.*,
               b.post_date, b.trans_amnt 
        from      ( select *
                    from table2
                  ) as a
        left join ( select post_date,
                    trans_amnt
                    from table1
                  ) as b on a.date   = b.post_date
                        and a.amount = b.trans_amnt
; quit;

CodePudding user response:

Your problem statement is

I want all the rows from table A and only matching rows from table B

Correct?

A left [outer] join

select t1.*, t2.*
from      table_1 t1
left join table_2 t2 on <some join criteria>

will return at least one row for every row in table_1. Based on the join criteria, any row in table_1 with no matching row in table_2 will have all the t2.* columns in the result set as null. If it has matching rows in table_2, again based on the join criteria. That row will be repeated once for every such match, with the t2.* columns containing the matching row from table_2.

Without knowing what the primary key of each table is (and/or any alternate keys), and what the foreign keys are for each table, it is impossible to tell you what the cardinality of the result set will be.

On the face of it, your join criteria seems almost assuredly incorrect: Do date and transaction amount really a part of a foreign key or primary key? date amount is almost certainly not unique. Multiple transactions could easily occur for the same amount on the same day.

If there are transactions and post dates involved, it seems likely that there are customer IDs, transaction IDs, and the like that correlate the 2 tables.

I would, at the very least simplify you select thus:

select a.*,
       b.post_date,
       b.trans_amnt 
from      table2 a
left join table1 b on a.date   = b.post_date
                  and a.amount = b.trans_amnt

CodePudding user response:

You could just try this, so that only distinct pairs of (post_date, trans_amnt) are in the right table, to avoid multiple matches.

Now the nulls are preserved.

create table test as
  select a.*
       , b.post_date, b.trans_amnt   -- if you want to see the missing cases clearly
    from table2 AS a
    left join (
            select DISTINCT post_date, trans_amnt from table1
         ) as b
      on a.date   = b.post_date
     and a.amount = b.trans_amnt
;

Note: SELECT DISTINCT in your original SQL would also do the trick. But you can simplify the SQL to the above.

  •  Tags:  
  • Related