pandas: merge DataFrames based on the values of 2 columns

Time:01-25

I have 2 DataFrames. The first is a list of ids, and for each id there is a list of days from 01-01-2013 to 12-31-2013. I have 30 unique ids. For example, df1:

id    date
1      01-01-2013
1      01-02-2013
1      01-03-2013
....
1      12-31-2013
2      01-01-2013
2      01-02-2013
....
2      12-31-2013

My second DataFrame has id, sub_id (each sub_id belongs to some id), the date of diagnosis, and a disease label, 0 or 1. For example:

id    sub_id    date          disease
1     345       01-01-2013    0
1     967       01-01-2013    1
2     843       06-03-2013    0

and so on. I want to merge the DataFrames on id and date, so that I get every consecutive day of the year for each id together with its disease label. I don't care about the sub_id. However, if the same date appears more than once for an id and at least one sub_id is sick, I want the label for that day to be 1. If there is no value for a day, the NaN should be filled with 0. For example, the result:

id    date          disease
1     01-01-2013    1
1     01-02-2013    0
1     01-13-2013    0
.....

I have tried this code:

df2.sort_values('disease').drop_duplicates(subset='date', keep='last').merge(df1,on=['id','date'],how='right')

but for some ids a label of 1 comes back as 0. Thanks.

CodePudding user response:

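I'd use groupby and take the max of disease per (id, date), so a day with at least one sick sub_id gets 1, then fill the days that have no diagnosis with 0:

df2.groupby(['id','date'])['disease'].max().reset_index().merge(df1,on=['id','date'],how='right').fillna({'disease': 0})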
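
A fuller sketch of the groupby approach, on toy data shaped like the question's. The three-day date range, the extra rows for id 2, and sub_id 844 are assumptions made up for illustration, and `daily` and `result` are hypothetical names. One likely reason the attempt in the question loses labels is that drop_duplicates(subset='date') deduplicates on the date alone, so rows of different ids that share a date are collapsed into one. Grouping on both id and date avoids that. The sketch also assumes the date column has the same dtype (here, strings) in both frames.

```python
import pandas as pd

# Toy stand-in for df1: every id paired with every day (3 days instead of a full year).
dates = pd.date_range("2013-01-01", "2013-01-03").strftime("%m-%d-%Y")
df1 = pd.DataFrame(
    [(i, d) for i in (1, 2) for d in dates], columns=["id", "date"]
)

# Toy stand-in for df2: id 1 has two sub_ids on the same day, one of them sick.
df2 = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "sub_id": [345, 967, 843, 844],  # 844 is a made-up sub_id
    "date": ["01-01-2013", "01-01-2013", "01-01-2013", "01-02-2013"],
    "disease": [0, 1, 1, 0],
})

# One row per (id, date): max() is 1 if any sub_id was sick that day.
daily = df2.groupby(["id", "date"], as_index=False)["disease"].max()

# Left merge keeps every row (and the row order) of df1; days without a diagnosis become NaN.
result = df1.merge(daily, on=["id", "date"], how="left")

# Replace the NaN with 0 and restore an integer label.
result["disease"] = result["disease"].fillna(0).astype(int)

print(result)
```

Merging from df1 with how='left' gives the same rows as merging into it with how='right', but keeps the calendar order of df1, which may be more convenient when the days need to stay consecutive.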