I have two DataFrames like that below:
df_1 = pd.DataFrame({
'wag': [111, 111, 222],
'date': ['2019-08-30', '2019-09-20', '2019-08-10']
})
and
df_2 = pd.DataFrame({
'wag': [111, 111, 111, 222, 222, 333, 333, 333],
'msr_date': ['2019-08-30', '2019-08-30', '2019-08-30',
'2019-08-30', '2019-08-30', '2019-08-30',
'2019-08-30', '2019-08-30'],
'measurements': [1, 2, 3, 4, 5, 6, 7, 8]
})
df_1 and df_2 are below:
| wag | date |
|---|---|
| 111 | 2019-08-30 |
| 111 | 2019-09-20 |
| 222 | 2019-08-10 |
| wag | msr_date | measurements |
|---|---|---|
| 111 | 2019-03-29 | 1 |
| 111 | 2019-03-19 | 2 |
| 111 | 2019-03-10 | 3 |
| 222 | 2019-09-30 | 4 |
| 222 | 2019-09-20 | 5 |
| 333 | 2019-08-30 | 6 |
| 333 | 2019-08-30 | 7 |
| 333 | 2019-09-20 | 8 |
So, I have two DataFrames df_1 and df_2 and I want to iterate through them at the same time and filter df_2. I need to iterate based on wag in both tables.
If wag in df_1 is in df_2 then I have to compare date and mrs_date and
- if
mrs_date <= datethen I append all rows of thatwag - else I check if I have one more condition for that row in
df_1for thatwagand again repeat the comparison of dates. - When conditions / rows are finished for that wags we go to the next value of
wag.
For example, here I first check the first row of df_1 and if mrs_date <= date in df_2 then I append all rows to df_new. If not then I go to the next date for that wag. If we have one if not then check another wag. So finally, for this example I will get df_new as below:
| wag | msr_date | measurements |
|---|---|---|
| 111 | 2019-03-29 | 1 |
| 111 | 2019-03-19 | 2 |
| 111 | 2019-03-10 | 3 |
In reality, there are so many values in df_1 and in df_2 so for_loop should be quite complicated and I couldn't find the way how to do it. If it is not clear how I described the task please ask.
CodePudding user response:
Merge your 2 dataframes on wag column then keep only rows that match your condition msr_date <= date. Finally, drop the column date from df_1 and remove duplicates rows.
>>> df_2.merge(df_1, on='wag', how='left') \
.query('msr_date <= date') \
.drop(columns='date').drop_duplicates()
wag msr_date measurements
0 111 2019-03-29 1
2 111 2019-03-19 2
4 111 2019-03-10 3
CodePudding user response:
Using a for loop:
new_df = pd.DataFrame(data=None, columns=df_2.columns)
for idx, row in df_1.iterrows():
for idx2, row2 in df_2.iterrows():
if row['wag'] == row2['wag'] and row2['msr_date'] <= row['date']:
new_df = new_df.append(row2).drop_duplicates()
