Pandas: fill only some NaN values in two concatenated data frames with different timestamps

Time:01-16

I have two data frames, one with more frequent entries than the other. I would like to concatenate them and fill the NaN values in the less frequent column with its last entry. However, if that last entry was itself NaN, the rows should stay NaN.

Example:

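import numpy as np
import pandas as pd

# Half-hourly readings (time-only strings are parsed with today's date)
df = pd.DataFrame(data=[4.5, 4.6, 5.7, 5.7, 6.7, 4, 9.0],
              index=list(map(pd.to_datetime, ['00:00', '00:30', '01:00', '01:30', '02:00', '02:30', '03:00'])),
              columns=['frequent data'])

# Hourly readings, some of them missing (np.NaN was removed in NumPy 2.0, so use np.nan)
df2 = pd.DataFrame(data=[4.5, np.nan, 5.7, np.nan],
              index=list(map(pd.to_datetime, ['00:00', '01:00', '02:00', '03:00'])),
              columns=['data'])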

df2
                     data
2022-01-15 00:00:00   4.5
2022-01-15 01:00:00   NaN
2022-01-15 02:00:00   5.7
2022-01-15 03:00:00   NaN

new_df = pd.concat((df, df2), axis=1)
new_df
                     frequent data  data
2022-01-15 00:00:00            4.5   4.5
2022-01-15 00:30:00            4.6   NaN
2022-01-15 01:00:00            5.7   NaN
2022-01-15 01:30:00            5.7   NaN
2022-01-15 02:00:00            6.7   5.7
2022-01-15 02:30:00            4.0   NaN
2022-01-15 03:00:00            9.0   NaN

I would like to end up with this data frame:

                     frequent data  data
2022-01-15 00:00:00            4.5   4.5
2022-01-15 00:30:00            4.6   4.5
2022-01-15 01:00:00            5.7   NaN
2022-01-15 01:30:00            5.7   NaN
2022-01-15 02:00:00            6.7   5.7
2022-01-15 02:30:00            4.0   5.7
2022-01-15 03:00:00            9.0   NaN

Is there an easy way to do this, or do I need to write my own function?

CodePudding user response:

IIUC:

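# Align df2 to df's index, then forward-fill within each clock hour only,
# so a NaN recorded at the top of an hour is never overwritten.
# ('h' is the hourly alias; 'H' is deprecated since pandas 2.2.)
df2 = df2.reindex(df.index).groupby(lambda x: x.floor('h')).ffill()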

new_df =  pd.concat([df, df2], axis=1)
print(new_df)

# Output
                     frequent data  data
2022-01-15 00:00:00            4.5   4.5
2022-01-15 00:30:00            4.6   4.5
2022-01-15 01:00:00            5.7   NaN
2022-01-15 01:30:00            5.7   NaN
2022-01-15 02:00:00            6.7   5.7
2022-01-15 02:30:00            4.0   5.7
2022-01-15 03:00:00            9.0   NaN
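
To see why this works, here is a self-contained sketch of the same idea, split into steps. The fixed date, the `date_range` construction and the intermediate names `aligned`, `hour` and `filled` are assumptions added for illustration, not part of the original answer. Flooring each timestamp to the hour puts every hourly reading in a group with the half-hour row that follows it, so the forward fill cannot cross into the next hour.

```python
import numpy as np
import pandas as pd

# Same values as in the question, with an explicit date so the result is reproducible.
idx = pd.date_range("2022-01-15 00:00", periods=7, freq="30min")
df = pd.DataFrame({"frequent data": [4.5, 4.6, 5.7, 5.7, 6.7, 4.0, 9.0]}, index=idx)
df2 = pd.DataFrame({"data": [4.5, np.nan, 5.7, np.nan]}, index=idx[::2])

# Align the sparse frame to the dense index; the half-hour rows become NaN.
aligned = df2.reindex(df.index)

# One group label per row: the timestamp floored to the hour.
hour = aligned.index.floor("h")

# Forward-fill inside each hourly group only, so the NaN reading at 01:00
# is not replaced by the 00:00 value.
filled = aligned.groupby(hour).ffill()

new_df = pd.concat([df, filled], axis=1)
print(new_df)
```

Note that this relies on the sparse frame having exactly one row per clock hour, as in the example data.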

You can also forward-fill after the concat:

# 'h' replaces the deprecated 'H' alias
new_df = pd.concat([df, df2], axis=1).groupby(lambda x: x.floor('h')).ffill()
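
If the sparse frame is not guaranteed to have one row per hour, a possible alternative (not from the answer above, so treat it as a sketch) is to let pandas look up the most recent row by index rather than by value. Both `reindex(..., method="ffill")` and `pd.merge_asof` pick the latest row at or before each timestamp without inspecting its contents, so a stored NaN is carried forward like any other value. Both require a sorted index. The names `via_reindex` and `via_asof` are illustrative.

```python
import numpy as np
import pandas as pd

# Same values as in the question, with an explicit date so the result is reproducible.
idx = pd.date_range("2022-01-15 00:00", periods=7, freq="30min")
df = pd.DataFrame({"frequent data": [4.5, 4.6, 5.7, 5.7, 6.7, 4.0, 9.0]}, index=idx)
df2 = pd.DataFrame({"data": [4.5, np.nan, 5.7, np.nan]}, index=idx[::2])

# Option 1: pad while reindexing. The fill compares index labels only,
# so the NaN stored at 01:00 is propagated to 01:30 instead of being skipped.
via_reindex = pd.concat([df, df2.reindex(df.index, method="ffill")], axis=1)

# Option 2: an as-of join, which does the same "latest row at or before" lookup.
via_asof = pd.merge_asof(df, df2, left_index=True, right_index=True)

print(via_reindex)
print(via_asof)
```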