Home > Blockchain >  Fill NaN from another adataframe based on a column
Fill NaN from another adataframe based on a column

Time:02-05

I have one dataframe with shape (23251, 8) and another dataframe with shape (3652, 14). The DATE column in the first dataframe contains dates from 1955-01-01 up to 2020-12-31 and the DATA column in the second contains dates from 2010-01-01 to 2019-12-31.In the first dataframe the majority of the columns contains few or many missing values. I want to fill the missing values in the in the first dataframe(whenever is possible) from the second dataframe based on the DATE(where the DATE in the second dataframe matches the DATE in the first dataframe ).

The first Dataframe:

enter image description here

The second Dataframe:

enter image description here

To make it clear I want, if the rows(from the first dataframe) that refer to date from 2010-01-01 to 2019-12-31 contain NaN in columns PRCP, TAVG, TMAX and in TMIN to be filled with the values from the second dataframe based on the critirion that the DATE in each row matches betwwen the two dataframes.

CodePudding user response:

Without sample as plain text data, it's difficult to help you. Maybe this should work:

COLS = ['TMIN', 'TMAX']
df1 = df1.fillna(df2.set_index('DATE').reindex(df1['DATE'])[COLS] \
         .reset_index(drop=True))
print(df1)

# Output
         DATE NAME  TMIN  TMAX
0  1955-01-01    L  28.0  40.0
1  1955-01-02    L  27.0  41.0
2  1955-01-03    L   NaN   NaN
3  1955-01-01    M  28.0  40.0
4  1955-01-02    M  27.0  41.0
5  1955-01-03    M   NaN   NaN

Setup:

import pandas as pd
import numpy as np

d1 = {'DATE': ['1955-01-01', '1955-01-02', '1955-01-03',
               '1955-01-01', '1955-01-02', '1955-01-03'],
               'NAME': ['L', 'L', 'L', 'M', 'M', 'M'],
               'TMIN': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
               'TMAX': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]}
df1 = pd.DataFrame(d1)

d2 = {'DATE': ['1955-01-01', '1955-01-02'], 'TMIN': [28, 27], 'TMAX': [40, 41]}
df2 = pd.DataFrame(d2)

print(df1)
print(df2)

# Output
         DATE NAME  TMIN  TMAX
0  1955-01-01    L   NaN   NaN
1  1955-01-02    L   NaN   NaN
2  1955-01-03    L   NaN   NaN
3  1955-01-01    M   NaN   NaN
4  1955-01-02    M   NaN   NaN
5  1955-01-03    M   NaN   NaN


         DATE  TMIN  TMAX
0  1955-01-01    28    40
1  1955-01-02    27    41
  •  Tags:  
  • Related