I have an original dataframe, df0, with a number of values. Based on this dataframe I have a second dataframe, df1, in which some of the original values are NaN.
import pandas as pd
df0 = pd.DataFrame({'col1': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]})
df1 = pd.DataFrame({'col1': [1,2,None,4,5,6,None,8,None,10,11,None,13,None,None]})
I need df2 to be the inverse of df1 in terms of which rows are NaN. Every row that is NaN in df1 should get its original value back from df0, and every row that is not NaN should become NaN, so that I get the following dataframe:
df2 = pd.DataFrame({'col1': [None,None,3,None,None,None,7,None,9,None,None,12,None,14,15]})
What is the best way to go about this if it were a larger dataset?
CodePudding user response:
masking all columns
If you need to mask all columns, use DataFrame.mask with notna, or DataFrame.where with isna:
df2 = df0.mask(df1['col1'].notna())
# or
df2 = df0.where(df1['col1'].isna())
output:
    col1
0    NaN
1    NaN
2    3.0
3    NaN
4    NaN
5    NaN
6    7.0
7    NaN
8    9.0
9    NaN
10   NaN
11  12.0
12   NaN
13  14.0
14  15.0
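To illustrate what "all columns" means here, the following is a minimal sketch with a shortened frame and a second column, col2, which is hypothetical and not part of the question. It assumes both frames share the same index, so the boolean Series built from df1 is applied row by row to every column of df0.

```python
import pandas as pd

# Shortened example data; 'col2' is a hypothetical extra column for illustration
df0 = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [10, 20, 30, 40]})
df1 = pd.DataFrame({'col1': [1, None, 3, None]})

# True where df1 is NaN, i.e. the rows whose original values should be kept
keep = df1['col1'].isna()

# The row-wise condition is applied to every column of df0
df2 = df0.where(keep)
print(df2)
```

Rows 0 and 2 become NaN in both col1 and col2, while rows 1 and 3 keep their values from df0.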
masking only "col1"
If you only need to replace col1 and leave any other columns intact, use assign with Series.mask instead:
df2 = df0.assign(col1=df0['col1'].mask(df1['col1'].notna()))
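As a small sketch of the difference, here is the same call on a shortened frame with a hypothetical second column, col2 (not in the question). Only col1 is masked, and since assign returns a new DataFrame, df0 itself should stay unchanged.

```python
import pandas as pd

# Shortened example data; 'col2' is a hypothetical extra column for illustration
df0 = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [10, 20, 30, 40]})
df1 = pd.DataFrame({'col1': [1, None, 3, None]})

# Mask col1 wherever df1 has a value; every other column is carried over as-is
df2 = df0.assign(col1=df0['col1'].mask(df1['col1'].notna()))
print(df2)
```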
CodePudding user response:
Use Series.where with Series.isna to replace values in one column based on another DataFrame. The only requirement is that both DataFrames share the same index:
df0['col1'] = df0['col1'].where(df1['col1'].isna())
print(df0)
    col1
0    NaN
1    NaN
2    3.0
3    NaN
4    NaN
5    NaN
6    7.0
7    NaN
8    9.0
9    NaN
10   NaN
11  12.0
12   NaN
13  14.0
14  15.0
Alternative with DataFrame.loc and Series.notna:
import numpy as np
df0.loc[df1['col1'].notna(), 'col1'] = np.nan
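Note that the loc assignment modifies df0 in place. If the original values are still needed, a possible sketch is to work on a copy and compare the result with the df2 from the question. The cast to float64 is my own assumption rather than part of the answer: newer pandas versions may warn or raise when NaN is written into an integer column.

```python
import numpy as np
import pandas as pd

df0 = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]})
df1 = pd.DataFrame({'col1': [1, 2, None, 4, 5, 6, None, 8, None, 10, 11, None, 13, None, None]})

# Work on a copy so df0 keeps its original values
df2 = df0.copy()
# Assumption: cast to float first so the column can hold NaN without upcasting
df2['col1'] = df2['col1'].astype('float64')
# Blank out every row that still has a value in df1
df2.loc[df1['col1'].notna(), 'col1'] = np.nan

# Expected result as given in the question
expected = pd.DataFrame({'col1': [None, None, 3, None, None, None, 7, None, 9,
                                  None, None, 12, None, 14, 15]})
print(df2.equals(expected))
```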
