I have 2 dataframes that contain 3 account indictors per account number. The account numbers are like for like in the column labelled "account". I am trying to modify dataframe 2 so it matches dataframe 1 in terms of having the same NaN values for each column.
Dataframe 1:
import pandas as pd
import numpy as np
df = pd.DataFrame([[1234567890,1,np.nan,'G'],
[7854567890,np.nan,100,np.nan],
[7854567899,np.nan,np.nan,np.nan],
[7854567893,np.nan,100,np.nan],
[7854567893,np.nan,np.nan,np.nan],
[9632587415,np.nan,np.nan,'B']],
columns = ['account','ind_1','ind_2','ind_3'])
df
Output:
account ind_1 ind_2 ind_3
0 1234567890 1.0 NaN G
1 7854567890 NaN 100.0 NaN
2 7854567899 NaN NaN NaN
3 7854567893 NaN 100.0 NaN
4 7854567893 NaN NaN NaN
5 9632587415 NaN NaN B
Dataframe 2:
df2 = pd.DataFrame([[1234567890,5,np.nan,'GG'],
[7854567890,1,106,np.nan],
[7854567899,np.nan,100,'N'],
[7854567893,np.nan,100,np.nan],
[7854567893,np.nan,np.nan,np.nan],
[9632587415,3,np.nan,'B']],
columns = ['account','ind_1','ind_2','ind_3'])
df2
Output:
account ind_1 ind_2 ind_3
0 1234567890 5.0 NaN GG
1 7854567890 1.0 106.0 NaN
2 7854567899 NaN 100.0 N
3 7854567893 NaN 100.0 NaN
4 7854567893 NaN NaN NaN
5 9632587415 3.0 NaN B
Problem: I need to change dataframe 2 so that it is matching the same NaN values in dataframe 1.
For example: Column ind_1 has values in index 0, 1 and 5 in df2, whereas in df1 it only has values in index 0. I need to replace the values in index 1 and 5 in df2 with NaNs to match the same number of Nans in df1. Same logic to applied to the other 2 columns.
Expected outcome:
account ind_1 ind_2 ind_3
0 1234567890 5.0 NaN GG
1 7854567890 NaN 106.0 NaN
2 7854567899 NaN NaN NaN
3 7854567893 NaN 100.0 NaN
4 7854567893 NaN NaN NaN
5 9632587415 NaN NaN B
Is there any easy way to achieve this?
Thanks in advance Alan
CodePudding user response:
Try this
df2[~df.isna()]
df.isna() check where df has nan value and create a mask
then slice df2 with the mask.
CodePudding user response:
You return a DataFrame with True values where there are NaN values in your first dataframe using df.isna(). You can use this as a Boolean mask to set the correct cells to NaN in your second DataFrame:
df2[df.isna()] = None
Be careful: I assume that you also need to take care that these NaNs will be associated with rows with the same value for account. This solution does not ensure that this will happen, i.e. it will assume that the values in account column are exactly int he same order in both dataframes.
