I have a dataframe such as:
df = pd.read_excel(io = "☆ TOTAL_1320 - mydata.xlsx", sheet_name = 'Sheet1', index_col = 0)
df2 = df
EXCLUDE PERSONCONCERNED PC_DK PC_SPOUSE PC_PARENT PC_GRANDPARENT PC_CHILD PC_SIBSHIP PC_RELATIVE PC_LOVER ... WARNEMOTION2 TIME10 WARNEMOTION3 TIME11 WARNEMOTION4 WARNEMOTION4DTL TIME12 WARNSIGN_DTL EVENT_DTL EVENT_DTL_2
14 NaN 1 NaN 1.0 2.0 2.0 2.0 2.0 2.0 2.0 ... 2.0 NaN 1.0 88.0 2.0 NaN NaN NaN *** *** ** **/_x000D_\n*** ****** ** ***** ... NaN
68 NaN 1 NaN 1.0 2.0 2.0 2.0 2.0 2.0
and I made a condition like this:
condition = (df2.PERSONCONCERNED==1) & ((df2.PC_SPOUSE.isnull()==True)
& (df2.PC_PARENT.isnull()==True) & (df2.PC_GRANDPARENT.isnull()==True)
& (df2.PC_CHILD.isnull()==True) & (df2.PC_SIBSHIP.isnull()==True)
& (df2.PC_RELATIVE.isnull()==True) & (df2.PC_LOVER.isnull()==True)
& (df2.PC_FRIEND.isnull()==True) & (df2.PC_STRANGER.isnull()==True)
& (df2.PC_ETC.isnull()==True))
when I do df2[condition]['PC_DK'] I get this:
9089 1.0
20082 NaN
20087 NaN
20090 NaN
20096 NaN
30957 NaN
34196 NaN
81293 NaN
104381 88.0
Name: PC_DK, dtype: float64
I want to replace NaN values and 1 value to 88 in df2(which is the original dataframe)
I could replace them in new dataframe by doing
df_condition = df2[condition]['PC_DK']
df_condition = df_condition.replace(np.nan, 88)
df_condition = df_condition.replace(1,88)
but I really don't know how to do it with original dataframe.
CodePudding user response:
Before getting started, first thing is isnull() itself returns Boolean value so no need to compare it against True , second thing is as you want to check for null in multiple columns combined by & , you can consolidate these in to isnull() and all() passing axis parameter. So, the reduced code will look something like this:
# cols is list of columns i.e. cols = ['PC_SPOUSE', 'PC_PARENT', ..., etc.]
condition = df2.PERSONCONCERNED==1 & df2[cols].isnull().all(axis=1)
Now to replace, you can directly use the mask created above, then replace the values using loc, and as you want to add more condition to replace the values, you can create the extra mask and again combine with &:
extra_mask = df2['PC_DK'].isna() | df2['PC_DK'].eq(1)
df2[condition & extra_mask, 'PC_DK'] = 88
