I have the following dataframes:
import numpy as np
import pandas as pd

df1 = pd.DataFrame(data={'col1': ['a', 'd', 'g', 'j'],
                         'col2': ['b', 'c', 'i', np.nan],
                         'col3': ['c', 'f', 'i', np.nan],
                         'col4': ['x', np.nan, np.nan, np.nan]},
                   index=pd.Series(['ind1', 'ind2', 'ind3', 'ind4'], name='index'))
| index | col1 | col2 | col3 | col4 |
|---|---|---|---|---|
| ind1 | a | b | c | x |
| ind2 | d | c | f | NaN |
| ind3 | g | i | i | NaN |
| ind4 | j | NaN | NaN | NaN |
df2 = pd.Series(data=[True, False, True, False],
                index=pd.Series(['ind1', 'ind2', 'ind3', 'ind4']))
| index | value |
|---|---|
| ind1 | True |
| ind2 | False |
| ind3 | True |
| ind4 | False |
How do I set the last 2 columns of df1 to NaN in the rows where df2 is True?
In this case, since ind1 and ind3 are True in df2, only those rows of df1 are affected. Expected result:
| index | col1 | col2 | col3 | col4 |
|---|---|---|---|---|
| ind1 | a | b | NaN | NaN |
| ind2 | d | c | f | NaN |
| ind3 | g | i | NaN | NaN |
| ind4 | j | NaN | NaN | NaN |
CodePudding user response:
A possible solution, based on pandas.DataFrame.mask:
df1[['col3', 'col4']] = df1[['col3', 'col4']].mask(df2)
Output:
      col1 col2 col3 col4
index
ind1     a    b  NaN  NaN
ind2     d    c    f  NaN
ind3     g    i  NaN  NaN
ind4     j  NaN  NaN  NaN
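If the column names are not known in advance, the same idea can probably be written against the last N columns by position. A minimal sketch, assuming the data from the question; `flags` (standing in for the question's df2) and `last_cols` are names picked here for illustration only:

```python
import numpy as np
import pandas as pd

# Data from the question.
df1 = pd.DataFrame(
    {"col1": ["a", "d", "g", "j"],
     "col2": ["b", "c", "i", np.nan],
     "col3": ["c", "f", "i", np.nan],
     "col4": ["x", np.nan, np.nan, np.nan]},
    index=pd.Index(["ind1", "ind2", "ind3", "ind4"], name="index"),
)
# Illustrative name for the boolean Series the question calls df2.
flags = pd.Series([True, False, True, False],
                  index=["ind1", "ind2", "ind3", "ind4"])

N = 2
last_cols = df1.columns[-N:]  # labels of the last N columns

# mask() replaces values with NaN wherever the condition is True.
df1[last_cols] = df1[last_cols].mask(flags)
print(df1)
```

mask aligns the boolean Series with the frame's index, so rows are matched by label rather than by position.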
CodePudding user response:
You can use boolean indexing:
N = 2
df1.iloc[df2, -N:] = np.nan
Note: what you call df2 is actually a Series, so a name such as s or ser would be more appropriate.
Output:
      col1 col2 col3 col4
index
ind1     a    b  NaN  NaN
ind2     d    c    f  NaN
ind3     g    i  NaN  NaN
ind4     j  NaN  NaN  NaN
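For comparison, here is a sketch of two closely related ways to write this assignment on the question's data. Converting the Series with `.to_numpy()` hands iloc a plain boolean array, so nothing depends on how iloc treats a labelled Series, while `.loc` matches rows by index label. The names `make_frame`, `flags`, `by_position` and `by_label` are made up for this example:

```python
import numpy as np
import pandas as pd

def make_frame():
    # Rebuild the question's DataFrame so each variant starts from fresh data.
    return pd.DataFrame(
        {"col1": ["a", "d", "g", "j"],
         "col2": ["b", "c", "i", np.nan],
         "col3": ["c", "f", "i", np.nan],
         "col4": ["x", np.nan, np.nan, np.nan]},
        index=pd.Index(["ind1", "ind2", "ind3", "ind4"], name="index"),
    )

# Illustrative name for the boolean Series the question calls df2.
flags = pd.Series([True, False, True, False],
                  index=["ind1", "ind2", "ind3", "ind4"])
N = 2

# Positional variant: a plain boolean array selects rows by position.
# This assumes flags is in the same row order as the frame.
by_position = make_frame()
by_position.iloc[flags.to_numpy(), -N:] = np.nan

# Label-based variant: the boolean Series is matched on index labels.
by_label = make_frame()
by_label.loc[flags, by_label.columns[-N:]] = np.nan

print(by_position)
print(by_label)
```

The positional form is only safe when the Series and the DataFrame share the same row order; the label-based form does not rely on that.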
