Replace other values with np.nan

Time:02-08

I have a pandas data frame:

import numpy as np
import pandas as pd

X = pd.DataFrame({'col1': [1,2],
                  'col2': [4,5]})

I have a replacement dictionary:

dict_replace = {
    'col1': {1:'a', 2:'b'},
    'col2': {4:'c', 5:'d'}
}

I can easily replace the values in X using:

X = X.replace(dict_replace)

Resulting in:

X = pd.DataFrame({'col1': ['a','b'],
                  'col2': ['c','d']})

However, if a value appears in X that is not in dict_replace (for the respective column), I want it replaced with np.nan.

For example, a data frame:

X = pd.DataFrame({'col1': [1,2,3],
                  'col2': [4,5,7]})

Should look like:

X = pd.DataFrame({'col1': ['a','b',np.nan],
                  'col2': ['c','d',np.nan]})

What are some ways I can do this without having to iterate?

CodePudding user response:

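You are looking for pandas.Series.map, which returns NaN for any value that is not a key of the dictionary. It is a Series (column) method, but you can run it over the whole data frame with apply, looking up each column's dictionary by the column name: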

X = X.apply(lambda col: col.map(dict_replace[col.name]))

Output:

>>> X
  col1 col2
0    a    c
1    b    d
2  NaN  NaN
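
Note that the one-liner looks up dict_replace[col.name] for every column, so it raises a KeyError if X has a column that dict_replace does not cover. A minimal sketch of one way around that, assuming such columns should simply be left untouched (the helper name map_known_values and the extra column col3 are made up for illustration):

```python
import pandas as pd

X = pd.DataFrame({'col1': [1, 2, 3],
                  'col2': [4, 5, 7],
                  'col3': [7, 8, 9]})   # col3 is hypothetical: no entry in dict_replace

dict_replace = {
    'col1': {1: 'a', 2: 'b'},
    'col2': {4: 'c', 5: 'd'}
}

def map_known_values(df, mapping):
    """Map each column that has an entry in `mapping`; values missing from
    that column's dictionary become NaN. Other columns are returned as-is."""
    return df.apply(
        lambda col: col.map(mapping[col.name]) if col.name in mapping else col
    )

result = map_known_values(X, dict_replace)
print(result)
```

Whether uncovered columns should be kept, dropped, or turned entirely into NaN depends on the use case; this sketch only shows the "keep" option.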

CodePudding user response:

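Try mask: after replace, any value that still equals the original was not in dict_replace, so mask turns it into NaN:

out = X.replace(dict_replace).mask(lambda x: x == X)

Output:

>>> out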
  col1 col2
0    a    c
1    b    d
2  NaN  NaN
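
One caveat with the equality test: a value that dict_replace maps to itself would also compare equal to the original and be masked. A possible variant, sketched here under the assumption that every column of X has an entry in dict_replace, builds the mask from key membership with isin instead (the names known and out are illustrative):

```python
import pandas as pd

X = pd.DataFrame({'col1': [1, 2, 3],
                  'col2': [4, 5, 7]})

dict_replace = {
    'col1': {1: 'a', 2: 'b'},
    'col2': {4: 'c', 5: 'd'}
}

# True where the original value is a key of that column's dictionary
known = X.apply(lambda col: col.isin(list(dict_replace[col.name].keys())))

# Replace as before, then keep only the cells whose original value was a known key;
# where() fills every other cell with NaN
out = X.replace(dict_replace).where(known)
print(out)
```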