np.where( ) not changing all the values

Time:01-18

df['view'] = np.where(df['view']==0, 'No view', df['view'])
df['view'] = np.where(df['view']==1, 'Mediocre', df['view'])
df['view'] = np.where(df['view']==2, 'Average', df['view'])
df['view'] = np.where(df['view']==3, 'Good', df['view'])
df['view'] = np.where(df['view']==4, 'Very good', df['view'])

I have a dataset of housing prices where the 'view' column rates the view as an integer from 0 (no view) to 4. I wanted to convert those numbers into strings using the operation above; however, only the first line has any effect.

df['view'].value_counts()

No view    4140
2           205
3           116
4            70
1            69
Name: view, dtype: int64

How do I make np.where() work for the rest of them as well?

CodePudding user response:

A NumPy array holds a single dtype, so the first call, which mixes the string 'No view' with the integer column, returns an array of strings. From then on the column no longer contains 1, but '1' instead.

This code would work:

# The column is still numeric here, so compare against the integer 0.
df['view'] = np.where(df['view']==0, 'No view', df['view'])
# From now on the column holds strings, so compare against '1', '2', ...
df['view'] = np.where(df['view']=='1', 'Mediocre', df['view'])
df['view'] = np.where(df['view']=='2', 'Average', df['view'])
df['view'] = np.where(df['view']=='3', 'Good', df['view'])
df['view'] = np.where(df['view']=='4', 'Very good', df['view'])

I know you asked how to get NumPy's where to work, which I think the code above does. It's worth mentioning that pandas apply might work well here too, and mapping with a dictionary, as laid out in the other answer, would be even better.
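For what it's worth, here is a minimal sketch of the apply route. The sample data and the helper name label_view are made up for illustration, and falling back to the code as text for unlabeled values is just one possible choice:

```python
import pandas as pd

# Hypothetical sample data standing in for the housing dataset.
df = pd.DataFrame({'view': [4, 0, 1, 3, 2]})

labels = {0: 'No view', 1: 'Mediocre', 2: 'Average', 3: 'Good', 4: 'Very good'}

def label_view(code):
    # Look the code up once; keep it (as text) if it has no label.
    return labels.get(code, str(code))

# Every element is translated from the original integer in a single pass,
# so no comparison ever runs against an already-converted column.
df['view'] = df['view'].apply(label_view)
print(df['view'].tolist())
```

Because each value is looked up exactly once, the dtype change that trips up the chained np.where calls never comes into play.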

CodePudding user response:

Try mapping values instead to avoid dtype troubles:

mappings = {0: 'No view', 1: 'Mediocre', 2: 'Average', 3: 'Good', 4: 'Very good'}

df['view'] = df['view'].map(mappings)
print(df)

# Output
        view
0  Very good
1    No view
2   Mediocre
3   Mediocre
4    No view
5    No view
6    Average
7    No view
8    No view
9   Mediocre
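One caveat with map: any value that is missing from the dictionary comes back as NaN. If your real data might contain other codes, something like the sketch below keeps them instead of losing them. The code 7 is a made-up unmapped value, and keeping it as text is only an assumption about what you would want:

```python
import pandas as pd

# Hypothetical data; 7 has no entry in the mapping.
df = pd.DataFrame({'view': [4, 0, 7, 1]})

mappings = {0: 'No view', 1: 'Mediocre', 2: 'Average', 3: 'Good', 4: 'Very good'}

mapped = df['view'].map(mappings)  # unmapped codes become NaN here
# Fill the gaps with the original code, converted to text.
df['view'] = mapped.fillna(df['view'].astype(str))
print(df['view'].tolist())
```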

Setup I used:

import pandas as pd
import numpy as np

np.random.seed(2022)
df = pd.DataFrame({'view': np.random.randint(0, 5, 10)})
print(df)

# Output
   view
0     4
1     0
2     1
3     1
4     0
5     0
6     2
7     0
8     0
9     1

Update

After the first np.where, your data looks like:

>>> np.where(df['view']==0, 'No view', df['view'])
array(['4', 'No view', '1', '1', 'No view', 'No view', '2', 'No view',
       'No view', '1'], dtype='<U21')

The remaining values 1, 2, 3, 4 (integers) became '1', '2', '3', '4' (strings), so comparisons against integer values no longer match anything. That's why it's important to process your data in a single pass.
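To see the mismatch directly, here is a small sketch. The Series is built by hand to mimic the column after the first np.where call; it is not the asker's actual data:

```python
import pandas as pd

# Hand-built stand-in for the column after the first np.where call:
# the untouched codes are now the strings '4' and '1', not integers.
view = pd.Series(['4', 'No view', '1'], dtype=object)

int_matches = (view == 1).tolist()    # text compared with the integer 1
str_matches = (view == '1').tolist()  # text compared with the text '1'

print(int_matches)  # no row matches
print(str_matches)  # only the last row matches
```

The integer comparison silently matches nothing, which is why the later np.where lines run without error but leave the column unchanged.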

CodePudding user response:

As the other answers describe, the column is converted to strings after your first line of code. You can, however, rearrange your code to evaluate every condition at once, against the original integers, which should solve the issue.

For evaluating multiple conditions, you might find it easier to go with numpy.select:

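v = df['view']  # keep a reference to the original integer column

# numpy.where: nest the calls so every comparison runs against the integers in v
df['view'] = np.where(v == 0, 'No view',
             np.where(v == 1, 'Mediocre',
             np.where(v == 2, 'Average',
             np.where(v == 3, 'Good',
             np.where(v == 4, 'Very good', v)))))

# numpy.select: one list of conditions, one list of replacement values
conditions = [v.eq(0), v.eq(1), v.eq(2), v.eq(3), v.eq(4)]
new_vals = ['No view', 'Mediocre', 'Average', 'Good', 'Very good']
df['view'] = np.select(conditions, new_vals, default=v)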
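If you go the numpy.select route, the condition and value lists can also be built from a single dictionary so they cannot drift out of sync. This is only a sketch: the sample data, the code 9, and the 'Unknown' placeholder are all made up, and a string default is used here so that the result contains nothing but strings:

```python
import numpy as np
import pandas as pd

# Hypothetical data; 9 is a made-up code without a label.
df = pd.DataFrame({'view': [4, 0, 1, 9]})

labels = {0: 'No view', 1: 'Mediocre', 2: 'Average', 3: 'Good', 4: 'Very good'}

v = df['view']
conditions = [v.eq(code) for code in labels]  # one boolean mask per code
choices = list(labels.values())               # labels in the same order

# 'Unknown' is a placeholder for values that match no condition.
df['view'] = np.select(conditions, choices, default='Unknown')
print(df['view'].tolist())
```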