df['view'] = np.where(df['view']==0, 'No view', df['view'])
df['view'] = np.where(df['view']==1, 'Mediocre', df['view'])
df['view'] = np.where(df['view']==2, 'Average', df['view'])
df['view'] = np.where(df['view']==3, 'Good', df['view'])
df['view'] = np.where(df['view']==4, 'Very good', df['view'])
I have a dataset of housing prices where the 'view' column rates the view on a scale of 0-4. I wanted to change those ratings into strings using this operation; however, this bit of code ONLY works for the first line.
df['view'].value_counts()
No view 4140
2 205
3 116
4 70
1 69
Name: view, dtype: int64
How do I make np.where() work for the rest of them as well?
CodePudding user response:
NumPy converts the whole array to strings on the first call, because the result has to hold 'No view' alongside the remaining ratings. The column therefore no longer contains the integer 1, but the string '1' instead.
This code would work:
df['view'] = np.where(df['view']==0, 'No view', df['view'])
df['view'] = np.where(df['view']=='1', 'Mediocre', df['view'])
df['view'] = np.where(df['view']=='2', 'Average', df['view'])
df['view'] = np.where(df['view']=='3', 'Good', df['view'])
df['view'] = np.where(df['view']=='4', 'Very good', df['view'])
I know you asked how to get NumPy's "where" to work, which I think the above does. But it's worth mentioning that pandas apply might work well here too, and mapping with a dictionary, as laid out in the other answer, would be even better.
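For reference, here is a minimal sketch of the apply route mentioned above. The sample DataFrame and the `labels` dictionary are made up for illustration and are not the asker's actual data:

```python
import pandas as pd

# Hypothetical sample data standing in for the real housing dataset
df = pd.DataFrame({'view': [4, 0, 1, 3, 2, 0]})

# Lookup table from numeric rating to label (names are illustrative)
labels = {0: 'No view', 1: 'Mediocre', 2: 'Average', 3: 'Good', 4: 'Very good'}

# dict.get falls back to the original value for any rating without a label,
# so every row is converted in one pass over the original integers
df['view'] = df['view'].apply(lambda rating: labels.get(rating, rating))
print(df['view'].tolist())
```

Because each value is looked up exactly once against the original integers, the dtype change that breaks the chained np.where calls never comes into play.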
CodePudding user response:
Try mapping values instead to avoid dtype troubles:
mappings = {0: 'No view', 1: 'Mediocre', 2: 'Average', 3: 'Good', 4: 'Very good'}
df['view'] = df['view'].map(mappings)
print(df)
# Output
view
0 Very good
1 No view
2 Mediocre
3 Mediocre
4 No view
5 No view
6 Average
7 No view
8 No view
9 Mediocre
Setup I used:
import pandas as pd
import numpy as np
np.random.seed(2022)
df = pd.DataFrame({'view': np.random.randint(0, 5, 10)})
print(df)
# Output
view
0 4
1 0
2 1
3 1
4 0
5 0
6 2
7 0
8 0
9 1
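One caveat with mapping through a dictionary: any value that has no key in the dictionary becomes NaN. The sketch below uses hypothetical data (the rating 7 is invented to trigger the case) and shows one possible way to fall back to the original value:

```python
import pandas as pd

# Hypothetical data: 7 is an unexpected rating with no entry in the dict
df = pd.DataFrame({'view': [0, 4, 7, 2]})
mappings = {0: 'No view', 1: 'Mediocre', 2: 'Average', 3: 'Good', 4: 'Very good'}

# Ratings without a matching key come back as NaN
mapped = df['view'].map(mappings)
print(mapped.isna().tolist())

# Fall back to the original rating (as a string) where no label was found
filled = mapped.fillna(df['view'].astype(str))
print(filled.tolist())
```

If every rating in the column is guaranteed to be between 0 and 4, the fallback step is unnecessary.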
Update
After the first np.where, your data looks like:
>>> np.where(df['view']==0, 'No view', df['view'])
array(['4', 'No view', '1', '1', 'No view', 'No view', '2', 'No view',
'No view', '1'], dtype='<U21')
The remaining values 1, 2, 3, 4 (integers) became '1', '2', '3', '4' (strings), so comparisons against integer values no longer match. That's why it's important to process your data in a single pass.
CodePudding user response:
As described in the other answers, the array is converted to strings after your first line of code. However, you can rearrange your code to evaluate every condition against the original integer column at once, which should solve the issue.
For evaluating multiple conditions, you might find it easier to go with numpy.select.
v = df['view']
# Option 1: nested numpy.where, every condition tested against the original integers in v
df['view'] = np.where(v == 0, 'No view',
             np.where(v == 1, 'Mediocre',
             np.where(v == 2, 'Average',
             np.where(v == 3, 'Good',
             np.where(v == 4, 'Very good', v)))))
# Option 2: numpy.select (an alternative to option 1, not a follow-up step)
vals = [v.eq(0), v.eq(1), v.eq(2), v.eq(3), v.eq(4)]
new_vals = ['No view', 'Mediocre', 'Average', 'Good', 'Very good']
df['view'] = np.select(vals, new_vals, default=v)
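As a quick sanity check, the following self-contained sketch runs the numpy.select approach on made-up data and compares it with a dictionary mapping. The sample values and the explicit string cast of the default are assumptions added here to keep all candidate values in one dtype:

```python
import numpy as np
import pandas as pd

# Hypothetical sample ratings on the 0-4 scale
df = pd.DataFrame({'view': [4, 0, 1, 1, 0, 0, 2, 0, 0, 1]})
v = df['view']

labels = ['No view', 'Mediocre', 'Average', 'Good', 'Very good']
# One boolean condition per rating, all built from the original integers
conditions = [v.eq(i) for i in range(len(labels))]

# The default is cast to a string array so choices and default share a dtype
selected = np.select(conditions, labels, default=v.to_numpy().astype(str))

# The same conversion through a dictionary mapping, for comparison
mapped = v.map(dict(enumerate(labels)))
print(selected.tolist() == mapped.tolist())
```

Both routes should agree as long as every rating has a label; they differ only in how unexpected values are handled (kept as a string by the default here, NaN with map).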
