Is there a way to reassign values in a pandas dataframe using the .apply() method?
I have this code:
import pandas as pd
df = pd.DataFrame({'switch': ['ON', 'OFF', 'ON'],
                   'value': [10, 15, 20]})
print(df, '\n')
def myfunc(row):
    if row['switch'] == 'ON':
        row['value'] = 500
    elif row['switch'] == 'OFF':
        row['value'] = 0
df = df.apply(myfunc, axis=1)
print(df)
The code is not working. I am trying to achieve the following output after running the .apply() method:
switch value
0 ON 500
1 OFF 0
2 ON 500
Why is the "row['value'] = 500" assignment not working and how can I rewrite it to make it work?
Answer:
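It's not working because your function needs to return the value. Each row is passed to myfunc as a separate Series, so row['value'] = 500 changes that row object rather than df; and since myfunc returns nothing, df.apply(myfunc, axis=1) produces a Series of None values, which then replaces your whole df. You also need to assign the returned values back to the DataFrame column for them to show up.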
def f(row):
    if row['switch'] == 'ON':
        return 500
    elif row['switch'] == 'OFF':
        return 0

df['value'] = df.apply(f, axis=1)
df now has the values:
switch value
0 ON 500
1 OFF 0
2 ON 500
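If you prefer to keep the row-assignment style from the question, a minimal sketch (a variant, not part of this answer) is to return the modified row, so that apply rebuilds the whole DataFrame from the returned rows. The row.copy() call is an added precaution, since the pandas documentation says functions that mutate the object passed to apply are not supported.

```python
import pandas as pd

df = pd.DataFrame({'switch': ['ON', 'OFF', 'ON'],
                   'value': [10, 15, 20]})

def myfunc(row):
    # work on a copy: pandas does not support mutating the row passed in
    row = row.copy()
    if row['switch'] == 'ON':
        row['value'] = 500
    elif row['switch'] == 'OFF':
        row['value'] = 0
    # returning the row makes apply assemble a DataFrame from all returned rows
    return row

df = df.apply(myfunc, axis=1)
print(df)
```

This rebuilds every column, so when only one column changes, returning a scalar as in f above is simpler.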
One thing to note here is whether switch can take any values other than ON and OFF.
- If those are the only permitted values, you can replace the named function with a lambda expression.
- If other values are present, the function currently returns None for them, because your if/elif block does not handle them, so they end up as missing values in value. You would need to set a value for every possible switch value, or return a default, to end up with a DataFrame without missing entries in value.
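A minimal sketch of the default-value approach: the extra 'STANDBY' row and the DEFAULT_VALUE of -1 are hypothetical, chosen only to show what happens to an unhandled switch value.

```python
import pandas as pd

# 'STANDBY' is a hypothetical extra switch value for illustration
df = pd.DataFrame({'switch': ['ON', 'OFF', 'ON', 'STANDBY'],
                   'value': [10, 15, 20, 25]})

DEFAULT_VALUE = -1  # hypothetical default for unhandled switch values

def f(row):
    if row['switch'] == 'ON':
        return 500
    elif row['switch'] == 'OFF':
        return 0
    # any other switch value falls through to the explicit default
    return DEFAULT_VALUE

df['value'] = df.apply(f, axis=1)
print(df)
```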
Answer:
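In addition to the missing return value, which is what causes the problem, I would suggest not using apply() here at all. Instead, use a vectorized version with np.where(), which is usually much faster because it works on the whole column at once instead of calling a Python function for every row.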
import numpy as np
df['value'] = np.where(df['switch'] == "ON", 500, 0)
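Another vectorized option, swapped in here rather than taken from this answer, is pandas' own Series.map with a dict. Note that np.where(df['switch'] == "ON", 500, 0) gives 0 to every value that is not ON, whereas map leaves values missing from the dict as NaN, so unexpected switch values are easy to spot. The 'STANDBY' value below is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'switch': ['ON', 'OFF', 'ON'],
                   'value': [10, 15, 20]})

# look up each switch value in the dict to get its replacement
mapping = {'ON': 500, 'OFF': 0}
df['value'] = df['switch'].map(mapping)
print(df)

# values missing from the dict come back as NaN instead of a silent 0
# ('STANDBY' is a hypothetical value for illustration)
other = pd.Series(['ON', 'STANDBY']).map(mapping)
print(other)
```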
Answer:
You can write an if...else expression inside a lambda, like below:
>>> df['value'] = df['switch'].apply(lambda x : 500 if x == 'ON' else 0)
>>> df
switch value
0 ON 500
1 OFF 0
2 ON 500
If you want to write a named function instead, try this:
def myfunc(x):
    if x == 'ON':
        return 500
    elif x == 'OFF':
        return 0

df['value'] = df['switch'].apply(myfunc)
You can also use np.select to handle multiple conditions, like below:
import numpy as np
df['value'] = np.select([df['switch']=='ON',df['switch']=='OFF'], [500,0])
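np.select returns its default argument (0 unless you pass one) wherever no condition matches, so in the line above an unexpected switch value would silently get 0, the same as OFF. A sketch with an explicit default; the -1 sentinel and the 'STANDBY' row are hypothetical.

```python
import numpy as np
import pandas as pd

# 'STANDBY' is a hypothetical extra value for illustration
df = pd.DataFrame({'switch': ['ON', 'OFF', 'ON', 'STANDBY'],
                   'value': [10, 15, 20, 25]})

conditions = [df['switch'] == 'ON', df['switch'] == 'OFF']
choices = [500, 0]
# default fills rows where no condition matches (np.select's own default is 0)
df['value'] = np.select(conditions, choices, default=-1)
print(df)
```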
