I have a DataFrame with two columns, A and B:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'A': [1, 3, 7, 19, 80, 120, 14, 2],
    'B': ['years', 'years', 'months', 'months', 'days', 'days', 'months', 'years'],
})
I want to convert the values in A to a uniform unit of 'years', as in:
df = pd.DataFrame({
    'A': [1, 3, 0.58, 1.58, 0.22, 0.33, 1.17, 2],
    'B': ['years', 'years', 'years', 'years', 'years', 'years', 'years', 'years'],
})
I tried the following code, but I get an error (ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().):
for x in df.B:
    if x == 'days':
        df['A'] = df['A'].div(365).round(2)
    elif x == 'months':
        df['A'] = df['A'].div(12).round(2)
    else:
        pass
CodePudding user response:
You can use np.select to assign values conditionally: for rows where "B" == "months", divide the "A" value by 12; for rows where "B" == "days", divide it by 365; in all other cases, leave the value as is. Then set "B" to 'years' for every row:
import numpy as np
df['A'] = np.select([df['B']=='months', df['B']=='days'], [df['A'].div(12), df['A'].div(365)], df['A']).round(2)
df['B'] = 'years'
Output:
      A      B
0  1.00  years
1  3.00  years
2  0.58  years
3  1.58  years
4  0.22  years
5  0.33  years
6  1.17  years
7  2.00  years
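For comparison, the same conditional division can be written with boolean masks and .loc, which is closer in spirit to the if/elif attempt in the question. This is only a sketch, assuming the sample data from the question; the names is_months and is_days are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 3, 7, 19, 80, 120, 14, 2],
    'B': ['years', 'years', 'months', 'months', 'days', 'days', 'months', 'years'],
})

# Cast to float first so the fractional results fit the column's dtype.
df['A'] = df['A'].astype(float)

# Build both masks up front, before any values change.
is_months = df['B'] == 'months'
is_days = df['B'] == 'days'

# Divide only the selected rows; rows already in years are left untouched.
df.loc[is_months, 'A'] = df.loc[is_months, 'A'] / 12
df.loc[is_days, 'A'] = df.loc[is_days, 'A'] / 365

df['A'] = df['A'].round(2)
df['B'] = 'years'
```

Each mask is a whole boolean Series that selects rows, so no Python-level if is needed.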
CodePudding user response:
The more pandorable way by far was provided by @enke, but I thought I would include a fix of your approach to illustrate the preferred way to iterate over a df. Rather than a plain for loop over a column, use one of pandas' built-in iteration methods such as iterrows. Notice that it yields two values per iteration (the index and the row) instead of just one. That said, before iterating this way, prefer a vectorized approach where possible, since it is much faster. The loop below collects the converted values in a list and stores them in a new 'years' column:
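years_list = []
for idx, row in df.iterrows():
    # Convert each row's value to years based on its unit.
    if row['B'] == 'days':
        years_list.append(row['A'] / 365)
    elif row['B'] == 'months':
        years_list.append(row['A'] / 12)
    elif row['B'] == 'years':
        years_list.append(row['A'])
    else:
        # Unknown unit: append NaN so the list stays as long as the df.
        years_list.append(float('nan'))
df['years'] = years_list
df = df.round(2)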
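To illustrate the vectorized alternative mentioned above without np.select, here is a minimal sketch that maps each unit label to a divisor and divides the whole column at once. The dictionary units_per_year is a hypothetical helper, not part of either answer, and 365 days per year is the same approximation used throughout this thread.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 3, 7, 19, 80, 120, 14, 2],
    'B': ['years', 'years', 'months', 'months', 'days', 'days', 'months', 'years'],
})

# Hypothetical lookup table: how many of each unit make up one year.
units_per_year = {'years': 1, 'months': 12, 'days': 365}

# map() turns each unit label into its divisor; labels missing from the
# dictionary become NaN, so those rows end up as NaN rather than wrong values.
divisor = df['B'].map(units_per_year)

df['A'] = (df['A'] / divisor).round(2)
df['B'] = 'years'
```

Supporting another unit then only means adding a dictionary entry, for example 'weeks': 52 if that approximation is acceptable.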
