In today's year, if the difference in the year of the corresponding column is 5 or more, it is designed to output 1, but the NaN value comes out.
import pandas as pd
from datetime import datetime
today = datetime.today()
def time(x):
if today.year - x.year > 5:
x = 1
return x
else:
x = 0
return x
df['VIP'] = df[condition]['DaysSinceJoined'].apply(time)
df['VIP']
Get an error:
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
..
2235 NaN
2236 NaN
2237 NaN
2238 NaN
2239 NaN
Name: VIP, Length: 2240, dtype: float64
CodePudding user response:
The function works just fine. The issue might lie within your initial condition:
Fist lets generate a bit sample data:
foo = pd.DataFrame({'time':['1979-11-10','1962-07-22','1987-09-16','2020-09-16']})
from datetime import datetime
today = datetime.today()
def time(x):
if today.year - x.year > 5:
return 1
else:
return 0
First we make sure it's not data format issue as I suggested above:
foo['VIP'] =foo['time'].apply(time)
'str' object has no attribute 'year'
We fix this by converting the dates to datetime:
foo['time'] = pd.to_datetime(foo['time'])
Lets test the function:
foo['VIP'] =foo['time'].apply(time)
time VIP
0 1979-11-10 1
1 1962-07-22 1
2 1987-09-16 1
3 2020-09-16 0
All good.
Now lets apply some random condition:
foo['VIP'] =foo[foo['time'].dt.year >1980]['time'].apply(time)
time VIP
0 1979-11-10 NaN
1 1962-07-22 NaN
2 1987-09-16 1.0
3 2020-09-16 0.0
Reason is that you first filter your dataframe to smaller bit and then feed those rows to your function. Because they are never processed they don't get return values.
I suggest you do this with .loc function:
foo.loc[(( today.year - foo['time'].dt.year > 5 ) & (Other_condition_here), 'vip'] = 1
foo.loc[(( today.year - foo['time'].dt.year <= 5 ) & (Other_condition_here), 'vip'] = 0
For more about .loc see documentation
CodePudding user response:
I guess when you use .apply it takes several arguments. Use map:
df['VIP'] = df[condition]['DaysSinceJoined'].map(time)
or:
df['VIP'] = df[condition].apply(lambda x: time(x['DaysSinceJoined']))
If it didn't work, show us some sample data.
CodePudding user response:
You do not need to apply a function here at all. First make sure you have a datetime series, so you can use the dt accessor. That allows you to check your condition for the whole column - the result will be a Series of boolean, i.e True/False. That can be converted to type int, which will give you 1 for True and 0 for False.
EX:
import pandas as pd
df = pd.DataFrame({'date':['1979-11-10','1962-07-22','1987-09-16','2020-09-16']})
df['date'] = pd.to_datetime(df['date'])
df['check'] = (pd.Timestamp('now').year - df['date'].dt.year > 5).astype(int)
df
date check
0 1979-11-10 1
1 1962-07-22 1
2 1987-09-16 1
3 2020-09-16 0
Avoiding the apply tends to be more efficient, as you make use of the built-in, vectorized functionality. But that actually shouldn't be your main concern; I think here, without the apply, it's just more readable.
