Pandas : When the 'apply' function is applied to the column, the 'NaN' value is

Time:02-04

I want to output 1 when the difference between the current year and the year in the corresponding column is 5 or more, but I get NaN values instead.

import pandas as pd
from datetime import datetime

today = datetime.today()

def time(x):
  if today.year - x.year > 5:
    x = 1
    return x
  else:
    x = 0
    return x

df['VIP'] = df[condition]['DaysSinceJoined'].apply(time)
df['VIP']

The result is all NaN:

0      NaN
1      NaN
2      NaN
3      NaN
4      NaN
        ..
2235   NaN
2236   NaN
2237   NaN
2238   NaN
2239   NaN
Name: VIP, Length: 2240, dtype: float64

CodePudding user response:

The function works just fine. The issue might lie within your initial condition:

First, let's generate a bit of sample data:

foo = pd.DataFrame({'time':['1979-11-10','1962-07-22','1987-09-16','2020-09-16']})

from datetime import datetime
today = datetime.today()

def time(x):
   if today.year - x.year > 5:
      return 1
   else:
      return 0

First we make sure it's not a data format issue, as I suggested above:

foo['VIP'] = foo['time'].apply(time)

AttributeError: 'str' object has no attribute 'year'

We fix this by converting the dates to datetime:

foo['time'] = pd.to_datetime(foo['time'])

Let's test the function:

foo['VIP'] = foo['time'].apply(time)

        time  VIP
0 1979-11-10    1
1 1962-07-22    1
2 1987-09-16    1
3 2020-09-16    0

All good.

Now let's apply some arbitrary condition:

foo['VIP'] = foo[foo['time'].dt.year > 1980]['time'].apply(time)

        time  VIP
0 1979-11-10  NaN
1 1962-07-22  NaN
2 1987-09-16  1.0
3 2020-09-16  0.0

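The reason is that you first filter your DataFrame down to a subset and then feed only those rows to your function. When the result is assigned back, pandas aligns it on the index, so the rows that were filtered out are never processed, get no return value, and are filled with NaN.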
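To make the index alignment visible, here is a minimal sketch based on the sample data above. The names is_vip and partial, and the pinned reference_year (used in place of today.year so the output is reproducible), are assumptions for illustration only. The last step, reindexing with a fill value, is one possible way to fill the gaps, not the only one.

```python
import pandas as pd

foo = pd.DataFrame({
    'time': pd.to_datetime(['1979-11-10', '1962-07-22', '1987-09-16', '2020-09-16'])
})

# Assumption: a pinned year instead of today.year, so the result is reproducible.
reference_year = 2022

def is_vip(x):
    return 1 if reference_year - x.year > 5 else 0

# Filtering first keeps only rows 2 and 3, so the result carries just those labels.
partial = foo[foo['time'].dt.year > 1980]['time'].apply(is_vip)
print(partial.index.tolist())      # [2, 3]

# Assignment aligns on the index: rows 0 and 1 have no match and become NaN,
# which also forces the column to float64.
foo['VIP'] = partial
print(foo['VIP'].isna().tolist())  # [True, True, False, False]

# One way to avoid the gaps: reindex to the full index with a default value.
foo['VIP'] = partial.reindex(foo.index, fill_value=0)
print(foo['VIP'].tolist())         # [0, 0, 1, 0]
```

Whether 0 is the right default for rows outside the filter depends on what the condition means in your data.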

I suggest you do this with .loc instead:

foo.loc[(today.year - foo['time'].dt.year > 5) & (Other_condition_here), 'VIP'] = 1
foo.loc[(today.year - foo['time'].dt.year <= 5) & (Other_condition_here), 'VIP'] = 0

For more about .loc, see the pandas documentation.
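A runnable sketch of the .loc approach might look like the following. The 'active' column stands in for the unspecified other condition and is purely hypothetical, as is the pinned reference_year used in place of today.year. Setting a default of 0 first is an added assumption so that rows failing the other condition do not end up as NaN.

```python
import pandas as pd

foo = pd.DataFrame({
    'time': pd.to_datetime(['1979-11-10', '1962-07-22', '1987-09-16', '2020-09-16']),
    'active': [True, False, True, True],  # hypothetical extra column
})

# Assumption: a pinned year instead of today.year, so the result is reproducible.
reference_year = 2022

years_since = reference_year - foo['time'].dt.year
other_condition = foo['active']

# Give every row a default first, so nothing is left unassigned.
foo['VIP'] = 0
# Then overwrite only the rows where both conditions hold.
foo.loc[(years_since > 5) & other_condition, 'VIP'] = 1

print(foo['VIP'].tolist())  # [1, 0, 1, 0]
```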

CodePudding user response:

I guess the problem is that .apply takes several arguments. Use map instead:

df['VIP'] = df[condition]['DaysSinceJoined'].map(time)

or:

df['VIP'] = df[condition].apply(lambda x: time(x['DaysSinceJoined']), axis=1)

If that doesn't work, show us some sample data.

CodePudding user response:

You do not need to apply a function here at all. First make sure you have a datetime Series, so you can use the dt accessor. That allows you to check your condition for the whole column at once; the result will be a Series of booleans, i.e. True/False. That can be converted to type int, which gives you 1 for True and 0 for False.

Example:

import pandas as pd

df = pd.DataFrame({'date':['1979-11-10','1962-07-22','1987-09-16','2020-09-16']})

df['date'] = pd.to_datetime(df['date'])

df['check'] = (pd.Timestamp('now').year - df['date'].dt.year > 5).astype(int)

df
        date  check
0 1979-11-10      1
1 1962-07-22      1
2 1987-09-16      1
3 2020-09-16      0

Avoiding the apply tends to be more efficient, as you make use of the built-in, vectorized functionality. But that actually shouldn't be your main concern; I think here, without the apply, it's just more readable.
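If an extra filter is needed, as in the question, it can be folded into the same boolean expression. This is only a sketch: the condition below is a stand-in for the asker's unspecified filter, reference_year is pinned in place of the current year so the output is reproducible, and treating rows outside the filter as 0 is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['1979-11-10', '1962-07-22', '1987-09-16', '2020-09-16'])
})

# Assumption: a pinned year instead of the current year, so the result is reproducible.
reference_year = 2022

# Hypothetical stand-in for the filter used in the question.
condition = df['date'].dt.year > 1980

# Both checks are evaluated for every row, so no row is left as NaN.
df['VIP'] = ((reference_year - df['date'].dt.year > 5) & condition).astype(int)

print(df['VIP'].tolist())  # [0, 0, 1, 0]
```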
