I've created a function (a set of conditions) that returns 1 if any of the conditions is fulfilled and 0 otherwise.
avg_ActivityScore = company['ActivityScore'].median()
min_EmployeeLowerBound = 10
list_LegalFormIDs = [112, 121, 301, 118, 141, 703, 111, 705, 921, 117, 361, 391, 711]
min_CompaniesCount = 10
def flag_company(df):
    if (df['ActivityScore'] >= avg_ActivityScore):
        return 1
    elif (df['EmployeeLowerBound'] >= min_EmployeeLowerBound):
        return 1
    elif (df['LegalFormID'].isin(list_LegalFormIDs)):
        return 1
    else:
        return 0
Then I apply the function to the DataFrame as follows:
df['Flag'] = df.apply(flag_company, axis = 1)
However, it returns the error message 'int' object has no attribute 'isin'. Any ideas what I could change to keep the functionality, please?
If I use the below code, it works without any issues:
df.loc[df['LegalFormID'].isin(list_LegalFormIDs)]
Many thanks!
CodePudding user response:
DataFrame.apply with axis=1 passes one row at a time to the function, so df['LegalFormID'] is a scalar inside the function and Series methods such as isin are not available. Test the scalar with the in operator instead:
def flag_company(df):
    # debug output: shows that a single value, not a Series, arrives here
    print(df['ActivityScore'])
    if (df['ActivityScore'] >= avg_ActivityScore):
        return 1
    elif (df['EmployeeLowerBound'] >= min_EmployeeLowerBound):
        return 1
    # check scalar membership with `in`
    elif (df['LegalFormID'] in list_LegalFormIDs):
        return 1
    else:
        return 0
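To illustrate why isin is unavailable inside a row-wise apply, here is a minimal check. The sample DataFrame and the names column_has_isin and cell_has_isin are made up for illustration and are not from the question:

```python
import pandas as pd

# Hypothetical sample data, not taken from the original question.
df = pd.DataFrame({'LegalFormID': [112, 999]})

# Selecting the column from the whole DataFrame gives a Series, which has .isin.
column_has_isin = hasattr(df['LegalFormID'], 'isin')

# With axis=1 each row is passed in turn, so indexing it yields a single scalar.
cell_has_isin = df.apply(
    lambda row: hasattr(row['LegalFormID'], 'isin'), axis=1
).tolist()

print(column_has_isin)  # True
print(cell_has_isin)    # [False, False]
```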
A vectorized solution that works on whole Series, without apply, is:
m1 = df['ActivityScore'] >= avg_ActivityScore
m2 = df['EmployeeLowerBound'] >= min_EmployeeLowerBound
m3 = df['LegalFormID'].isin(list_LegalFormIDs)
df['Flag'] = (m1 | m2 | m3).astype(int)
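As a rough sanity check, the sketch below runs the row-wise function and the boolean masks on the same data and compares the results. The sample rows, the threshold 0.5, and the shortened ID list are assumptions chosen only for illustration:

```python
import pandas as pd

# Hypothetical sample data and thresholds, chosen only for illustration.
df = pd.DataFrame({
    'ActivityScore': [0.9, 0.1, 0.2, 0.3],
    'EmployeeLowerBound': [1, 50, 2, 3],
    'LegalFormID': [999, 999, 112, 999],
})
avg_ActivityScore = 0.5
min_EmployeeLowerBound = 10
list_LegalFormIDs = [112, 121, 301]

def flag_company(row):
    # row holds a single record, so every value here is a scalar
    if row['ActivityScore'] >= avg_ActivityScore:
        return 1
    elif row['EmployeeLowerBound'] >= min_EmployeeLowerBound:
        return 1
    elif row['LegalFormID'] in list_LegalFormIDs:
        return 1
    return 0

# Row-wise version
flag_apply = df.apply(flag_company, axis=1)

# Vectorized version: one boolean mask per condition, combined with OR
m1 = df['ActivityScore'] >= avg_ActivityScore
m2 = df['EmployeeLowerBound'] >= min_EmployeeLowerBound
m3 = df['LegalFormID'].isin(list_LegalFormIDs)
flag_vectorized = (m1 | m2 | m3).astype(int)

print(flag_vectorized.tolist())                         # [1, 1, 1, 0]
print(flag_apply.tolist() == flag_vectorized.tolist())  # True
```

Because the masks are evaluated on whole columns at once, the vectorized form is usually the faster choice on large DataFrames.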
