DataFrame .isin for integers

Time:01-19

I've created a function with a set of conditions that returns 1 if any of the conditions is fulfilled and 0 otherwise.

avg_ActivityScore = company['ActivityScore'].median()
min_EmployeeLowerBound = 10
list_LegalFormIDs = [112, 121, 301, 118, 141, 703, 111, 705, 921, 117, 361, 391, 711]
min_CompaniesCount = 10

def flag_company(df):
    if (df['ActivityScore'] >= avg_ActivityScore):
        return 1
    elif (df['EmployeeLowerBound'] >= min_EmployeeLowerBound):
        return 1
    elif (df['LegalFormID'].isin(list_LegalFormIDs)):
        return 1
    else:
        return 0

Then I'm applying the function on the DataFrame as follows:

df['Flag'] = df.apply(flag_company, axis = 1)

However, it returns an error message: 'int' object has no attribute 'isin'. Any ideas what I could change to keep the functionality, please?

If I use the below code, it works without any issues:

df.loc[df['LegalFormID'].isin(list_LegalFormIDs)]

Many thanks!

CodePudding user response:

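DataFrame.apply with axis=1 passes each row to the function as a Series, so df['LegalFormID'] is a scalar inside the function and Series methods such as .isin are not available on it. Check membership with the in operator instead: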

def flag_company(df):
    # debug only: inside apply(axis=1) this prints one scalar value per row
    print(df['ActivityScore'])
    
    if (df['ActivityScore'] >= avg_ActivityScore):
        return 1
    elif (df['EmployeeLowerBound'] >= min_EmployeeLowerBound):
        return 1

    # the value is a scalar here, so test membership with `in`
    elif (df['LegalFormID'] in list_LegalFormIDs):
        return 1
    else:
        return 0
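
To see the difference directly, here is a minimal sketch. The sample values and the helper name inspect_row are made up for illustration and are not from the question. It shows that a column selected from the DataFrame has .isin, while the value picked out of a row inside apply does not:

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({
    "ActivityScore": [2, 9],
    "EmployeeLowerBound": [5, 50],
    "LegalFormID": [112, 999],
})

# Outside apply: selecting a column gives a Series, which has .isin.
column = df["LegalFormID"]
column_has_isin = hasattr(column, "isin")

# Inside apply(axis=1): the function receives one row as a Series,
# so indexing it by column name yields a single scalar value.
seen = []

def inspect_row(row):
    value = row["LegalFormID"]
    seen.append(hasattr(value, "isin"))  # scalars have no .isin
    return value in [112, 121]           # plain membership test works

result = df.apply(inspect_row, axis=1)
print(column_has_isin, seen, result.tolist())
```
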

A vectorized solution that works on whole Series at once is:

m1 = df['ActivityScore'] >= avg_ActivityScore
m2 = df['EmployeeLowerBound'] >= min_EmployeeLowerBound
m3 = df['LegalFormID'].isin(list_LegalFormIDs)

df['Flag'] = (m1 | m2 | m3).astype(int)
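
As a sanity check, the sketch below runs both approaches on a small made-up DataFrame and compares the results. The data values are assumptions chosen so that each condition fires on its own at least once; the list of legal form IDs is shortened for brevity, and the median is taken from the same frame here.

```python
import pandas as pd

# Hypothetical sample data: row 0 passes only the employee check,
# row 1 only the legal-form check, row 2 none, rows 3-5 the score check.
df = pd.DataFrame({
    "ActivityScore": [1, 2, 3, 10, 11, 12],
    "EmployeeLowerBound": [50, 1, 1, 1, 1, 1],
    "LegalFormID": [999, 112, 999, 999, 999, 999],
})

avg_ActivityScore = df["ActivityScore"].median()  # 6.5 for this data
min_EmployeeLowerBound = 10
list_LegalFormIDs = [112, 121, 301]  # shortened list for the example

def flag_company(row):
    # Row-wise version: every value here is a scalar.
    if row["ActivityScore"] >= avg_ActivityScore:
        return 1
    elif row["EmployeeLowerBound"] >= min_EmployeeLowerBound:
        return 1
    elif row["LegalFormID"] in list_LegalFormIDs:
        return 1
    return 0

flags_apply = df.apply(flag_company, axis=1)

# Vectorized version: each mask is a boolean Series over all rows.
m1 = df["ActivityScore"] >= avg_ActivityScore
m2 = df["EmployeeLowerBound"] >= min_EmployeeLowerBound
m3 = df["LegalFormID"].isin(list_LegalFormIDs)
flags_vectorized = (m1 | m2 | m3).astype(int)

print(flags_apply.tolist())
print(flags_vectorized.tolist())
```

Both versions should give the same flags. The vectorized one avoids calling a Python function once per row, which usually matters on larger frames.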