How to set indicator value equal to TRUE if multiple column values are NULL?-CodePudding

I'm trying to set the values in my indicator column 'match' equal to TRUE if the values in the 'State_db', 'Year_db', and 'Type_db' columns are all NULL. Another condition my code applies is checking if 'State_j'== 'State_db', 'Year_j'=='Year_db', and 'Type_j'=='Type_db', but this works just fine.

This is what my dataframe looks like:

State_j Year_j  Type_j  File                  State_db   Year_db     Type_db
NY      2020    ICF     2020_ICF_R2_NY_01     NaN        NaN         NaN
WA      2020    OP      2020_OP_R10_WA_01     WA         2020.0      IP
WA      2020    IP      2020_IP_R10_WA_01     WA         2020.0      IP
NJ      2021    ICF     2021_ICF_R2_NJ_01     NY         2022.0      ICF
CA      2021    PRTF    2021_PRTF_R2_CA_02    CA         2021.0      NaN
MI      2021    PRTF    2021_PRTF_R2_MI_02    NaN        2021.0      NaN

This is what I want the final dataframe to look like:

State   Year    Type   File                State_db  Year_db    Type_db  match
NY      2020    ICF    2020_ICF_R2_NY_01   NaN       NaN        NaN      True
WA      2020    OP     2020_OP_R10_WA_01   WA        2020.0     IP       False
WA      2020    IP     2020_IP_R10_WA_01   WA        2020.0     IP       True
NJ      2021    ICF    2021_ICF_R2_NJ_01   NY        2022.0     ICF      False
CA      2021    PRTF   2021_PRTF_R2_CA_02  CA        2021.0     NaN      False
MI      2021    PRTF   2021_PRTF_R2_MI_02  NaN       2021.0     NaN      False

This is my code:

# Create columns to check for matching state, year, and service type cols
def assemble_report(df):  
    matches_cols = []
    for col in ['State', 'Year', 'Type']:
        df[f'{col}_match'] = (df[f'{col}_j'] == df[f'{col}_db'])  
        matches_cols.append(f'{col}_match') 

    # Create indicator column for whether all keys matched
    df['match'] = df[matches_cols].all(axis='columns')
    df.loc[(df[f'{col}_db'].isnull().all(axis='columns')), 'match'] = "True"
    
    return matches_cols

Most of the code works as properly, but it's this bit that's giving me issues:

df.loc[(df[f'{col}_db'].isnull().all(axis='columns')), 'match'] = "True"

Basically, I want the indicator value 'match' to equal TRUE if the values in 'State_db', 'Year_db', and 'Type_db' are all NULL.

I'm fairly certain the piece of code above is triggering this error:

ValueError: No axis named columns for object type <class 'type'>

What I'm doing wrong and how can I fix this?

CodePudding user response：

def check_null(x):
    return x['State_db'] == 'NaN' and x['Year_db'] == 'NaN' and x['Type_db'] == 'NaN'


def check_cond(x):
    return x['State_j'] == x['State_db'] and float(x['Year_j']) == float(x['Year_db']) and x['Type_j'] == x['Type_db']


df['match'] = df.apply(lambda x: check_null(x) or check_cond(x), axis=1)

Output:

    State_j     Year_j  Type_j  File                State_db    Year_db     Type_db     match
1   NY          2020    ICF     2020_ICF_R2_NY_01   NaN         NaN     NaN     True
2   WA          2020    OP      2020_OP_R10_WA_01   WA          2020.0  IP      False
3   WA          2020    IP      2020_IP_R10_WA_01   WA          2020.0  IP      True
4   NJ          2021    ICF     2021_ICF_R2_NJ_01   NY          2022.0  ICF     False
5   CA          2021    PRTF    2021_PRTF_R2_CA_02  CA          2021.0  NaN     False
6   MI          2021    PRTF    2021_PRTF_R2_MI_02  NaN         2021.0  NaN     False

CodePudding user response：

First, make a new dataframe to all True, False entries by df2 = df.isnull().any().any(). If the original entry is null, it is True, else False.

Then define df['match'][i] = df2['State_db'][i] and df2['Year_db'][i] and df2['Type_db'][i], using logical and operator for true and false, for i in 0 to max row entries.

CodePudding user response：

Another method, you can use lambda pd.isnull for that

df['match']=df.apply(lambda x: pd.isnull(x['State_db']) and pd.isnull(x['Year_db']) and pd.isnull(x['Type_db']), axis=1)