Home > Software design >  Set a value to true if there are specific values in the row that are NULL?
Set a value to true if there are specific values in the row that are NULL?

Time:02-01

I'm trying to set the values in my indicator column 'match' equal to TRUE if the values in the 'State_db', 'Year_db', and 'Type_db' columns are all NULL. Another condition my code applies is checking if 'State_j'== 'State_db', 'Year_j'=='Year_db', and 'Type_j'=='Type_db', but this works just fine.

This is what my dataframe looks like:

State_j Year_j  Type_j  File                  State_db   Year_db     Type_db                    
NY      2020    ICF     2020_ICF_R2_NY_01     NaN        NaN         NaN
WA      2020    OP      2020_OP_R10_WA_01     WA         2020.0      IP
WA      2020    IP      2020_IP_R10_WA_01     WA         2020.0      IP
NJ      2021    ICF     2021_ICF_R2_NJ_01     NY         2022.0      ICF
CA      2021    PRTF    2021_PRTF_R2_CA_02    CA         2021.0      NaN
MI      2021    PRTF    2021_PRTF_R2_MI_02    NaN        2021.0      NaN

State_match     Year_match     Type_match
False           False          False
True            True           False
True            True           True
False           False          True
True            True           False
False           True           False

This is what I want the final dataframe to look like:

State   Year  Type  File                 State_db   Year_db    Type_db 
NY      2020  ICF   2020_ICF_R2_NY_01    NaN        NaN        NaN   
WA      2020  OP    2020_OP_R10_WA_01    WA         2020.0     IP       
WA      2020  IP    2020_IP_R10_WA_01    WA         2020.0     IP       
NJ      2021  ICF   2021_ICF_R2_NJ_01    NY         2022.0     ICF      
CA      2021  PRTF  2021_PRTF_R2_CA_02   CA         2021.0     NaN      
MI      2021  PRTF  2021_PRTF_R2_MI_02   NaN        2021.0     NaN

State_match     Year_match     Type_match     match
False           False          False          True
True            True           False          False
True            True           True           True
False           False          True           False
True            True           False          False
False           True           False          False  

This is my code:

# Create columns to check for matching state, year, and service type cols
def assemble_report(df):  
    matches_cols = []
    for col in ['State', 'Year', 'Type']:
        df[f'{col}_match'] = (df[f'{col}_j'] == df[f'{col}_db'])  
        matches_cols.append(f'{col}_match') 

    # Create indicator column for whether all keys matched
    df['match'] = df[matches_cols].all(axis='columns')
    df.loc[(df[f'{col}_db'].isnull().all(axis='columns')), 'match'] = "True"
    
    return matches_cols

Most of the code works as properly, but it's this bit that's giving me issues:

df.loc[(df[f'{col}_db'].isnull().all(axis='columns')), 'match'] = "True"

Basically, I want the indicator value 'match' to equal TRUE if the values in 'State_db', 'Year_db', and 'Type_db' are all NULL.

I'm fairly certain the piece of code above is triggering this error:

ValueError: No axis named columns for object type <class 'type'>

What I'm doing wrong and how can I fix this?

CodePudding user response:

In your solution need match all db columns, so create list cols_db, so possible select all columns in df[cols_db]:

def assemble_report(df):  
    
    L =['State', 'Year', 'Type']
    cols_match = [f'{col}_match' for col in L]
    cols_db = [f'{col}_db' for col in L]

    for col in L:
        df[f'{col}_match'] = (df[f'{col}_j'] == df[f'{col}_db'])  


    # Create indicator column for whether all keys matched
    df['match'] = df[cols_match].all(axis='columns') 
    df.loc[df[cols_db].isnull().all(axis='columns'), 'match'] = True

    #Better alternative match both mask by | for bitwise OR
    #df['match'] = (df[cols_match].all(axis='columns') |
    #               df[cols_db].isnull().all(axis='columns'))
    
    return df

print (assemble_report(df))
  State_j  Year_j Type_j                File State_db  Year_db Type_db  \
0      NY    2020    ICF   2020_ICF_R2_NY_01      NaN      NaN     NaN   
1      WA    2020     OP   2020_OP_R10_WA_01       WA   2020.0      IP   
2      WA    2020     IP   2020_IP_R10_WA_01       WA   2020.0      IP   
3      NJ    2021    ICF   2021_ICF_R2_NJ_01       NY   2022.0     ICF   
4      CA    2021   PRTF  2021_PRTF_R2_CA_02       CA   2021.0     NaN   
5      MI    2021   PRTF  2021_PRTF_R2_MI_02      NaN   2021.0     NaN   

   State_match  Year_match  Type_match  match  
0        False       False       False   True  
1         True        True       False  False  
2         True        True        True   True  
3        False       False        True  False  
4         True        True       False  False  
5        False        True       False  False  

Another idea for improve solution is create all lists and compare values of j with db columns, only necessary convert to numpy arrays (because different columns names j and db):

def assemble_report(df):  
    
    L =['State', 'Year', 'Type']
    cols_match = [f'{col}_match' for col in L]
    cols_db = [f'{col}_db' for col in L]
    cols_j = [f'{col}_j' for col in L]
    
    df[cols_match] = (df[cols_j] == df[cols_db].to_numpy())  

    # Create indicator column for whether all keys matched
    df['match'] = df[cols_match].all(axis=1) | df[cols_db].isnull().all(axis=1)
    
    return df

print (assemble_report(df))
  State_j  Year_j Type_j                File State_db  Year_db Type_db  \
0      NY    2020    ICF   2020_ICF_R2_NY_01      NaN      NaN     NaN   
1      WA    2020     OP   2020_OP_R10_WA_01       WA   2020.0      IP   
2      WA    2020     IP   2020_IP_R10_WA_01       WA   2020.0      IP   
3      NJ    2021    ICF   2021_ICF_R2_NJ_01       NY   2022.0     ICF   
4      CA    2021   PRTF  2021_PRTF_R2_CA_02       CA   2021.0     NaN   
5      MI    2021   PRTF  2021_PRTF_R2_MI_02      NaN   2021.0     NaN   

   State_match  Year_match  Type_match  match  
0        False       False       False   True  
1         True        True       False  False  
2         True        True        True   True  
3        False       False        True  False  
4         True        True       False  False  
5        False        True       False  False  

CodePudding user response:

You're right that this line is causing the problem

df.loc[(df[f'{col}_db'].isnull().all(axis='columns')), 'match'] = "True"

The problem is that the variable col is what you are using to loop and on this particular line it is probably assigned to the value "Type" - so you are effectively selecting a pandas Series here and pandas is complaining that a Series object doesn't have a column attribute.

You need to change that line to -

df.loc[df[['State_db', 'Year_db', 'Type_db']].isnull().all(axis='columns'), 'match'] = True
  •  Tags:  
  • Related