I'm trying to set the values in my indicator column 'match' equal to TRUE if the values in the 'State_db', 'Year_db', and 'Type_db' columns are all NULL. Another condition my code applies is checking if 'State_j'== 'State_db', 'Year_j'=='Year_db', and 'Type_j'=='Type_db', but this works just fine.
This is what my dataframe looks like:
State_j Year_j Type_j File State_db Year_db Type_db
NY 2020 ICF 2020_ICF_R2_NY_01 NaN NaN NaN
WA 2020 OP 2020_OP_R10_WA_01 WA 2020.0 IP
WA 2020 IP 2020_IP_R10_WA_01 WA 2020.0 IP
NJ 2021 ICF 2021_ICF_R2_NJ_01 NY 2022.0 ICF
CA 2021 PRTF 2021_PRTF_R2_CA_02 CA 2021.0 NaN
MI 2021 PRTF 2021_PRTF_R2_MI_02 NaN 2021.0 NaN
This is what I want the final dataframe to look like:
State Year Type File State_db Year_db Type_db match
NY 2020 ICF 2020_ICF_R2_NY_01 NaN NaN NaN True
WA 2020 OP 2020_OP_R10_WA_01 WA 2020.0 IP False
WA 2020 IP 2020_IP_R10_WA_01 WA 2020.0 IP True
NJ 2021 ICF 2021_ICF_R2_NJ_01 NY 2022.0 ICF False
CA 2021 PRTF 2021_PRTF_R2_CA_02 CA 2021.0 NaN False
MI 2021 PRTF 2021_PRTF_R2_MI_02 NaN 2021.0 NaN False
This is my code:
# Create columns to check for matching state, year, and service type cols
def assemble_report(df):
matches_cols = []
for col in ['State', 'Year', 'Type']:
df[f'{col}_match'] = (df[f'{col}_j'] == df[f'{col}_db'])
matches_cols.append(f'{col}_match')
# Create indicator column for whether all keys matched
df['match'] = df[matches_cols].all(axis='columns')
df.loc[(df[f'{col}_db'].isnull().all(axis='columns')), 'match'] = "True"
return matches_cols
Most of the code works as properly, but it's this bit that's giving me issues:
df.loc[(df[f'{col}_db'].isnull().all(axis='columns')), 'match'] = "True"
Basically, I want the indicator value 'match' to equal TRUE if the values in 'State_db', 'Year_db', and 'Type_db' are all NULL.
I'm fairly certain the piece of code above is triggering this error:
ValueError: No axis named columns for object type <class 'type'>
What I'm doing wrong and how can I fix this?
CodePudding user response:
def check_null(x):
return x['State_db'] == 'NaN' and x['Year_db'] == 'NaN' and x['Type_db'] == 'NaN'
def check_cond(x):
return x['State_j'] == x['State_db'] and float(x['Year_j']) == float(x['Year_db']) and x['Type_j'] == x['Type_db']
df['match'] = df.apply(lambda x: check_null(x) or check_cond(x), axis=1)
Output:
State_j Year_j Type_j File State_db Year_db Type_db match
1 NY 2020 ICF 2020_ICF_R2_NY_01 NaN NaN NaN True
2 WA 2020 OP 2020_OP_R10_WA_01 WA 2020.0 IP False
3 WA 2020 IP 2020_IP_R10_WA_01 WA 2020.0 IP True
4 NJ 2021 ICF 2021_ICF_R2_NJ_01 NY 2022.0 ICF False
5 CA 2021 PRTF 2021_PRTF_R2_CA_02 CA 2021.0 NaN False
6 MI 2021 PRTF 2021_PRTF_R2_MI_02 NaN 2021.0 NaN False
CodePudding user response:
First, make a new dataframe to all True, False entries by df2 = df.isnull().any().any(). If the original entry is null, it is True, else False.
Then define df['match'][i] = df2['State_db'][i] and df2['Year_db'][i] and df2['Type_db'][i], using logical and operator for true and false, for i in 0 to max row entries.
CodePudding user response:
Another method, you can use lambda pd.isnull for that
df['match']=df.apply(lambda x: pd.isnull(x['State_db']) and pd.isnull(x['Year_db']) and pd.isnull(x['Type_db']), axis=1)
