Function I have created:
# Create a function that identifies blank values
def GPID_blank(df, variable):
    df = df.loc[df['GPID'] == variable]
    return df
Test:
variable = ''
test = GPID_blank(df, variable)
test
Goal: Create a function that can filter any dataframe on its 'GPID' column and return all of the rows where GPID is missing.
I have also tried variable = 'NaN', with no luck. I know the function itself works, because when I pass a real value such as "OH82CD85" it filters my dataset correctly.
So why doesn't it return the blank cells when variable = 'NaN'? I know my dataset has 5 rows where GPID is missing.
Example df:
df = pd.DataFrame({'Client': ['A','B','C'], 'GPID':['BRUNS2','OH82CD85','']})
  Client      GPID
0      A    BRUNS2
1      B  OH82CD85
2      C
Sample of GPID column:
0 OH82CD85
1 BW07TI20
2 OW36HW81
3 PE56TA73
4 CT46SX81
5 OD79AU80
6 GF46DB60
7 OL07ST01
8 VP38SM57
9 AH90AE61
10 PG86KO78
11 NaN
12 NaN
13 SO21GR72
14 DY85IN90
15 KW80CV02
16 CM15QP83
17 VC38FP82
18 DA36RX05
19 DD74HD38
CodePudding user response:
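You can't use == to test for NaN, because NaN never compares equal to anything, including itself: NaN != NaN.
Instead, you can modify your function a little to check whether the parameter is NaN using pd.isna(). (np.isnan() only accepts numeric input, so it would raise a TypeError for a string such as "OH82CD85".)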
def GPID_blank(df, variable):
    if pd.isna(variable):
        return df.loc[df['GPID'].isna()]
    else:
        return df.loc[df['GPID'] == variable]
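To illustrate why the equality check finds nothing, here is a minimal sketch. The sample frame is made up for the demonstration and is not the asker's data.

```python
import numpy as np
import pandas as pd

# Made-up sample data (an assumption, not the real dataset)
df = pd.DataFrame({
    'Client': ['A', 'B', 'C', 'D'],
    'GPID': ['BRUNS2', np.nan, 'OH82CD85', np.nan],
})

# NaN never compares equal to anything, including itself
nan_equals_itself = float('nan') == float('nan')

# So an equality mask against NaN matches no rows
eq_matches = int((df['GPID'] == np.nan).sum())

# isna() is the dedicated check for missing values
isna_matches = int(df['GPID'].isna().sum())

print(nan_equals_itself, eq_matches, isna_matches)
```

Here the equality mask selects no rows, while isna() selects the two rows with a missing GPID.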
CodePudding user response:
It's not working because variable = 'NaN' makes the function look for the literal string 'NaN', not for missing values.
You can try:
import pandas as pd

def GPID_blank(df):
    # Return the rows whose GPID is missing (NaN)
    blanks = df[df['GPID'].isnull()].copy()
    return blanks

filtered_df = GPID_blank(df)
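The question's example frame uses an empty string while the real column shows NaN, so it may be worth catching both. The following is only a sketch under that assumption; the helper name gpid_missing, its column parameter, and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd

def gpid_missing(df, column='GPID'):
    """Return rows where `column` is NaN/None or a blank string."""
    is_nan = df[column].isna()
    # astype(str) keeps .str usable even if the column holds non-string values
    is_blank = df[column].astype(str).str.strip().eq('')
    return df[is_nan | is_blank].copy()

# Made-up sample mixing real IDs, an empty string, NaN and whitespace
sample = pd.DataFrame({
    'Client': ['A', 'B', 'C', 'D', 'E'],
    'GPID': ['BRUNS2', 'OH82CD85', '', np.nan, '   '],
})

missing = gpid_missing(sample)
print(missing)
```

With this sample, the rows for clients C, D and E are returned. If whitespace-only values should count as real data in your case, drop the .str.strip() call.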
CodePudding user response:
You can't match NaN values with an equality comparison the way you match an ordinary value. Also, in your example dataframe, '' is not NaN but an empty str, so it can be matched with ==.
Instead, check whether the value you are filtering for is NaN, and use a different filter in that case:
def GPID_blank(df, variable):
    if pd.isna(variable):
        df = df.loc[df['GPID'].isna()]
    else:
        df = df.loc[df['GPID'] == variable]
    return df
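As a usage sketch, the function above could be called as follows. The sample frame is invented for illustration; note that the missing-value case needs an actual NaN such as np.nan, not the string 'NaN'.

```python
import numpy as np
import pandas as pd

def GPID_blank(df, variable):
    # Use isna() when asked for missing values, equality otherwise
    if pd.isna(variable):
        df = df.loc[df['GPID'].isna()]
    else:
        df = df.loc[df['GPID'] == variable]
    return df

# Made-up sample data (an assumption, not the real dataset)
df = pd.DataFrame({
    'Client': ['A', 'B', 'C', 'D'],
    'GPID': ['BRUNS2', 'OH82CD85', '', np.nan],
})

nan_rows = GPID_blank(df, np.nan)        # rows where GPID is NaN
empty_rows = GPID_blank(df, '')          # rows where GPID is an empty string
id_rows = GPID_blank(df, 'OH82CD85')     # rows matching a real ID
literal_rows = GPID_blank(df, 'NaN')     # the string 'NaN' matches nothing here

print(len(nan_rows), len(empty_rows), len(id_rows), len(literal_rows))
```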
