Python 3.9 and Pandas 1.3.4
So here's the df:
1 First Name Last Name fullname
2 Freddie Mercury Freddie Mercury
3 John Lennon John Lennon
4 David Bowie David Bowie
5 John Doe
6 Joseph Joseph
7 Jovi Jovi
My piece of code currently just finds the fullname column is just First Name Last Name.
I'm currently trying to filter for blank entries in the First Name column, Last Name column, and any "John Does" in the fullname column.
Current code:
import pandas as pd
df = pd.read_csv('file.csv', dtype=str, header=0)
df2 = pd.DataFrame(df, columns=['First Name', 'Last Name', 'fullname'])
df['fullname'] = (df[['First Name', 'Last Name']].fillna('').agg(' '.join, axis=1).str.strip().replace('', 'John Doe'))
df_sort = df2.loc[df2['First Name'] == " "] | df2.loc[df2['Last Name'] == " "] | df2.loc[df2['fullname'] == "John Doe"]
df.to_csv('file.csv', index=False)
df_sort.to_csv('missing names.csv', index=False)
Currently I am having the missing names write to a new file and outputs only this:
First Name Last Name fullname
Everything is empty under.
I would like for the output to be:
First Name Last Name fullname
John Doe
Joseph
Jovi
CodePudding user response:
Replace possible missing values to empty string, compare and test if at least one value match in DataFrame.any:
df_sort = df2[df2[['First Name', 'Last Name']].fillna('').eq('').any(axis=1)]
Or if there are missing values use:
df_sort = df2[df2[['First Name', 'Last Name']].isna().any(axis=1)]
CodePudding user response:
You don't need to use .loc at each condition but you need to add some ( ):
>>> df2[(df2['First Name'] == " ")
| (df2['Last Name'] == " ")
| (df2['fullname'] == "John Doe")]
First Name Last Name fullname
3 NaN NaN John Doe
4 Joseph NaN Joseph
5 NaN Jovi Jovi
Note: adapt your missing values to your code. Here I use == " " (blank space) like you but you can use == "" (empty string) or use .isna() (missing values).
