First, I import data from a CSV file into pandas.
I now have the following data:
| Schooltyp | Name | ParentName | ParentAddress |
|---|---|---|---|
| Public | Tom | John | Nanostreet |
| Private | Bill | Sally | NaN |
| Public | Ron | Tony | Burystreet |
| Public | Danny | Nate | NaN |
| Private | Stewart | Ben | PringleStreet |
I need to remove the rows where Schooltyp = Public and ParentAddress is null.
I tried several solutions. This is the latest one I have used; it raises an error because of the condition I chose (data['ParentAddress'].isnull):
data = pd.read_csv("dataload.csv", sep = ';', error_bad_lines=False )
indexnames = data[(data['Schooltype']=='Public') & (data['ParentAddress'].isnull)].indexes
data.drop(indexnames, inplace = True)
data.to_csv('finaldata.csv', sep=';', index=False)
Am I using the right approach, or is there a better way of doing this?
CodePudding user response:
To remove all rows where Schooltyp is "Public" and ParentAddress is null:
should_be_removed = (df['Schooltyp'] == 'Public') & df['ParentAddress'].isna()
df.loc[~ should_be_removed]
Result:
Schooltyp Name ParentName ParentAddress
0 Public Tom John Nanostreet
1 Private Bill Sally NaN
2 Public Ron Tony Burystreet
4 Private Stewart Ben PringleStreet
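For reference, here is a self-contained sketch that reproduces this on the sample data from the question. The frame is built inline instead of being read from dataload.csv, and the names `filtered` and `dropped` are only illustrative. It also covers what, as far as I can tell, went wrong in the original attempt: `isnull` has to be called (`isnull()` or `isna()`), the attribute is `.index` rather than `.indexes`, and the column is spelled `Schooltyp` in the data shown.

```python
import numpy as np
import pandas as pd

# Sample data from the question, built inline (assumption: the real
# file dataload.csv has the same columns and values).
df = pd.DataFrame({
    "Schooltyp": ["Public", "Private", "Public", "Public", "Private"],
    "Name": ["Tom", "Bill", "Ron", "Danny", "Stewart"],
    "ParentName": ["John", "Sally", "Tony", "Nate", "Ben"],
    "ParentAddress": ["Nanostreet", np.nan, "Burystreet", np.nan, "PringleStreet"],
})

# Boolean mask: True for rows that should be dropped.
# Note the parentheses on isna(); without them you get the method
# object itself, not a boolean Series.
should_be_removed = (df["Schooltyp"] == "Public") & df["ParentAddress"].isna()

# Option 1: keep every row that is not flagged.
filtered = df.loc[~should_be_removed]

# Option 2: the drop-by-index approach from the question, corrected
# to use .index (there is no .indexes attribute).
dropped = df.drop(df[should_be_removed].index)
```

Both options return a new DataFrame and leave `df` unchanged, so assign the result back (or write it out with `to_csv`) if you want to keep it.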
Notes:
.ne() is equivalent to !=, just less typing.
There is also a method .eq(), which is the same as ==.
To invert a condition, you can put ~ before it.
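
The notes above can be illustrated with a small sketch. The Series `s` and `addr` are made up for the example and mirror the Schooltyp and ParentAddress columns. It also shows that the "keep" mask can be written directly with `.ne()` and `.notna()` instead of inverting the "remove" mask.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the Schooltyp and ParentAddress columns.
s = pd.Series(["Public", "Private", "Public", "Public", "Private"])
addr = pd.Series(["Nanostreet", np.nan, "Burystreet", np.nan, "PringleStreet"])

# .eq() / .ne() give the same result as == / !=
same_eq = s.eq("Public").equals(s == "Public")
same_ne = s.ne("Public").equals(s != "Public")

# ~ inverts a boolean Series element by element.
inverted = ~s.eq("Public")

# Two equivalent ways to build the mask of rows to keep:
keep_a = ~(s.eq("Public") & addr.isna())   # invert the "remove" mask
keep_b = s.ne("Public") | addr.notna()     # state the "keep" condition directly
```

Which form to use is mostly a matter of taste; inverting the "remove" mask stays closer to how the requirement was worded.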
