I have a dataset, let's say:
| emp_id | type | market_cap |
|---|---|---|
| 1 | a | 7.845000e+10 |
| 2 | b | 6.235000e+10 |
| 3 | c | NaN |
I have the following class:

    class DataCleaner:
        def __init__(self, dataf):
            """this is the constructor that initializes the dataframe to be cleaned"""
            self.dataf = dataf

        def remove_upper_quantile(self, col, quantile_num):
            self.dataf = self.dataf[self.dataf[col] < self.dataf[col].quantile(quantile_num)]
            return self.dataf

        def remove_nulls(self, col):
            self.dataf = self.dataf.dropna(subset=[col], inplace=True)
            return self.dataf

When I call remove_nulls on my df, like so:
    clean_company = DataCleaner(df)
    df = clean_company.remove_nulls('market_cap')
I get the following: AttributeError: 'NoneType' object has no attribute 'dropna'.
This also happens when I don't assign df to the result.
What am I doing wrong here?
CodePudding user response:
- The base must be in the dataframe.
- To delete a whole column, use df.pop('market_cap'), which removes the column from df and returns it as a Series.
CodePudding user response:
You need to remove the inplace=True keyword argument from this method call:

    def remove_nulls(self, col):
        self.dataf = self.dataf.dropna(subset=[col], inplace=True)
        return self.dataf

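The documentation for the df.dropna method says that when inplace=True the method modifies the dataframe in place and returns None rather than the dataframe. Assigning that return value to self.dataf therefore replaces your dataframe with None, and the next call on it raises the AttributeError you are seeing.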
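
To see the return value for yourself, here is a minimal sketch. It assumes a small frame modelled on the table in the question; the variable name `result` and the exact values are only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical frame modelled on the question's data.
df = pd.DataFrame({
    "emp_id": [1, 2, 3],
    "type": ["a", "b", "c"],
    "market_cap": [7.845e10, 6.235e10, np.nan],
})

# With inplace=True, dropna mutates df itself and returns None.
result = df.dropna(subset=["market_cap"], inplace=True)

print(result)   # None
print(len(df))  # 2, the NaN row was dropped from df itself
```

Assigning `result` back to the attribute that held the dataframe is what overwrites it with None.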
Alternatively, you could drop the self.dataf= part of that line and call self.dataf.dropna(subset=[col], inplace=True) on its own. That drops the NaN rows "inplace" and changes the dataframe without you needing to overwrite it. Note that this also changes the original df you passed to the constructor, since both names refer to the same object.
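
Putting the first option together, a corrected version of the class could look roughly like this. It is a sketch rather than the only way to write it: the sample frame and the names `cleaner`, `cleaned` and `threshold` are assumptions added for illustration, and the missing bracket in remove_upper_quantile is fixed as well.

```python
import numpy as np
import pandas as pd


class DataCleaner:
    def __init__(self, dataf):
        """Store the dataframe to be cleaned."""
        self.dataf = dataf

    def remove_upper_quantile(self, col, quantile_num):
        # Keep only rows strictly below the given quantile of `col`.
        threshold = self.dataf[col].quantile(quantile_num)
        self.dataf = self.dataf[self.dataf[col] < threshold]
        return self.dataf

    def remove_nulls(self, col):
        # No inplace=True: dropna returns a new dataframe, which we keep.
        self.dataf = self.dataf.dropna(subset=[col])
        return self.dataf


# Hypothetical frame modelled on the question's data.
df = pd.DataFrame({
    "emp_id": [1, 2, 3],
    "type": ["a", "b", "c"],
    "market_cap": [7.845e10, 6.235e10, np.nan],
})

cleaner = DataCleaner(df)
cleaned = cleaner.remove_nulls("market_cap")
print(cleaned["emp_id"].tolist())  # [1, 2]
```

Because dropna returns a new dataframe here, the original df keeps all three rows, and calling remove_nulls a second time works instead of failing on None.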
