Why is my df converted to NoneType and inoperable when I call a class method on it?

I have a dataset, let's say:

emp_id	type	market_cap
1	a	7.845000e+10
2	b	6.235000e+10
3	c	NaN

I have the following class:

class DataCleaner:
    def __init__(self, dataf):
        """this is the constructor that initializes the dataframe to be cleaned"""
        self.dataf=dataf

    def remove_upper_quantile(self, col, quantile_num):
      
        self.dataf=self.dataf[self.dataf[col<self.dataf[col].quantile(quantile_num)]
        return self.dataf


    def remove_nulls(self, col):
        self.dataf=self.dataf.dropna(subset=[col], inplace=True)
        return self.dataf

When I call remove_nulls on my df, like so:

clean_company=DataCleaner(df)
df=clean_company.remove_nulls('market_cap')

I get the following: AttributeError: 'NoneType' object has no attribute 'dropna'.

This also happens when I don't assign df to the result.

What am I doing wrong here?

CodePudding user response：

The base must be in the dataframe.
To delete an entire column (not just the rows with missing values), use: df.pop('market_cap')

CodePudding user response：

You need to remove the inplace=True keyword argument from the dropna call in this method:

    def remove_nulls(self, col):
        self.dataf=self.dataf.dropna(subset=[col], inplace=True)
        return self.dataf

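Looking at the documentation for the df.dropna method, you can see that when inplace=True the method returns None rather than the dataframe. Your assignment therefore sets self.dataf to None, and the next call to remove_nulls fails with the AttributeError you are seeing, whether or not you assign the return value to df.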
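To see the return-value difference in isolation, here is a minimal sketch. The sample dataframe is an assumption built to mirror the table in the question; the variable names cleaned and result are only for illustration.

```python
import pandas as pd

# Hypothetical sample data mirroring the table in the question.
df = pd.DataFrame({
    "emp_id": [1, 2, 3],
    "type": ["a", "b", "c"],
    "market_cap": [7.845e10, 6.235e10, float("nan")],
})

# Without inplace, dropna returns a new dataframe and leaves df untouched.
cleaned = df.dropna(subset=["market_cap"])
rows_before_inplace = len(df)

# With inplace=True, df itself is modified and the call returns None.
result = df.dropna(subset=["market_cap"], inplace=True)
```

Assigning result back to a variable that is supposed to hold the dataframe is exactly what replaces it with None.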

Alternatively, keep inplace=True and remove the self.dataf= part of that line, leaving just self.dataf.dropna(subset=[col], inplace=True). That drops the NaN rows in place, so there is nothing to reassign. Note that self.dataf is the same object as the df you passed to the constructor, so the in-place version also modifies df itself.
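Putting the first option together, a corrected version of the class might look like the sketch below. It assumes you want each method to return a new dataframe and leave the original df untouched; the sample data and the threshold variable are illustrative, not part of the original code.

```python
import pandas as pd


class DataCleaner:
    def __init__(self, dataf):
        """Store the dataframe to be cleaned."""
        self.dataf = dataf

    def remove_upper_quantile(self, col, quantile_num):
        # Keep only rows whose value is strictly below the requested quantile.
        threshold = self.dataf[col].quantile(quantile_num)
        self.dataf = self.dataf[self.dataf[col] < threshold]
        return self.dataf

    def remove_nulls(self, col):
        # No inplace=True here: dropna returns a new dataframe to reassign.
        self.dataf = self.dataf.dropna(subset=[col])
        return self.dataf


# Hypothetical sample data mirroring the table in the question.
df = pd.DataFrame({
    "emp_id": [1, 2, 3],
    "type": ["a", "b", "c"],
    "market_cap": [7.845e10, 6.235e10, float("nan")],
})

clean_company = DataCleaner(df)
cleaned = clean_company.remove_nulls("market_cap")
# A second call no longer fails, because self.dataf is still a dataframe.
cleaned_again = clean_company.remove_nulls("market_cap")
below_median = clean_company.remove_upper_quantile("market_cap", 0.5)
```

Because nothing is modified in place, df keeps all of its rows and only the cleaner's own copy shrinks.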