Return columns, unique value count for df.unique values > 1

Time:01-20

I have a simple dataframe:

import pandas as pd

df = pd.DataFrame({'A': list('abbbcdee'),
                   'B': list(range(0, 8)),
                   'C': list('aaaaaaaa')})


    A   B   C
0   a   0   a
1   b   1   a
2   b   2   a
3   b   3   a
4   c   4   a
5   d   5   a
6   e   6   a
7   e   7   a

I would like to filter the result of df.nunique() so that it only returns counts greater than 1. Currently it returns:

df.nunique()
A    5
B    8
C    1
dtype: int64

I would like the following results:

A    5
B    8
dtype: int64

I expected this to work, but it doesn't:

df.loc[df.nunique() > 1]
IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).

CodePudding user response:

Filter your result with a lambda function:

>>> df.nunique()[lambda x: x > 1]
A    5
B    8
dtype: int64
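
This works because pandas accepts a callable as an indexer and calls it with the Series being indexed, so no intermediate variable is needed. A minimal sketch comparing it with the explicit form, using the dataframe from the question (the names counts, via_lambda, via_loc and via_mask are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'A': list('abbbcdee'),
                   'B': list(range(0, 8)),
                   'C': list('aaaaaaaa')})

# Callable indexer: the lambda receives the result of df.nunique()
via_lambda = df.nunique()[lambda x: x > 1]

# The same callable also works with .loc
via_loc = df.nunique().loc[lambda x: x > 1]

# Explicit version: name the intermediate Series, then mask it
counts = df.nunique()
via_mask = counts[counts > 1]

print(via_lambda)
```

All three should give the same Series, which is handy when you want to keep a method chain going.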

CodePudding user response:

You need to index the result with a boolean mask built from that same result. You could use an assignment expression (Python ≥ 3.8), for example:

s = (s:=df.nunique())[s.gt(1)]

or, more classically:

s = df.nunique()
s = s[s.gt(1)]

Output:

A    5
B    8
dtype: int64
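
As for why the attempt in the question fails: df.nunique() > 1 is indexed by the column labels (A, B, C), while df.loc[...] with a single indexer selects rows, whose index is 0 to 7, so the mask cannot be aligned. A rough sketch of both uses of the mask, assuming the same df as in the question (mask, counts and varying are just illustrative names):

```python
import pandas as pd

df = pd.DataFrame({'A': list('abbbcdee'),
                   'B': list(range(0, 8)),
                   'C': list('aaaaaaaa')})

# Boolean Series indexed by column labels: A=True, B=True, C=False
mask = df.nunique() > 1

# Apply the mask to the counts themselves (what the question asks for)
counts = df.nunique()[mask]

# Apply the mask on the column axis to keep only the non-constant columns
varying = df.loc[:, mask]

print(counts)
print(varying.columns.tolist())
```

The second form is useful if what you actually want is the dataframe without its constant columns rather than the counts.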