I have a simple dataframe:
import pandas as pd
df = pd.DataFrame({'A': list('abbbcdee'),
                   'B': list(range(0, 8)),
                   'C': list('aaaaaaaa')})
   A  B  C
0  a  0  a
1  b  1  a
2  b  2  a
3  b  3  a
4  c  4  a
5  d  5  a
6  e  6  a
7  e  7  a
I would like to be able to filter the results of df.nunique() to only return values greater than 1.
df.nunique() returns:
df.nunique()
A 5
B 8
C 1
dtype: int64
I would like the following results:
A 5
B 8
dtype: int64
I expected this to work, but it doesn't:
df.loc[df.nunique() > 1]
IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
CodePudding user response:
Filter your result with a lambda function:
>>> df.nunique()[lambda x: x > 1]
A 5
B 8
dtype: int64
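The callable is applied to the Series that df.nunique() returns, so the boolean mask it produces shares that Series' index (the column names A, B, C). The attempt in the question fails because df.loc[...] aligns the mask against the rows of df (0 to 7) instead. A minimal sketch, rebuilding the frame from the question; the names counts, filtered and filtered_loc are only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'A': list('abbbcdee'),
                   'B': list(range(0, 8)),
                   'C': list('aaaaaaaa')})

# Series of unique counts, indexed by column name (A, B, C)
counts = df.nunique()

# The callable receives counts itself and returns a boolean mask
# with the same index, so no alignment problem arises
filtered = counts[lambda s: s > 1]

# The same filter written with .loc, which also accepts a callable
filtered_loc = counts.loc[lambda s: s > 1]

print(filtered)
```

Both forms filter the Series, not the dataframe, which is why they avoid the IndexingError.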
CodePudding user response:
You need to slice the output with a reference to itself. You could use an assignment expression (Python ≥ 3.8), for example:
s = (s := df.nunique())[s.gt(1)]
or, more classically:
s = df.nunique()
s = s[s.gt(1)]
Output:
A 5
B 8
dtype: int64
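If the goal is to keep the matching columns of df rather than the counts, the same mask can be passed on the column axis. This is a sketch under that assumption, not something the question asked for; mask and varying are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({'A': list('abbbcdee'),
                   'B': list(range(0, 8)),
                   'C': list('aaaaaaaa')})

# Boolean Series indexed by column name: A True, B True, C False
mask = df.nunique() > 1

# Passing the mask in the column position of .loc aligns it
# with df.columns instead of the row index
varying = df.loc[:, mask]

print(list(varying.columns))
```

Here the mask lines up with the columns, so .loc accepts it, unlike df.loc[df.nunique() > 1], which tries to match it against the row labels.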
