Find unique entries in a DataFrame

Time:01-09

I have a DataFrame (a) containing an unknown number of objects (names).

(a)
0  1    2    3    4     ...
1  Kay  Mary John Carl  ...
2  Mary Carl None None  ... 
3  Kay  Mary John Peter ...
4  Kay  John Carl None  ...

My goal is to collect every unique object in DataFrame (a) and create a new DataFrame (b) that uses these objects as its index.

(b)
0     1    2    3   4 
Kay   ...
Mary  ...
John  ...
Carl  ...
Peter ...
... 

CodePudding user response:

Use:

b = pd.DataFrame(index=a.stack().unique())
print(b)

# Output
Empty DataFrame
Columns: []
Index: [Kay, Mary, John, Carl, Peter]

Setup:

import pandas as pd

a = pd.DataFrame({'0': ['Kay', 'Mary', 'Kay', 'Kay'],
                  '1': ['Mary', 'Carl', 'Mary', 'John'],
                  '2': ['John', None, 'John', 'Carl'],
                  '3': ['Carl', None, 'Peter', None]})
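
For reference, the setup and the one-liner can be combined into a single runnable sketch. The explicit dropna() call is an addition, not part of the original answer: whether stack() drops missing values on its own may depend on the installed pandas version, so removing them explicitly keeps the result the same either way. The variable name `names` is only illustrative.

```python
import pandas as pd

# Same frame as in the setup above; None marks a missing name.
a = pd.DataFrame({'0': ['Kay', 'Mary', 'Kay', 'Kay'],
                  '1': ['Mary', 'Carl', 'Mary', 'John'],
                  '2': ['John', None, 'John', 'Carl'],
                  '3': ['Carl', None, 'Peter', None]})

# stack() flattens the frame row by row into one Series.
# dropna() removes missing entries explicitly (an added safeguard),
# and unique() keeps the first occurrence of each name in that order.
names = a.stack().dropna().unique()

# New frame with the unique names as index and no columns yet.
b = pd.DataFrame(index=names)
print(b.index.tolist())
```

Because stack() walks the frame row by row, the names appear in the order in which they are first seen when reading (a) left to right, top to bottom.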

This assumes that None is the actual None object (NoneType) and not the string 'None'. If your data contains the string 'None' instead, use:

b = pd.DataFrame(index=a.replace('None', float('NaN')).stack().unique())
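
A small sketch of the string case, assuming a hypothetical two-column frame `a_str` in which missing names were stored as the text 'None'. As in the line above, the placeholder is replaced with NaN first. The explicit dropna() is an added safeguard so the result does not depend on how stack() treats missing values.

```python
import pandas as pd

# Hypothetical frame where missing names are the *string* 'None'.
a_str = pd.DataFrame({'0': ['Kay', 'Mary'],
                      '1': ['Mary', 'None']})

# Turn the placeholder string into a real missing value.
cleaned = a_str.replace('None', float('NaN'))

# Flatten, drop the missing values explicitly, keep unique names.
names = cleaned.stack().dropna().unique()

b = pd.DataFrame(index=names)
print(b.index.tolist())
```

Without the replace step, the string 'None' would be treated as an ordinary name and end up in the index of (b).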