For each value in a column, how to count the number of unique values in its row?-CodePudding

Suppose I have a data frame:

ID       person_1     person_2
ID_001   Aaron        Ben
ID_003   Kate         Ben
ID_001   Aaron        Lou
ID_005   Lee          Ben
ID_006   Aaron        Cassie
ID_001   Tim          Ben
ID_003   Ben          Mal

For every ID in the column "ID", I want to count the number of unique names that were associated with the ID

My desired output:

ID       Count
ID_001   4
ID_003   3
ID_005   2
ID_006   2

CodePudding user response：

Flat your columns person1 and person2 then remove duplicated names and finally count unique value per ID:

out = df.melt('ID').drop_duplicates(['ID', 'value']) \
        .value_counts('ID').rename('Count').reset_index()
print(out)

# Output
       ID  Count
0  ID_001      4
1  ID_003      3
2  ID_005      2
3  ID_006      2

CodePudding user response：

Use df.melt('ID').groupby('ID')['value'].nunique().

>>> df.melt('ID').groupby('ID')['value'].nunique()
ID
ID_001    4
ID_003    3
ID_005    2
ID_006    2
Name: value, dtype: int64

edit: df.set_index('ID').stack().groupby(level=0).nunique() works too.

CodePudding user response：

Melt your dataframe together, drop duplicates, then group by ID and aggregate over the count of the variables. At least, rename the column variable to Count.

df.melt(["ID"]).drop_duplicates(["ID","value"]).groupby(
    ["ID"]).agg({"variable":"count"}).reset_index().rename(
    columns={"variable":"Count"})

   ID          Count
   ID_001      4
   ID_003      3
   ID_005      2
   ID_006      2