Suppose I have a data frame:
ID person_1 person_2
ID_001 Aaron Ben
ID_003 Kate Ben
ID_001 Aaron Lou
ID_005 Lee Ben
ID_006 Aaron Cassie
ID_001 Tim Ben
ID_003 Ben Mal
For every ID in the column "ID", I want to count the number of unique names that were associated with the ID
My desired output:
ID Count
ID_001 4
ID_003 3
ID_005 2
ID_006 2
CodePudding user response:
Flat your columns person1 and person2 then remove duplicated names and finally count unique value per ID:
out = df.melt('ID').drop_duplicates(['ID', 'value']) \
.value_counts('ID').rename('Count').reset_index()
print(out)
# Output
ID Count
0 ID_001 4
1 ID_003 3
2 ID_005 2
3 ID_006 2
CodePudding user response:
Use df.melt('ID').groupby('ID')['value'].nunique().
>>> df.melt('ID').groupby('ID')['value'].nunique()
ID
ID_001 4
ID_003 3
ID_005 2
ID_006 2
Name: value, dtype: int64
edit: df.set_index('ID').stack().groupby(level=0).nunique() works too.
CodePudding user response:
Melt your dataframe together, drop duplicates, then group by ID
and aggregate over the count of the variables. At least, rename the column variable to Count.
df.melt(["ID"]).drop_duplicates(["ID","value"]).groupby(
["ID"]).agg({"variable":"count"}).reset_index().rename(
columns={"variable":"Count"})
ID Count
ID_001 4
ID_003 3
ID_005 2
ID_006 2
