Input dataframe df looks like:
item row
Apple 12
Apple 12
Apple 13
Orange 13
Orange 14
Lemon 14
Output dataframe need to be
item unique_row nunique_row count
Apple {12,13} 2 {2,1}
Orange {13,14} 2 {1,1}
Lemon {14} 1 {1}
Tried Code:
df.groupby('item', as_index=False)['row'].agg({'unique_row': lambda x: set(x)
,'nunique_row': lambda x: len(set(x))})
So here, not sure how to add condition inside .agg function to generate column 'count'. Column 'count' represents number of value_count for each row value.
Any help will be appreciated. Thank You!
CodePudding user response:
Solution
s = df.value_counts()
g = s.reset_index(name='count').groupby('item')
g.agg(list).join(g.size().rename('nunique_row'))
Working
- Calculate the groupsize per
itemandrowusingvalue_counts - group the preceding counts by
itemaggwithlistto get the list of unique rows and corresponding countsaggwithsizeto get number of unique rows
Result
row count nunique_row
item
Apple [12, 13] [2, 1] 2
Lemon [14] [1] 1
Orange [13, 14] [1, 1] 2
CodePudding user response:
You need to convert to list or set:
(df.groupby('item', as_index=False)['row']
.agg({'unique_row': lambda x: list(x.unique()),
'nunique_row': lambda x: len(set(x)),
'count': lambda x: list(x.value_counts(sort=False)), # or set(x.value_counts())
})
)
output:
item unique_row nunique_row count
0 Apple [12, 13] 2 [2, 1]
1 Lemon [14] 1 [1]
2 Orange [13, 14] 2 [1, 1]
