I have a dataframe:
df1 = pd.DataFrame({'id': ['1','2','2','3','3','4','4'],
'name': ['James','Jim','jimy','Daniel','Dane','Ash','Ash'],
'event': ['Basket','Soccer','Soccer','Basket','Soccer','Basket','Soccer']})
I want to count unique values of id but with the name, the result I except are:
id name count
1 James 1
2 Jim, jimy 2
3 Daniel, Dane 2
4 Ash 2
I try to group by id and name but it doesn't count as i expected
CodePudding user response:
You could try:
df1.groupby('id').agg(
name=('name', lambda x: ', '.join(x.unique())),
count=('name', 'count')
)
We are basically grouping by id and then joining the unique names to a comma separated list!
CodePudding user response:
Here is a solution:
groups = df1[["id", "name"]].groupby("id")
a = groups.agg(lambda x: ", ".join( set(x) ))
b = groups.size().rename("count")
c = pd.concat([a,b], axis=1)
I'm not an expert when it comes to pandas but I thought I might as well post my solution because I think that it's straightforward and readable.
In your example, the groupby is done on the id column and not by id and name. The name column you see in your expected DataFrame is the result of an aggregation done after a groupby.
Here, it is obvious that the groupby was done on the id column.
My solution is maybe not the most straightforward but I still find it to be more readable:
- Create a groupby object
groupsby grouping byid - Create a DataFrame
afromgroupsby aggregating it using commas (you also need to remove the duplicates usingset(...)):lambda x: ", ".join( set(x) )
The DataFrame a will thus have the following data:
name
id
1 James
2 Jim, jimy
3 Daniel, Dane
4 Ash
- Create another DataFrame
bby computing the size of each groups ingroups:groups.size()(you should also rename your column)
id
1 1
2 2
3 2
4 2
Name: count, dtype: int64
- Concat
aandbhorizontally and you get what you wanted
name count
id
1 James 1
2 Jim, jimy 2
3 Daniel, Dane 2
4 Ash 2
