I am working on a chat analyzer, and trying to make a pie chart for most active users. Suppose I have the following data.
| Name | Number of messages |
|---|---|
| A | 234 |
| B | 128 |
| C | 112 |
| D | 97 |
| E | 86 |
| F | 43 |
| G | 32 |
| H | 24 |
| I | 22 |
| J | 9 |
I want to convert it into the following
| Name | Number of messages |
|---|---|
| A | 234 |
| B | 128 |
| C | 112 |
| D | 97 |
| E | 86 |
| Other | 130 |
I am using pandas.value_counts() method for getting all the data for the first table, but am not able to use pandas.group_by() to get the second table.
Please can anyone help me out?
CodePudding user response:
Use df.where to convert names outside of a defined list to Other:
# If name is A, B, C, D or E, keep it. Otherwise, change it to Other
name = df["Name"].where(df["Name"].isin(["A", "B", "C", "D" ,"E"]), "Other")
df.groupby(name)["Number of messages"].sum()
CodePudding user response:
Try this:
# Find rows that have Name==["A", "B", "C", "D" ,"E"]
m = df["Name"].isin(["A", "B", "C", "D" ,"E"])
# Get rows that have Not Name == ["A", "B", "C", "D" ,"E"]
df_tmp = df[~m]
# Drop rows that have indexes like df_tmp from original 'df'
# Append row with Name=='Other' and sum 'Number of messages' from 'df_tmp'
df = df.drop(df_tmp.index
).append({'Name' : 'Other',
'Number of messages': df_tmp['Number of messages'].sum()
}, ignore_index=True)
print(df)
Output:
Name Number of messages
0 A 234
1 B 128
2 C 112
3 D 97
4 E 86
5 Other 130
CodePudding user response:
Let us try rank with clip
s = df['Number of messages'].rank().clip(5)
out = df.groupby(s).agg({'Name':'first','Number of messages':'sum'}).sort_index(ascending=False).replace({'F':'other'})
Out[295]:
Name Number of messages
Number of messages
10.0 A 234
9.0 B 128
8.0 C 112
7.0 D 97
6.0 E 86
5.0 other 130
