How to sum the values in specific rows of Pandas DataFrame into one row?-CodePudding

I am working on a chat analyzer, and trying to make a pie chart for most active users. Suppose I have the following data.

Name	Number of messages
A	234
B	128
C	112
D	97
E	86
F	43
G	32
H	24
I	22
J	9

I want to convert it into the following

Name	Number of messages
A	234
B	128
C	112
D	97
E	86
Other	130

I am using pandas.value_counts() method for getting all the data for the first table, but am not able to use pandas.group_by() to get the second table.

Please can anyone help me out?

CodePudding user response：

Use df.where to convert names outside of a defined list to Other:

# If name is A, B, C, D or E, keep it. Otherwise, change it to Other
name = df["Name"].where(df["Name"].isin(["A", "B", "C", "D" ,"E"]), "Other")

df.groupby(name)["Number of messages"].sum()

CodePudding user response：

Try this:

# Find rows that have Name==["A", "B", "C", "D" ,"E"]
m = df["Name"].isin(["A", "B", "C", "D" ,"E"])

# Get rows that have Not Name == ["A", "B", "C", "D" ,"E"]
df_tmp = df[~m]

# Drop rows that have indexes like df_tmp from original 'df'
# Append row with Name=='Other' and sum 'Number of messages' from 'df_tmp'
df = df.drop(df_tmp.index
            ).append({'Name' : 'Other', 
                      'Number of messages': df_tmp['Number of messages'].sum()
                     }, ignore_index=True)
print(df)

Output:

   Name  Number of messages
0      A                 234
1      B                 128
2      C                 112
3      D                  97
4      E                  86
5  Other                 130

CodePudding user response：

Let us try rank with clip

s = df['Number of messages'].rank().clip(5)
out = df.groupby(s).agg({'Name':'first','Number of messages':'sum'}).sort_index(ascending=False).replace({'F':'other'})
Out[295]: 
                     Name  Number of messages
Number of messages                           
10.0                    A                 234
9.0                     B                 128
8.0                     C                 112
7.0                     D                  97
6.0                     E                  86
5.0                 other                 130