Get most frequent elements across all groups in a pandas time series

How can one plot the counts of the n most frequent elements across all groups of a multi-group time series? Note that this is different from the n most frequent elements of each group, which I could easily get with count and nlargest.

Let's dive into an example. Given a dataframe:

import pandas as pd

data = {'year': [2020, 2020, 2021, 2021, 2022], 
        'month': [1, 1, 2, 2, 3],
        'Name': ['name_1', 'name_2', 'name_1', 'name_2', 'name_1'], 
        'count': [10, 12, 8, 10, 2]}  

df = pd.DataFrame(data)

print(df)

which outputs

   year  month    Name  count
0  2020      1  name_1     10
1  2020      1  name_2     12
2  2021      2  name_1      8
3  2021      2  name_2     10
4  2022      3  name_1      2

The data should be grouped by year and month.
I want n = 1, in other words only the most frequent name.

I would like to plot only name_1's count since, although it does not have the largest count in any group (or even overall), it "appears" more times across all groups.

CodePudding user response:

IIUC, you want to filter for the most common Name and plot its counts?

# get top Name
top = df['Name'].value_counts().index[0]

# filter
df2 = df[df['Name'].eq(top)]

# plot
(df2.assign(date=df2[['year', 'month']].astype(str).apply('_'.join, axis=1))
    .plot.bar(x='date', y='count')
)
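
One caveat: value_counts counts rows, not groups. In the question's data each name occurs at most once per (year, month), so the two are the same. If a name could repeat within a group, something along these lines would count distinct groups instead. This is only a sketch; the duplicated row is made up for illustration and is not part of the original data.

```python
import pandas as pd

# Sample data from the question, plus one made-up duplicate row
# (name_2 appears twice in 2020-01) to show the difference.
df = pd.DataFrame({
    'year':  [2020, 2020, 2020, 2021, 2021, 2022],
    'month': [1, 1, 1, 2, 2, 3],
    'Name':  ['name_1', 'name_2', 'name_2', 'name_1', 'name_2', 'name_1'],
    'count': [10, 12, 5, 8, 10, 2],
})

# Row-based frequency: every row counts, so name_1 and name_2 both get 3
rows_per_name = df['Name'].value_counts()

# Group-based frequency: keep each (year, month, Name) combination once,
# then count how many groups each name appears in
groups_per_name = (df.drop_duplicates(['year', 'month', 'Name'])['Name']
                     .value_counts())

# Name that appears in the most groups
top = groups_per_name.index[0]
```

With the duplicate row the row-based counts tie, while the group-based count still singles out name_1.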

For several top values:

# get top Name
top = df['Name'].value_counts().index[:2]

# filter and reshape
df2 = (df[df['Name'].isin(top)]
        .pivot(index=['year', 'month'],
               columns='Name',
               values='count')
      )

# plot
df2.plot.bar()
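
If n needs to vary, the two steps above could be wrapped in a small function. The sketch below assumes the column names from the question; top_names_table is a hypothetical helper name, and passing a list as the pivot index assumes pandas 1.1 or later.

```python
import pandas as pd

def top_names_table(df, n=1):
    """Hypothetical helper: counts of the n most frequent names,
    one row per (year, month) and one column per name."""
    # names ordered by how many rows they appear in
    top = df['Name'].value_counts().index[:n]
    # keep only those names and reshape to wide format
    return (df[df['Name'].isin(top)]
              .pivot(index=['year', 'month'],
                     columns='Name',
                     values='count'))

# data from the question
df = pd.DataFrame({
    'year':  [2020, 2020, 2021, 2021, 2022],
    'month': [1, 1, 2, 2, 3],
    'Name':  ['name_1', 'name_2', 'name_1', 'name_2', 'name_1'],
    'count': [10, 12, 8, 10, 2],
})

table = top_names_table(df, n=1)    # only name_1
table2 = top_names_table(df, n=2)   # name_1 and name_2
# table.plot.bar() would then draw the bars (needs matplotlib)
```

Groups where a name is absent come out as NaN (name_2 in 2022-03 here) and are drawn as missing bars; adding .fillna(0) before plotting is one option if zeros are preferred.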