How to create a new dataframe with an average of panel data with different IDs and times?-CodePudding

I have a dataset, df that looks as follows:

Date	Code	City	State	Ranking
2020-01	10001	Los Angeles	CA	0.852
2020-02	10001	Los Angeles	CA	0.945
2020-03	10001	Los Angeles	CA	0.991
2020-01	20002	Houston	TX	0.134
2020-02	20002	Houston	TX	0.234
2020-03	20002	Houston	TX	0.667
...	...	...	...	...
2021-07	10001	Los Angeles	CA	0.678
2021-07	20002	Houston	TX	0.721

I have multiple cities, each city containing a Ranking that ranges from 2020-01 to 2021-07. I want to create a new dataframe, where I take the average of each city's ranking over time. Essentially, my new data set would be:

Code	Average Ranking
10001	0.8665
20002	0.439

I have no idea how to extract my information. The closest thing I thought about doing was still not giving me the right output:

df_avg = df.groupby(['Code','Date'],as_index=False)['Ranking'].mean().rename(columns={'Ranking':'Avg_Ranking'})

How can I create this new data frame, df_avg, with two columns, Code and Average Ranking, where Average Ranking is the mean Ranking for each Code?

CodePudding user response：

Just remove Date from your groupby key. In this case, you want the mean value of column Ranking from all rows in each Code column, so your groupby key should be only Code.

df_avg = df.groupby(['Code'],as_index=False)['Ranking'].mean().rename(columns={'Ranking':'Avg_Ranking'})