I have a dataset, df that looks as follows:
| Date | Code | City | State | Ranking |
|---|---|---|---|---|
| 2020-01 | 10001 | Los Angeles | CA | 0.852 |
| 2020-02 | 10001 | Los Angeles | CA | 0.945 |
| 2020-03 | 10001 | Los Angeles | CA | 0.991 |
| 2020-01 | 20002 | Houston | TX | 0.134 |
| 2020-02 | 20002 | Houston | TX | 0.234 |
| 2020-03 | 20002 | Houston | TX | 0.667 |
| ... | ... | ... | ... | ... |
| 2021-07 | 10001 | Los Angeles | CA | 0.678 |
| 2021-07 | 20002 | Houston | TX | 0.721 |
I have multiple cities, each city containing a Ranking that ranges from 2020-01 to 2021-07. I want to create a new dataframe, where I take the average of each city's ranking over time. Essentially, my new data set would be:
| Code | Average Ranking |
|---|---|
| 10001 | 0.8665 |
| 20002 | 0.439 |
I have no idea how to extract my information. The closest thing I thought about doing was still not giving me the right output:
df_avg = df.groupby(['Code','Date'],as_index=False)['Ranking'].mean().rename(columns={'Ranking':'Avg_Ranking'})
How can I create this new data frame, df_avg, with two columns, Code and Average Ranking, where Average Ranking is the mean Ranking for each Code?
CodePudding user response:
Just remove Date from your groupby key. In this case, you want the mean value of column Ranking from all rows in each Code column, so your groupby key should be only Code.
df_avg = df.groupby(['Code'],as_index=False)['Ranking'].mean().rename(columns={'Ranking':'Avg_Ranking'})
