I have a pandas dataframe, US state temperature data that is grouped firstly by State and then by Year. I have already selected the first and last years of entries by subsetting the original dataframe. I want to create a new dataframe that shows the difference in AvgTemperature from the first year (1995) and the last year (2019) for all 50 states.
| State | Year | AvgTemperature |
|---|---|---|
| Alabama | 1995 | 63.66 |
| Alabama | 2019 | 66.32 |
| Alaska | 1995 | 35.97 |
| ... | ... | ... |
I want to have a result that I can plot to show which states have changed the most over time, preferably in the format simply of State as column 1 and Temperature_Change as column 2.
CodePudding user response:
Try this:
df.sort_values(['State', 'Year']).groupby('State').apply(lambda g: g.iloc[-1]['AvgTemperature'] - g.iloc[0]['AvgTemperature'])
CodePudding user response:
You can pivot, compute the diff and plot as bar:
(df.pivot('State', 'Year', 'AvgTemperature')
.diff(axis=1)
.iloc[:,-1]
.rename('diff')
.plot.bar()
)
NB. I used dummy data for Alaska in 2019.
Output:

