Creating a calculated field based on row values provided column values match

I have a pandas DataFrame of US state temperature data, grouped first by State and then by Year. I have already subset the original DataFrame to keep only the first and last years of entries. I want to create a new DataFrame that shows the difference in AvgTemperature between the first year (1995) and the last year (2019) for all 50 states.

State	Year	AvgTemperature
Alabama	1995	63.66
Alabama	2019	66.32
Alaska	1995	35.97
...	...	...

I want a result that I can plot to show which states have changed the most over time, preferably with just two columns: State as column 1 and Temperature_Change as column 2.

CodePudding user response：

Try this:

df.sort_values(['State', 'Year']).groupby('State').apply(lambda g: g.iloc[-1]['AvgTemperature'] - g.iloc[0]['AvgTemperature'])
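If the result should be a two-column DataFrame (State, Temperature_Change) as asked in the question, a minimal sketch along the same lines could look like the following. The helper name temperature_change is hypothetical, and the 2019 value for Alaska is a made-up placeholder because the question does not show it.

```python
import pandas as pd

# Sample rows from the question; the Alaska 2019 value is a made-up
# placeholder, since the question elides it.
df = pd.DataFrame({
    "State": ["Alabama", "Alabama", "Alaska", "Alaska"],
    "Year": [2019, 1995, 1995, 2019],
    "AvgTemperature": [66.32, 63.66, 35.97, 37.00],
})


def temperature_change(frame):
    """Return one row per state with last-year minus first-year temperature."""
    # Sort so that the first/last row in each group is the earliest/latest year.
    ordered = frame.sort_values(["State", "Year"])
    grouped = ordered.groupby("State")["AvgTemperature"]
    change = grouped.last() - grouped.first()
    # Turn the State index back into a column to get the two-column layout.
    return change.rename("Temperature_Change").reset_index()


result = temperature_change(df)
print(result)
```

Sorting the result with result.sort_values("Temperature_Change", ascending=False) would put the states that changed the most at the top before plotting.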

CodePudding user response：

You can pivot, compute the diff and plot as bar:

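# pivot takes keyword arguments only in pandas 2.0 and later
(df.pivot(index='State', columns='Year', values='AvgTemperature')
   .diff(axis=1)        # difference between consecutive year columns
   .iloc[:, -1]         # keep the last column (2019 minus 1995)
   .rename('diff')
   .plot.bar()
 )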

NB. I used dummy data for Alaska in 2019.

Output: [bar chart of the temperature difference per state; image not available]
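For reference, here is a small self-contained sketch of the pivot approach that subtracts the two year columns by name rather than by position. It assumes that every state has exactly one row for 1995 and one for 2019; the Alaska 2019 value is again a made-up placeholder, and the name Temperature_Change is taken from the question.

```python
import pandas as pd

# Sample rows from the question; the Alaska 2019 value is a made-up
# placeholder, since the question elides it.
df = pd.DataFrame({
    "State": ["Alabama", "Alabama", "Alaska", "Alaska"],
    "Year": [1995, 2019, 1995, 2019],
    "AvgTemperature": [63.66, 66.32, 35.97, 37.00],
})

# One row per state, one column per year.
wide = df.pivot(index="State", columns="Year", values="AvgTemperature")

# Subtract the columns by label so the result does not depend on column order.
change = (wide[2019] - wide[1995]).rename("Temperature_Change")

# Largest change first, which is convenient for a bar chart.
change = change.sort_values(ascending=False)
print(change)
```

Calling change.plot.bar() afterwards would draw the chart, provided matplotlib is installed. Note that pivot raises an error if a State/Year pair appears more than once, so duplicates would need to be aggregated first.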