Creating a calculated field based on row values provided column values match

Time:02-02

I have a pandas DataFrame of US state temperature data, grouped first by State and then by Year. I have already subset the original DataFrame to keep only the first and last years of entries. I want to create a new DataFrame that shows the difference in AvgTemperature between the first year (1995) and the last year (2019) for all 50 states.

State     Year  AvgTemperature
Alabama   1995  63.66
Alabama   2019  66.32
Alaska    1995  35.97
...       ...   ...

I want a result I can plot to show which states have changed the most over time, ideally a two-column table with State as the first column and Temperature_Change as the second.
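Since the question already has the first-year and last-year rows as separate subsets, one possible approach (not one of the posted answers) is to merge those two subsets on State and subtract. This is a minimal sketch: the subset names `first` and `last` are hypothetical, the data is the sample from the question, and the Alaska 2019 value is a made-up placeholder.

```python
import pandas as pd

# Sample rows from the question; the Alaska 2019 value (38.00) is a made-up placeholder.
df = pd.DataFrame({
    "State": ["Alabama", "Alabama", "Alaska", "Alaska"],
    "Year": [1995, 2019, 1995, 2019],
    "AvgTemperature": [63.66, 66.32, 35.97, 38.00],
})

# Hypothetical names for the two subsets the question mentions.
first = df[df["Year"] == 1995]
last = df[df["Year"] == 2019]

# Align the subsets on State; overlapping columns get year suffixes.
merged = first.merge(last, on="State", suffixes=("_1995", "_2019"))

# Keep only the two columns the question asks for.
result = pd.DataFrame({
    "State": merged["State"],
    "Temperature_Change": merged["AvgTemperature_2019"] - merged["AvgTemperature_1995"],
})
print(result)
```

An inner merge silently drops any state that is missing from either subset, which can be a useful check that all 50 states have both years.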

Answer:

Try this:

df.sort_values(['State', 'Year']).groupby('State')['AvgTemperature'].apply(lambda s: s.iloc[-1] - s.iloc[0])  # last year minus first year, per state
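A self-contained version of the one-liner above, extended to return the two-column State / Temperature_Change table the question asks for, sorted so the states that changed most come first. This is a sketch on the question's sample rows; the Alaska 2019 value is a placeholder, and the rows are deliberately shuffled to show why the sort step matters.

```python
import pandas as pd

# Sample data in the question's layout, deliberately unsorted.
# The Alaska 2019 value (38.00) is a made-up placeholder.
df = pd.DataFrame({
    "State": ["Alabama", "Alaska", "Alabama", "Alaska"],
    "Year": [2019, 1995, 1995, 2019],
    "AvgTemperature": [66.32, 35.97, 63.66, 38.00],
})

change = (
    df.sort_values(["State", "Year"])                  # earliest year first within each state
      .groupby("State")["AvgTemperature"]
      .apply(lambda s: s.iloc[-1] - s.iloc[0])         # last year minus first year
      .reset_index(name="Temperature_Change")          # columns: State, Temperature_Change
      .sort_values("Temperature_Change", ascending=False)  # largest change first
)
print(change)
```

Selecting the `AvgTemperature` column before `apply` keeps the lambda working on a single Series, which avoids recent pandas versions warning about `apply` operating on the grouping columns.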

Answer:

You can pivot, compute the difference, and plot it as a bar chart:

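(df.pivot(index='State', columns='Year', values='AvgTemperature')  # states as rows, years as columns
   .diff(axis=1)    # each year column minus the previous one
   .iloc[:, -1]     # last column: 2019 minus 1995 when only those two years are present
   .rename('diff')
   .plot.bar()      # one bar per state (requires matplotlib)
 )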

NB. I used dummy data for Alaska in 2019.

Output: a bar chart of the temperature difference for each state (image not preserved).
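The `diff(axis=1).iloc[:, -1]` chain returns the last year column minus the column just before it, which equals 2019 minus 1995 only when the frame holds just those two years (as it does after the question's subsetting). A sketch that subtracts the year columns by label instead, so intermediate years would not affect the result, using the question's sample rows with a placeholder Alaska 2019 value:

```python
import pandas as pd

# Sample data; the Alaska 2019 value (38.00) is a made-up placeholder.
df = pd.DataFrame({
    "State": ["Alabama", "Alabama", "Alaska", "Alaska"],
    "Year": [1995, 2019, 1995, 2019],
    "AvgTemperature": [63.66, 66.32, 35.97, 38.00],
})

# One row per state, one column per year; pandas 2.x requires keyword arguments here.
wide = df.pivot(index="State", columns="Year", values="AvgTemperature")

# Subtract by column label rather than by position.
change = (wide[2019] - wide[1995]).rename("Temperature_Change").sort_values()

# change.plot.bar() would draw the chart (needs matplotlib installed).
print(change)
```

Sorting before plotting orders the bars from smallest to largest change, which makes the states that changed most easy to spot.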
