Delete every row except the latest date of each year, then convert the date to just the year

Time:01-25

I am trying to keep only the rows in a DataFrame that have the latest date in each year (from 2000 through 2018), and then convert the date from dd-mm-yyyy to just the year number.

So far I have only imported the data:

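import pandas as pd
import pandas_datareader.data as web

# Download ^GSPC (S&P 500 index) price data from Yahoo
df_spx = web.DataReader('^GSPC', 'yahoo', start='2000', end='2018')
df_spx.reset_index(inplace=True)  # move the Date index into a regular column
df_spx['Date'] = pd.to_datetime(df_spx['Date'])
df_spx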

The output looks like this (posted as an image link, since I can't post pictures yet):

[image of the df_spx output, not available]

CodePudding user response:

You might want to sort the DataFrame by the 'Date' column first.

Then, keep only the year value so that you can group by it.

Finally, take the last row of each group to get the latest row per year, as follows:

import pandas as pd

# sample dataframe
df = pd.DataFrame({
    'Date': ['2000-01-03', '2000-01-04', '2018-12-19', '2018-12-18'],
    'High': [1, 2, 3, 4],
})

df['Date'] = pd.to_datetime(df['Date'])
df.sort_values(by=['Date'], inplace=True)  # sort by Date, oldest first
df['Date'] = df['Date'].dt.year  # rows are already in date order, so keep only the year

df = df.groupby('Date').tail(1)  # last row of each year group = latest date in that year

print(df)
#    Date  High
# 1  2000     2  # latest row for 2000 (2000-01-04)
# 2  2018     3  # latest row for 2018 (2018-12-19)
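
If you would rather not depend on the sort order, a possible alternative is to ask pandas for the index of the latest date within each year and select those rows directly. This is only a sketch, not part of the answer above: the helper name last_row_per_year and its date_col parameter are made up for illustration, and it assumes the date column can be parsed by pd.to_datetime.

```python
import pandas as pd


def last_row_per_year(df: pd.DataFrame, date_col: str = "Date") -> pd.DataFrame:
    """Keep the row with the latest date in each calendar year, then
    replace the date column with the integer year.

    Hypothetical helper for illustration; the names are not from the question.
    """
    # reset_index returns a new frame with a unique 0..n-1 index,
    # so the caller's DataFrame is left untouched.
    out = df.reset_index(drop=True)
    out[date_col] = pd.to_datetime(out[date_col])

    # For each year, idxmax gives the index label of the row with the
    # latest date, so no prior sorting is needed.
    latest = out.groupby(out[date_col].dt.year)[date_col].idxmax()

    out = out.loc[latest].copy()
    out[date_col] = out[date_col].dt.year  # keep only the year number
    return out.reset_index(drop=True)


sample = pd.DataFrame({
    "Date": ["2000-01-03", "2000-01-04", "2018-12-19", "2018-12-18"],
    "High": [1, 2, 3, 4],
})
result = last_row_per_year(sample)
print(result)
```

On the sample data this keeps the rows with High 2 (for 2000) and High 3 (for 2018), the same rows as the sort-and-tail approach. For the data in the question, the call would presumably be last_row_per_year(df_spx) after reset_index has turned Date into a column.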