I have data in this format
['96.22872386(2015-03-28)', 'None(2014-03-31)', '96.22872386(2015-03-30)', 'None(2017-03-31)']
['None(2015-10-01)', '1400(2021-11-23)', '1281(2019-07-26)', '690.854(2016-12-13)', '530(2021-08-21)']
Expected result:
['None(2014-03-31)', '96.22872386(2015-03-28)', '96.22872386(2015-03-30)', 'None(2017-03-31)']
['None(2015-10-01)', '690.854(2016-12-13)', '1281(2019-07-26)', '530(2021-08-21)', '1400(2021-11-23)']
Basically I want to sort the column values with respect to date.
CodePudding user response:
import pandas as pd
df = pd.DataFrame(['96.22872386(2015-03-28)', 'None(2014-03-31)', '96.22872386(2015-03-30)', 'None(2017-03-31)'], columns=['col'])
def sort_by_date(df):
df['date'] = df['col'].str.extract("(\(.*\))", expand=True)
df = df.sort_values(by='date').drop(columns=['date'])
return df
sorted_df = sort_by_date(df)
CodePudding user response:
If data are per rows use sorted with converting values to datetimes:
print (df)
col
0 ['96.22872386(2015-03-28)', 'None(2014-03-31)'...
1 ['None(2015-10-01)', '1400(2021-11-23)', '1281...
#if need convert values to lists
#import ast
#df['col'] = df['col'].apply(ast.literal_eval)
f = lambda x: sorted(x, key=lambda y: pd.to_datetime(y.split('(')[-1].strip(')')))
df['col'] = df['col'].apply(f)
print (df)
col
0 [None(2014-03-31), 96.22872386(2015-03-28), 96...
1 [None(2015-10-01), 690.854(2016-12-13), 1281(2...
If data are per column like scalars, not liske lists:
Use parameter key in extracted values between () and converting to datetimes:
df = df.sort_values('col', key = lambda x: pd.to_datetime(x.str.extract(r"\((.*?)\)", expand=False)))
For oldier pandas versions use alternative solution:
df = df.iloc[pd.to_datetime(df['col'].str.extract(r"\((.*?)\)", expand=False)).argsort()]
