Comparing current value to all future values in pandas

I have a dataframe similar to this one, with roughly 10,000 to 100,000 rows:

import pandas as pd

data = [['2000-01-01', 10], ['2000-01-02', 15], ['2000-01-03', 14],
        ['2000-01-04', 13], ['2000-01-05', 17], ['2000-01-06', 16],
        ['2000-01-09', 19], ['2000-01-10', 20], ['2000-01-11', 18]]

df = pd.DataFrame(data, columns=['Date', 'Value'])

This creates the following data:

Date	Value
2000-01-01	10
2000-01-02	15
2000-01-03	14
2000-01-04	13
2000-01-05	17
2000-01-06	16
2000-01-09	19
2000-01-10	20
2000-01-11	18

I want to compare each value to all the values ahead of it, and find the last instance where the value is equal to or lower than the current value. Output should look like this:

Date	Value	Latest Date Equal or Below Value
2000-01-01	10	2000-01-01
2000-01-02	15	2000-01-04
2000-01-03	14	2000-01-04
2000-01-04	13	2000-01-04
2000-01-05	17	2000-01-06
2000-01-06	16	2000-01-06
2000-01-09	19	2000-01-11
2000-01-10	20	2000-01-11
2000-01-11	18	2000-01-11

Any help appreciated.

CodePudding user response：

One way using pandas.Series.expanding with idxmin:

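# Index the values by date (as int64 nanoseconds) and reverse the order
s = pd.Series(df["Value"].values,
              index=pd.to_datetime(df["Date"]).astype("int64")).iloc[::-1]
# For each expanding window, return the index (date) of the smallest value
s = s.expanding().apply(lambda x: (x - x.iloc[0]).idxmin())
# Convert the integers back to dates and restore the original row order
df["Latest Date"] = pd.to_datetime(s).values[::-1]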

Output:

         Date  Value Latest Date
0  2000-01-01     10  2000-01-01
1  2000-01-02     15  2000-01-04
2  2000-01-03     14  2000-01-04
3  2000-01-04     13  2000-01-04
4  2000-01-05     17  2000-01-06
5  2000-01-06     16  2000-01-06
6  2000-01-09     19  2000-01-11
7  2000-01-10     20  2000-01-11
8  2000-01-11     18  2000-01-11

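Explanation:

Comparing each element with the elements that follow it is the same as applying an expanding window to the series in reverse order. That is why I used s.iloc[::-1].

Furthermore, pandas.Series.expanding().apply only works when the applied function returns a numeric result, so the dates are converted to integers and set as the index; idxmin then returns that integer, which is converted back to a date at the end. Note that subtracting x.iloc[0] shifts every value in the window by the same constant, so idxmin effectively returns the date of the smallest value from the current row onward. For the sample data this matches the expected output.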
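
To make the reversed expanding window concrete, here is a minimal sketch on a hypothetical four-row series (the labels d1 to d4 stand in for the dates and are not part of the original data). It assumes a pandas version in which expanding windows can be iterated directly.

```python
import pandas as pd

# Hypothetical four-row series; the labels stand in for the dates.
values = pd.Series([10, 15, 14, 13], index=["d1", "d2", "d3", "d4"])
reversed_values = values.iloc[::-1]

# Each expanding window over the reversed series holds the current row
# plus every row that comes after it in the original order.
windows = [list(window.index) for window in reversed_values.expanding()]

# idxmin on each window gives the label of the smallest remaining value;
# reversing the list restores the original row order.
latest = [window.idxmin() for window in reversed_values.expanding()][::-1]

print(windows)
print(latest)
```

The last window covers the whole series, which corresponds to the first row looking ahead at every later row.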

CodePudding user response：

I use this collection of functions, and it works well:

import time
from datetime import datetime, timedelta, timezone

# example inputs (illustrative values)
timestamp = 1643328000  # seconds since the epoch
datestring = "2022-01-28 00:00:00"

# make a datetime from a timestamp or an ISO string; no timezone info is attached
date = datetime.fromtimestamp(timestamp)
date = datetime.fromisoformat(datestring)

# make the local timezone from time.timezone
local_tz = timezone(timedelta(seconds=-time.timezone))

# attach different timezones as you wish
utc_time = datetime.fromisoformat(datestring).astimezone(timezone.utc)
local_time = datetime.fromtimestamp(timestamp).astimezone(local_tz)
print(utc_time.isoformat(timespec='seconds'))
print(local_time.isoformat(timespec='seconds'))

# shift time by adding some minutes or hours
current_time = datetime.fromtimestamp(timestamp).astimezone(local_tz)
new_time = current_time + timedelta(minutes=30)

Check whether one time is later or earlier than the other:

print(current_time > new_time)   # False
print(current_time < new_time)   # True

CodePudding user response：

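import pandas as pd


def calculate(df, x):
    # x is one row: (date, value, row position)
    date_val, val, index = x
    # difference between every value from this row onward and the current value
    all_values_difference = df.iloc[index:, 1] - val
    # label of the smallest difference (the first occurrence if there are ties)
    min_index = all_values_difference.idxmin()
    return df.iloc[min_index, 0]


data = [['2000-01-01', 10], ['2000-01-02', 15], ['2000-01-03', 14],
        ['2000-01-04', 13], ['2000-01-05', 17], ['2000-01-06', 16],
        ['2000-01-09', 19], ['2000-01-10', 20], ['2000-01-11', 18]]


df = pd.DataFrame(data, columns=['Date', 'Value'])
column_name = 'Latest Date Equal or Below Value'
# temporarily store each row's position so calculate() can slice from it
df[column_name] = range(len(df))
df[column_name] = df.apply(lambda x: calculate(df, x), axis=1)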

Output:

         Date  Value Latest Date Equal or Below Value
0  2000-01-01     10                       2000-01-01
1  2000-01-02     15                       2000-01-04
2  2000-01-03     14                       2000-01-04
3  2000-01-04     13                       2000-01-04
4  2000-01-05     17                       2000-01-06
5  2000-01-06     16                       2000-01-06
6  2000-01-09     19                       2000-01-11
7  2000-01-10     20                       2000-01-11
8  2000-01-11     18                       2000-01-11
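
The idxmin-based approaches above return the date of the smallest remaining value. That matches the sample, but it can differ from the requirement as literally stated (the last later row whose value is equal to or lower than the current one); for example, with values 10, 5, 8 the last value at or below 10 is 8, not 5. What follows is a possible sketch of the literal requirement that avoids a Python-level loop, which may matter at ~100,000 rows. It uses a suffix minimum plus a binary search, a different technique from the answers above. The function name latest_date_at_or_below and its parameters are hypothetical, and the sketch assumes the Value column contains no missing values.

```python
import numpy as np
import pandas as pd


def latest_date_at_or_below(df, date_col="Date", value_col="Value"):
    """Hypothetical helper: for each row, the date of the last row at or
    after it whose value is <= that row's value."""
    values = df[value_col].to_numpy()
    # suffix_min[j] is the smallest value from row j to the end,
    # so it never decreases as j grows.
    suffix_min = np.minimum.accumulate(values[::-1])[::-1]
    # The last row with value <= v is also the last row with suffix_min <= v,
    # which a binary search on the sorted suffix_min finds directly.
    last_pos = np.searchsorted(suffix_min, values, side="right") - 1
    return df[date_col].to_numpy()[last_pos]


data = [['2000-01-01', 10], ['2000-01-02', 15], ['2000-01-03', 14],
        ['2000-01-04', 13], ['2000-01-05', 17], ['2000-01-06', 16],
        ['2000-01-09', 19], ['2000-01-10', 20], ['2000-01-11', 18]]

df = pd.DataFrame(data, columns=['Date', 'Value'])
df['Latest Date Equal or Below Value'] = latest_date_at_or_below(df)
print(df)
```

Because each row's own value counts as "equal or lower", the returned position is never earlier than the row itself.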