I have a DataFrame similar to this one, with roughly 10,000 to 100,000 rows:
import pandas as pd
data = [['2000-01-01', 10], ['2000-01-02', 15], ['2000-01-03', 14],
        ['2000-01-04', 13], ['2000-01-05', 17], ['2000-01-06', 16],
        ['2000-01-09', 19], ['2000-01-10', 20], ['2000-01-11', 18]]
df = pd.DataFrame(data, columns=['Date', 'Value'])
This creates the following data:
| Date | Value |
|---|---|
| 2000-01-01 | 10 |
| 2000-01-02 | 15 |
| 2000-01-03 | 14 |
| 2000-01-04 | 13 |
| 2000-01-05 | 17 |
| 2000-01-06 | 16 |
| 2000-01-09 | 19 |
| 2000-01-10 | 20 |
| 2000-01-11 | 18 |
I want to compare each value to all the values that come after it and find the last row whose value is equal to or lower than the current value. If no later value qualifies, the row's own date is used. The output should look like this:
| Date | Value | Latest Date Equal or Below Value |
|---|---|---|
| 2000-01-01 | 10 | 2000-01-01 |
| 2000-01-02 | 15 | 2000-01-04 |
| 2000-01-03 | 14 | 2000-01-04 |
| 2000-01-04 | 13 | 2000-01-04 |
| 2000-01-05 | 17 | 2000-01-06 |
| 2000-01-06 | 16 | 2000-01-06 |
| 2000-01-09 | 19 | 2000-01-11 |
| 2000-01-10 | 20 | 2000-01-11 |
| 2000-01-11 | 18 | 2000-01-11 |
Any help appreciated.
CodePudding user response:
One way using pandas.Series.expanding with idxmin:
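# Index the values by integer (nanosecond) timestamps, then reverse the series.
s = pd.Series(df["Value"].values,
              index=pd.to_datetime(df["Date"]).astype("int64")).iloc[::-1]
# Each window holds the current value and all later ones; keep the timestamp of its minimum.
s = s.expanding().apply(lambda x: (x - x.iloc[0]).idxmin())
# Convert the timestamps back to dates and restore the original row order.
df["Latest Date"] = pd.to_datetime(s).values[::-1]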
Output:
         Date  Value Latest Date
0  2000-01-01     10  2000-01-01
1  2000-01-02     15  2000-01-04
2  2000-01-03     14  2000-01-04
3  2000-01-04     13  2000-01-04
4  2000-01-05     17  2000-01-06
5  2000-01-06     16  2000-01-06
6  2000-01-09     19  2000-01-11
7  2000-01-10     20  2000-01-11
8  2000-01-11     18  2000-01-11
Explanation:
Comparing each element with the elements that follow it is the same as applying an expanding window in reversed order, which is why the series is reversed with iloc[::-1].
Furthermore, pandas.Series.expanding can only handle an apply function that returns a numeric result, so the dates are converted to integer timestamps and used as the index; those integers are what idxmin returns.
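As a rough illustration of the explanation above, the sketch below reproduces what each reversed expanding window reduces to, using plain NumPy instead of pandas. The function name `suffix_argmin` is made up for this example and is not part of pandas or NumPy. For every row it records the position of the smallest value among that row and all later rows, keeping the later position on ties.

```python
import numpy as np

def suffix_argmin(values):
    """For each position i, return the position of the smallest value in values[i:].

    Hypothetical helper for illustration; ties keep the later position.
    """
    result = np.empty(len(values), dtype=int)
    best = len(values) - 1
    # Walk backwards, which mirrors expanding over the reversed series.
    for i in range(len(values) - 1, -1, -1):
        # Strict "<" means an equal, later value is not replaced.
        if values[i] < values[best]:
            best = i
        result[i] = best
    return result

values = np.array([10, 15, 14, 13, 17, 16, 19, 20, 18])
positions = suffix_argmin(values)
print(positions)
```

On the sample data these positions correspond to the dates in the expected output. Note that this tracks the minimum of the remaining values, which is not always the last row at or below the current value: for the values 15, 13, 14 it returns position 1 for the first row, while the last value not above 15 sits at position 2.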
CodePudding user response:
I use this collection of snippets, and it works well:
import time
from datetime import datetime, timedelta, timezone
# example inputs (placeholder values)
timestamp = time.time()
datestring = "2022-01-28 00:00:00"
# make a datetime from a timestamp or an ISO string; neither has timezone info attached
date = datetime.fromtimestamp(timestamp)
date = datetime.fromisoformat(datestring)
# make the local timezone from time.timezone (this does not account for DST)
local_tz = timezone(timedelta(seconds=-time.timezone))
# attach different timezones as you wish
utc_time = datetime.fromisoformat(datestring).astimezone(timezone.utc)
local_time = datetime.fromisoformat(datestring).astimezone(local_tz)
print(utc_time.isoformat(timespec='seconds'))
print(local_time.isoformat(timespec='seconds'))
# shift a time by adding some minutes or hours
current_time = datetime.fromtimestamp(timestamp).astimezone(local_tz)
new_time = current_time + timedelta(minutes=30)
# check whether one time is later or earlier than the other
print(current_time > new_time) # it should be False
print(current_time < new_time) # it should be True
CodePudding user response:
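import pandas as pd

def calculate(df, x):
    # x holds the row's date, its value, and its positional index
    date_val, val, index = x
    # difference between every value from this row onward and the current value
    all_values_difference = df.iloc[index:, 1] - val
    # index label of the smallest difference, i.e. the lowest value at or after this row
    min_index = all_values_difference.idxmin()
    return df.iloc[min_index, 0]

data = [['2000-01-01', 10], ['2000-01-02', 15], ['2000-01-03', 14],
        ['2000-01-04', 13], ['2000-01-05', 17], ['2000-01-06', 16],
        ['2000-01-09', 19], ['2000-01-10', 20], ['2000-01-11', 18]]
df = pd.DataFrame(data, columns=['Date', 'Value'])
column_name = 'Latest Date Equal or Below Value'
# temporarily store each row's position so calculate() can slice from it
df[column_name] = range(len(df))
df[column_name] = df.apply(lambda x: calculate(df, x), axis=1)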
Output:
         Date  Value Latest Date Equal or Below Value
0  2000-01-01     10                       2000-01-01
1  2000-01-02     15                       2000-01-04
2  2000-01-03     14                       2000-01-04
3  2000-01-04     13                       2000-01-04
4  2000-01-05     17                       2000-01-06
5  2000-01-06     16                       2000-01-06
6  2000-01-09     19                       2000-01-11
7  2000-01-10     20                       2000-01-11
8  2000-01-11     18                       2000-01-11
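For frames near the upper end of the size mentioned in the question, a row-wise apply that slices the rest of the frame for every row does quadratic work. A possible vectorized alternative is sketched below, assuming the frame is already in date order and `Value` has no missing entries. It follows the literal wording of the question: the last row, from the current one onward, whose value is equal to or lower than the current value. The helper name `last_equal_or_below` is hypothetical.

```python
import numpy as np
import pandas as pd

def last_equal_or_below(df, date_col="Date", value_col="Value"):
    """Hypothetical helper: date of the last row at or after each row
    whose value is <= that row's value."""
    values = df[value_col].to_numpy()
    # suffix_min[j] is the smallest value from row j to the end,
    # so it never decreases as j grows.
    suffix_min = np.minimum.accumulate(values[::-1])[::-1]
    # Largest j with suffix_min[j] <= values[i]. At that j the row's own
    # value must be <= values[i], and j >= i because suffix_min[i] <= values[i].
    positions = np.searchsorted(suffix_min, values, side="right") - 1
    return df[date_col].to_numpy()[positions]

data = [['2000-01-01', 10], ['2000-01-02', 15], ['2000-01-03', 14],
        ['2000-01-04', 13], ['2000-01-05', 17], ['2000-01-06', 16],
        ['2000-01-09', 19], ['2000-01-10', 20], ['2000-01-11', 18]]
df = pd.DataFrame(data, columns=['Date', 'Value'])
df['Latest Date Equal or Below Value'] = last_equal_or_below(df)
print(df)
```

The design relies on the running minimum taken from the end being sorted, which lets a single binary search per row replace the scan over all later rows.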
