Home > database >  Calculate max on a sliding window for TimeSeries
Calculate max on a sliding window for TimeSeries

Time:01-05

Input:

    public class MyObject
    {
        public double Value { get; set; }
        public DateTime Date { get; set; }
    }

Method to generate test objects:

public static MyObject[] GetTestObjects()
{
    var rnd = new Random();
    var date = new DateTime(2021, 1, 1, 0, 0, 0);
    var result = new List<MyObject>();
    for (int i = 0; i < 50000; i  )
    {
        //this is to simulate real data having gaps
        if (rnd.Next(100) < 25)
        {
            continue;
        }
        var myObject = new MyObject()
        {
            Value = rnd.NextDouble(),
            Date = date.AddMinutes(15 * i)
        };
        result.Add(myObject);
    }
    return result.ToArray();
}

Given this I require to calculate maximum Value for previous 12 month for each myObject. I could just think of doing this InParallel, but maybe there is an optimized solution?

Sorry for being unclear, this is what I use right now to get what I want:

        public MyObject[] BruteForceBackward(MyObject[] testData)
        {
            return testData.AsParallel().Select(point =>
            {
                var max = testData.Where(x => x.Date <= point.Date && x.Date >= point.Date.AddYears(-1)).Max(x => x.Value);
                return new MyObject() { Date = point.Date, Value = point.Value / max };
            }).OrderBy(r => r.Date).ToArray();
        }

This works but it is slow and eats processor resources (imagine, you have 100k objects), I believe there must be something better

CodePudding user response:

Assuming you meant you need the maximum Value for each of the last 12 months from result, then you can use LINQ:

var beginDateTime = DateTime.Now.AddMonths(-12);
var ans = result.Where(r => r.Date >= beginDateTime).GroupBy(r => r.Date.Month).Select(mg => mg.MaxBy(r => r.Value)).ToList();

Running some timing, I get that putting AsParallel after result changes the run time from around 16ms (first run) to around 32ms, so it is actually slower. It is about the same after the Where and about 23ms after the GroupBy (processing the 12 groups in parallel). On my PC at least, there isn't enough data or complex operations for parallelism, but the GroupBy isn't the most efficient.

Using an array and testing each element, I get the results in about 1.2ms:

var maxMOs = new MyObject[12];
foreach (var r in result.Where(r => r.Date >= beginDateTime)) {
    var monthIndex = r.Date.Month-1;
    if (maxMOs[monthIndex] == null || r.Value > maxMOs[monthIndex].Value)
        maxMOs[monthIndex] = r;
}

Note that the results are not chronological; you could offset monthIndex by today's month to order the results if desired.

var maxMOs = new MyObject[12];
var offset = DateTime.Now.Month-11;
foreach (var r in result.Where(r => r.Date >= beginDateTime)) {
    var monthIndex = r.Date.Month-offset;
    if (maxMOs[monthIndex] == null || r.Value > maxMOs[monthIndex].Value)
        maxMOs[monthIndex] = r;
}

A micro-optimization (mostly useful on repeat runnings) is to invert the test and use the null-propagating operator:

if (!(r.Value <= maxMOs[monthIndex]?.Value))

This saves about 0.2ms on the first run but up to 0.5ms on subsequent runs.

CodePudding user response:

I had a simillar project where i had to calculate such stuff on tons of sensor data.

This is all pseudo code, what counts is the logic. If you need something more specific, write a comment and I can adapt/correct it tomorrow. It was just written out of my head without any error check or something so you probably can't just copy paste it.

In general you want to reduce the amount of loops going over all your data. At best, you want to touch each element only one single time. The following code is ment to be fed one element at a time, from oldest to newest.
The worst case scenario are long periods of decreasing values. For further optimisation, one might look to reduce the list shuffles when removing elements. It can hence be used on historical and real time data.

Add a second tracking list and a maximum var

public double CurrentMaximum;
private List<MyObject> MaximumValues = new List<MyObject>();
public AddValue(MyObject objectToAdd)
{
    // calculate the oldest date to keep in tracking list
    DateTime targetDate = objectToAdd.Date.AddYears(-1);
    // maximum logic
    if (objectToAdd.Value >= CurrentMaximum)
    {
        // new maximum found, clear tracking list
        // this is the best case scenario
        MaximumValues.Clear();
        MaximumValues.Add(objectToAdd);
        CurrentMaximum = objectToAdd.Value;
    }
    else
    {
        // unfortunately, no new maximum was found
        // we still need to add it to our tracking list,
        // all values in the future might be smaller
        MaximumValues.Add(objectToAdd);
        // go backwards the maximum tracking list and check for smaller values
        // clear the list of all smaller values. The list should therefore always
        // be in descending order
        for(int i = MaximumValues.Count-1; i >= 0; i--)
        {
            if (MaximumValues[i].Value <= ObjectToAdd.Value)
            {
                // a lower value has been found. We have a newer, higher value
                // clear this waste value from the tracking list
                MaximumValues.RemoveAt[i];
            }
            else
            {
                // there are no more lower values. 
                // stop looking for smaller values to save time
                break;
            }
        }
    }
    // check if the oldest value is too old to be kept in the tracking list
    if (MaximumValues[0].Date < targetDate)
    {
        // oldest value is to be removed
        MaximumValues.RemoveAt[0];
        // update maximum
        CurrentMaximum = MaximumValues[0].Value;
    }
    // add object to result list
    result.Add(objectToAdd);
}
  •  Tags:  
  • Related