Saving calculation results for re-use, while managing memory consumption

Time:01-21

I'm caching values that are slow to calculate but are usually needed several times. I have a dictionary that looks something like this:

stored_values = {
    hash1: slow_to_calc_value1,
    hash2: slow_to_calc_value2,
    # And so on x5000
}

I'm using it like this, to quickly fetch the value if it has been calculated before.

def calculate_value_for_item(item):
    item_hash = hash_item(item) # Hash the item, used as the dictionary key
    stored_value = stored_values.get(item_hash, None)
    if stored_value is not None:
        return stored_value
    calculated_value = do_heavy_math(item) # This is slow and I want to avoid
    # Storing the result for re-use makes me run out of memory at some point
    stored_values[item_hash] = calculated_value
    return calculated_value

However, I run out of memory if I try to store every value that is calculated throughout the program.

How can I manage the size of the lookup dictionary efficiently? It's a reasonable assumption that values which were needed most recently are also most likely to be needed in the future.

Things to note

  • I have simplified the scenario a lot.
  • The stored values actually use a lot of memory. The dictionary itself doesn't contain too many items, only several thousand. I can definitely afford some parallel book-keeping data structures if needed.
  • An ideal solution would let me store the n most recently needed values while removing the rest. But any heuristic close enough is good enough.

Answer:

Have you tried using the @lru_cache decorator? It seems to do exactly what you are asking for.

from functools import lru_cache

store_this_many_values = 5

@lru_cache(maxsize=store_this_many_values)
def calculate_value_for_item(item):
    calculated_value = do_heavy_math(item)
    return calculated_value
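To see the eviction order in action, here is a minimal, self-contained sketch. The `do_heavy_math` body and the `call_log` list are hypothetical stand-ins added only so the example runs; `maxsize=2` is chosen small to force an eviction.

```python
from functools import lru_cache

call_log = []  # records every real computation (illustration only)

def do_heavy_math(item):
    # Hypothetical stand-in for the slow calculation in the question.
    call_log.append(item)
    return item * item

@lru_cache(maxsize=2)
def calculate_value_for_item(item):
    return do_heavy_math(item)

calculate_value_for_item(1)  # miss: computed
calculate_value_for_item(2)  # miss: computed
calculate_value_for_item(1)  # hit: 1 becomes the most recently used entry
calculate_value_for_item(3)  # miss: cache is full, so 2 (least recently used) is evicted
calculate_value_for_item(1)  # hit: still cached
calculate_value_for_item(2)  # miss: was evicted, so it is computed again
```

Only the calls marked as misses reach `do_heavy_math`, so `call_log` ends up holding just those items. Note that `lru_cache` uses the function arguments as the cache key, so `item` must be hashable.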

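@lru_cache also adds helper methods to the decorated function. For example, cache_info() reports hits, misses, and the current cache size, which can help you tune maxsize for memory and performance, and cache_clear() empties the cache.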

for i in [1,1,1,2]:
    calculate_value_for_item(i)
print(calculate_value_for_item.cache_info())

# Output: CacheInfo(hits=2, misses=2, maxsize=5, currsize=2)
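If the items are not hashable, `lru_cache` cannot key on them directly. In that case, one option is to keep the question's own `hash_item` key and do the bookkeeping by hand with `collections.OrderedDict`. The following is a sketch under assumptions, not a definitive implementation: `hash_item`, `do_heavy_math`, `MAX_STORED_VALUES`, and the `_MISSING` sentinel are hypothetical stand-ins.

```python
from collections import OrderedDict

MAX_STORED_VALUES = 3  # hypothetical limit; tune it to the memory budget

stored_values = OrderedDict()
_MISSING = object()  # sentinel, so a cached value of None still counts as a hit

def hash_item(item):
    # Hypothetical stand-in: turn an unhashable list into a hashable key.
    return tuple(item)

def do_heavy_math(item):
    # Hypothetical stand-in for the slow calculation.
    return sum(x * x for x in item)

def calculate_value_for_item(item):
    item_hash = hash_item(item)
    stored_value = stored_values.get(item_hash, _MISSING)
    if stored_value is not _MISSING:
        stored_values.move_to_end(item_hash)  # mark as most recently used
        return stored_value
    calculated_value = do_heavy_math(item)
    stored_values[item_hash] = calculated_value
    if len(stored_values) > MAX_STORED_VALUES:
        stored_values.popitem(last=False)  # drop the least recently used entry
    return calculated_value
```

The dictionary never holds more than `MAX_STORED_VALUES` entries, and a hit moves the entry to the end so that the front of the dictionary is always the next eviction candidate.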