I'm caching values that are slow to calculate but are usually needed several times. I have a dictionary that looks something like this:
stored_values = {
    hash1: slow_to_calc_value1,
    hash2: slow_to_calc_value2,
    # And so on, ~5000 entries
}
I use it like this to quickly fetch a value if it has already been calculated:
def calculate_value_for_item(item):
    item_hash = hash_item(item)  # Hash the item, used as the dictionary key
    stored_value = stored_values.get(item_hash, None)
    if stored_value is not None:
        return stored_value
    calculated_value = do_heavy_math(item)  # This is slow and I want to avoid it
    # Storing the result for re-use makes me run out of memory at some point
    stored_values[item_hash] = calculated_value
    return calculated_value
However, I run out of memory if I try to store every value that is calculated throughout the program.
How can I manage the size of the lookup dictionary efficiently? It's a reasonable assumption that values which were needed most recently are also most likely to be needed in the future.
Things to note
- I have simplified the scenario a lot.
- The stored values actually use a lot of memory. The dictionary itself doesn't contain too many items, only several thousand. I can definitely afford some parallel book-keeping data structures if needed.
- An ideal solution would let me keep the n most recently needed values while removing the rest. But any heuristic that comes close is good enough.
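Since parallel book-keeping is affordable, one option is a small least-recently-used store built on collections.OrderedDict, keeping the existing item_hash keying. This is a minimal sketch, not a drop-in from any library: LRUStore is a hypothetical helper, and hash_item / do_heavy_math below are placeholder stand-ins for the real functions.

```python
from collections import OrderedDict


class LRUStore:
    """Hypothetical bounded dict that evicts the least recently used entry."""

    def __init__(self, max_items):
        self.max_items = max_items
        self._data = OrderedDict()  # insertion order doubles as recency order

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark this key as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_items:
            self._data.popitem(last=False)  # drop the least recently used entry

    def __len__(self):
        return len(self._data)


def hash_item(item):
    return item  # placeholder for the real hash_item()


def do_heavy_math(item):
    return item ** 2  # placeholder for the real slow computation


stored_values = LRUStore(max_items=1000)


def calculate_value_for_item(item):
    item_hash = hash_item(item)
    stored_value = stored_values.get(item_hash)
    if stored_value is not None:
        return stored_value
    calculated_value = do_heavy_math(item)
    stored_values.put(item_hash, calculated_value)  # may evict the oldest entry
    return calculated_value
```

Both move_to_end and popitem(last=False) are O(1), so the book-keeping cost per lookup stays small even with several thousand entries.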
Answer:
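Have you tried the @lru_cache decorator from functools? It seems to do exactly what you are asking for: it keeps the results of up to maxsize recent calls and discards the least recently used one when the cache is full.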
from functools import lru_cache

store_this_many_values = 5

@lru_cache(maxsize=store_this_many_values)
def calculate_value_for_item(item):
    calculated_value = do_heavy_math(item)
    return calculated_value
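To see the least-recently-used eviction in action, here is a small sketch with maxsize=2. The function name cached_square, the squaring, and the heavy_calls list are hypothetical stand-ins for do_heavy_math, used only to record when the function body actually runs.

```python
from functools import lru_cache

heavy_calls = []  # records every call that actually executes the body


@lru_cache(maxsize=2)
def cached_square(item):
    heavy_calls.append(item)  # stand-in for the slow do_heavy_math(item)
    return item * item


cached_square(1)  # computed
cached_square(2)  # computed; cache now holds 1 and 2
cached_square(1)  # cached; 1 becomes the most recently used
cached_square(3)  # computed; cache is full, so 2 (least recently used) is evicted
cached_square(2)  # computed again, because it was evicted
print(heavy_calls)  # [1, 2, 3, 2]
```

Because 1 was touched just before 3 arrived, it survived the eviction, which matches the "recently needed values are likely needed again" assumption in the question.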
@lru_cache also adds methods to the decorated function that might help you tune it for memory and/or performance, such as cache_info() (and cache_clear() to empty the cache):
for i in [1, 1, 1, 2]:
    calculate_value_for_item(i)
print(calculate_value_for_item.cache_info())
# CacheInfo(hits=2, misses=2, maxsize=5, currsize=2)
