I have a codebase with a lot of repeated logic - mainly around input validation. Eg.
def get_temperature(country:str, date:str|None=None)->float:
date = pd.to_datetime(date).date() if date is not None else datetime.now().date()
country = lookup_country_isocode(country)
...
def get_sunset_time(country:str, date:str|None=None)->float:
date = pd.to_datetime(date).date() if date is not None else datetime.now().date()
country = lookup_country_isocode(country)
...
def get_sunrise_time(country:str, date:str|None=None)->float:
date = pd.to_datetime(date).date() if date is not None else datetime.now().date()
country = lookup_country_isocode(country)
...
I'm thinking of refactoring the code by introducing a decorator:
def prepare_input(func):
def inner(*args, **kwargs):
kwargs['date'] = pd.to_datetime(date).date() if date is not None else datetime.now().date()
kwargs['country'] = lookup_country_isocode(country)
return func(*args, **kwargs)
return inner
This would let me write:
@prepare_input
def get_temperature(country:str, date:str|None=None)->float:
...
@prepare_input
def get_sunset_time(country:str, date:str|None=None)->float:
...
@prepare_input
def get_sunrise_time(country:str, date:str|None=None)->float:
...
However, I've not seen this done before, so keen to learn of any antipattern/code smell this might result in.
CodePudding user response:
You have a few options here:
Refactor code: put repeating logic in a function.
Consider writing a separate function
prepare_inputwith just the repeating logic. Then,- either use
pandas.DataFrame.Pipefunctionality, to pipe the functions. - or, use the function to call from within each of the three functions if you prefer that way.
- either use
Using decorators:
Decorators are very useful. However, if you have the logic in a simple function, it could be easier for maintainability. For easy implementation of decorators, you could also use the
decoratorslibrary: https://pypi.org/project/decorator/.
There could be an additional benefit in using a function over a decorator: you could potentially integrate frameworks like Kedro for reproducible data-science.
References
- A Better Way for Data Preprocessing: Pandas Pipe
- Kedro — A Python Framework for Reproducible Data Science Project
CodePudding user response:
this is going to get me downvoted since stack overflow generally does not like opinion based answers, but i would avoid prepare_input as that is not very explicit, and would likely confuse a user
I would recommend one of two methods if you would like to use a decorator... the first way is simply to be explicit, but even that is likely to confuse users
here is an example of being explicit
from dateutil.parser import parse as date_parse
import inspect
def args_to_kwargs(fn,args):
print("GET ARGS:",inspect.getfullargspec(fn))
data = dict(zip(inspect.getfullargspec(fn).args,args))
return data
def alter_args(**alters):
def __decorator(fn):
def __inner(*args,**kwargs):
# convert positional args to keyword args
kwargs2 = {**args_to_kwargs(fn,args),**kwargs}
for key in alters:
if key in kwargs2:
#apply alteration
kwargs2[key] = alters[key](kwargs2[key])
return fn(**kwargs2)
return __inner
return __decorator
iso_codes = {'us':'us_en','gb':'gb_en'}
@alter_args(date=date_parse,
code=lambda c:iso_codes.get(c,c))
def test(date:str,code:str):
print("GOT DATE:",date)
print("GOT code:",code)
test('2021-01-22 13:45','us')
I would instead recommend a type based system (editing to provide it in a minute)
def enforce_types(fn):
annotations = inspect.getfullargspec(fn).annotations
def __inner(*args,**kwargs):
# convert positional args to keyword args
kwargs2 = {**args_to_kwargs(fn, args), **kwargs}
for key in annotations:
if key in kwargs2:
print('AAA:',annotations,kwargs2,key)
# instanciate the Type
klass = annotations[key]
kwargs2[key] = klass(kwargs2[key])
return fn(**kwargs2)
print(annotations)
return __inner
class MyDateTime:
def __init__(self,datetimeString):
# some hackery since its hard to actually subclass datetime
self.dt = date_parse(datetimeString)
def __getattr__(self, item):
return getattr(self.dt,item)
def __str__(self):
return str(self.dt)
def __repr__(self):
return repr(self.dt)
class MyISOCode:
iso_codes = {'us': 'us_en', 'gb': 'gb_en'}
def __init__(self,code):
self.code = code
self.long = self.iso_codes.get(code,code)
def __getattr__(self, item):
return getattr(self.long, item)
def __str__(self):
return self.long
def __repr__(self):
return repr(self.dt)
@enforce_types
def test(date:MyDateTime, iso_code:MyISOCode):
print("Got Date:",repr(date))
print("Got ISO:",iso_code)
test('2021-02-02 13:05','us')
but really why not just convert the date once early and pass that around instead of a string that you have to constantly convert?
