Home > database >  Finding the date before/after a date stored in int representation (Hive SQL)
Finding the date before/after a date stored in int representation (Hive SQL)

Time:01-25

I need to select between a range of dates centered on a specific date, but my partition column of dates is stored as int.

For example, selecting between the dates before and after '20210901' (20210831 to 20210902)

Is there any way to do that which is less convoluted than what I came up with below?

between cast(date_format(date_sub(date_format(from_unixtime(unix_timestamp(cast('20210901' as string),'yyyyMMdd')),'yyyy-MM-dd'),1),'yyyyMMdd') as int) and \
cast(date_format(date_add(date_format(from_unixtime(unix_timestamp(cast('20210901' as string),'yyyyMMdd')),'yyyy-MM-dd'),1),'yyyyMMdd');

CodePudding user response:

unix_timestamp from_unixtime conversion needed only when you have some date format which can not be transformed to yyyy-MM-dd format using string functions only for example 1 Jan 21 . SimpleDateFormat class used under thehood of unix_timestamp and from_unixtime is rather heavy and not simple at all, even regexp_replace is easier transformation.

Use string manipulation if possible:

select part_col between int(replace(date_sub(regexp_replace('20210901','^(\\d{4})(\\d{2})(\\d{2})$','$1-$2-$3'),1),'-',''))
                    and int(replace(date_add(regexp_replace('20210901','^(\\d{4})(\\d{2})(\\d{2})$','$1-$2-$3'),1),'-',''))

Even Int() conversion is not necessary, Hive will convert implicitly:

select part_col between replace(date_sub(regexp_replace('20210901','^(\\d{4})(\\d{2})(\\d{2})$','$1-$2-$3'),1),'-','')
                    and replace(date_add(regexp_replace('20210901','^(\\d{4})(\\d{2})(\\d{2})$','$1-$2-$3'),1),'-','')

If you can provide argument in DATE format yyyy-MM-dd, it would be simpler:

select part_col between replace(date_sub('2021-09-01',1),'-','')
                    and replace(date_add('2021-09-01',1),'-','')

CodePudding user response:

You seem to have an awful lot of steps in there. I think this should get you there. Still not very pretty, but it sure seems clearer to me.

WHERE
from_Unixtime(unix_timestamp(cast (<integer date column> as string),'yyyyMMdd'), 'yyyy-MM-dd') BETWEEN
date_add( from_Unixtime(unix_timestamp(cast (20210901 as string),'yyyyMMdd'), 'yyyy-MM-dd') ,-1) AND
date_add( from_Unixtime(unix_timestamp(cast (20210901 as string),'yyyyMMdd'), 'yyyy-MM-dd') ,-1) 
  •  Tags:  
  • Related