Home > Blockchain >  Convert Oracle Regex to HIVE
Convert Oracle Regex to HIVE

Time:01-05

I need to convert the regex below, that works fine in Oracle, to HIVE. The regex Pattern Removes all punctuation except ('-' and '&' apostrophe, and spaces).

    select
       REGEXP_REPLACE (          
        TRANSLATE(
                    REGEXP_REPLACE( 'UNI''C...ER - BÉBI,,, DAS DE PORTUGAL, SGPS, S.A.', '[^[:alnum:] &-]', '' ),
                    'ÁÇÉÍÓÚÀÈÌÒÙÂÊÎÔÛÃÕËÜáçéíóúàèìòùâêîôûãõëü', 'ACEIOUAEIOUAEIOUAOEUaceiouaeiouaeiouaoeu'
                 ),
                    ' {2,}', ' ') SSS

from dual;

The problem is with the outputs:

  1. Oracle Output : "UNICER - BEBIDAS DE PORTUGAL SGPS SA"
  2. HIVE Output : "UNIC...ER - BEBI,,, DAS DE PORTUGAL, SGPS, S.A." (keeps '.' and ',')

I think that the problem resides on [[:alnum:]] POSIX Character Class interpretation.

Many thank's in advance.

CodePudding user response:

You can use

'[^\\p{L}\\p{N} &-]'

This regex matches any single character that is not ([^...] is a negated character class)

  • \p{L} - any Unicode letter
  • \p{N} - any Unicode digit
  • - a space
  • & - a & char
  • - - a - char.
  •  Tags:  
  • Related