Home > Mobile >  Remove certain characters in a column Snowflake
Remove certain characters in a column Snowflake

Time:01-21

I am having data in a column like 'EARTH PLANET-ABC', 'MARS PLANET - ABC', 'JUPITER -ABC', 'VENUS- ABC' etc. I need to return the values 'EARTH PLANET', 'MARS', 'JUPITER', VENUS etc whenver the column has ABC at the end of the column. I tried REPLACE(COLUMN_NAME,'ABC',''). It removed the ABC from the entries in the column but it still returns the '-' and the space before/front of the ABC in the entry. I want to see EARTH PLANET', 'MARS', 'JUPITER', VENUS but I am getting 'EARTH PLANET-','MARS PLANET - ','JUPITER -','VENUS-' etc instead. Would a REGEXP help in this case?

CodePudding user response:

SELECT SPLIT_PART(column,'-',1)::varchar

This splits your field by the hyphen and then gives you the first piece of it. Seems like that's what you're looking for. You might also need to trim the result to take care of any spaces that remain.

CodePudding user response:

some experimenting gives us these part answers, ENDSWITH look promising, but I used longer process in the end:

SELECT
    column1,
    ENDSWITH(trim(column1), 'ABC') as end_with_abc,
    SPLIT(column1, '-') as s,
    array_size(s) AS len_s,
    trim(get(s, len_s -1)) as last,
    charindex('-', REVERSE(column1), 3) as last_token,
    length(column1) - last_token as pos,
    substr(column1,1,pos) as prefix,
    iff(len_s > 1 AND last = 'ABC', prefix, column1) as answer
FROM values 
    ('EARTH PLANET-ABC'), 
    ('MARS PLANET - ABC'), 
    ('JUPITER -ABC'), 
    ('VENUS- ABC'),
    ('VENUS_BUT_NOT_DEF'),
    ('VENUS - DEF');

gives:

COLUMN1 END_WITH_ABC S LEN_S LAST LAST_TOKEN POS PREFIX ANSWER
EARTH PLANET-ABC TRUE [ "EARTH PLANET", "ABC" ] 2 ABC 4 12 EARTH PLANET EARTH PLANET
MARS PLANET - ABC TRUE [ "MARS PLANET ", " ABC" ] 2 ABC 5 12 MARS PLANET MARS PLANET
JUPITER -ABC TRUE [ "JUPITER ", "ABC" ] 2 ABC 4 8 JUPITER JUPITER
VENUS- ABC TRUE [ "VENUS", " ABC" ] 2 ABC 5 5 VENUS VENUS
VENUS_BUT_NOT_DEF FALSE [ "VENUS_BUT_NOT_DEF" ] 1 VENUS_BUT_NOT_DEF 0 17 VENUS_BUT_NOT_DEF VENUS_BUT_NOT_DEF
VENUS - DEF FALSE [ "VENUS ", " DEF" ] 2 DEF 5 6 VENUS VENUS - DEF

which can be compressed down to:

SELECT
    column1,
    SPLIT(column1, '-') as s,
    iff(array_size(s) > 1 AND trim(get(s, array_size(s) -1)) = 'ABC', substr(column1, 1, length(column1) - charindex('-', REVERSE(column1), 3)), column1) as answer
FROM values 
    ('EARTH PLANET-ABC'), 
    ('MARS PLANET - ABC'), 
    ('JUPITER -ABC'), 
    ('VENUS- ABC'),
    ('VENUS_BUT_NOT_DEF'),
    ('VENUS - DEF');

or if you like super-compressed:

SELECT
    column1,
    iff(array_size(SPLIT(column1, '-')) > 1 AND trim(get(SPLIT(column1, '-'), array_size(SPLIT(column1, '-')) -1)) = 'ABC', substr(column1, 1, length(column1) - charindex('-', REVERSE(column1), 3)), column1) as answer
FROM values 
    ('EARTH PLANET-ABC'), 
    ('MARS PLANET - ABC'), 
    ('JUPITER -ABC'), 
    ('VENUS- ABC'),
    ('VENUS_BUT_NOT_DEF'),
    ('VENUS - DEF');
COLUMN1 ANSWER
EARTH PLANET-ABC EARTH PLANET
MARS PLANET - ABC MARS PLANET
JUPITER -ABC JUPITER
VENUS- ABC VENUS
VENUS_BUT_NOT_DEF VENUS_BUT_NOT_DEF
VENUS - DEF VENUS - DEF

OR

you can solve with REGEXP_SUBSTR:

SELECT
    column1 as orig,
    COALESCE(regexp_substr(column1, '^(.*)-\\s*ABC\\s*$', 1,1,'e'), column1) as answer
FROM values 
    ('EARTH PLANET-ABC'), 
    ('MARS PLANET - ABC'), 
    ('JUPITER -ABC'), 
    ('VENUS- ABC'),
    ('VENUS_BUT_NOT_DEF'),
    ('VENUS - DEF');

giving:

ORIG ANSWER
EARTH PLANET-ABC EARTH PLANET
MARS PLANET - ABC MARS PLANET
JUPITER -ABC JUPITER
VENUS- ABC VENUS
VENUS_BUT_NOT_DEF VENUS_BUT_NOT_DEF
VENUS - DEF VENUS - DEF
  •  Tags:  
  • Related