I am having data in a column like 'EARTH PLANET-ABC', 'MARS PLANET - ABC', 'JUPITER -ABC', 'VENUS- ABC' etc. I need to return the values 'EARTH PLANET', 'MARS', 'JUPITER', VENUS etc whenver the column has ABC at the end of the column. I tried REPLACE(COLUMN_NAME,'ABC',''). It removed the ABC from the entries in the column but it still returns the '-' and the space before/front of the ABC in the entry. I want to see EARTH PLANET', 'MARS', 'JUPITER', VENUS but I am getting 'EARTH PLANET-','MARS PLANET - ','JUPITER -','VENUS-' etc instead. Would a REGEXP help in this case?
CodePudding user response:
SELECT SPLIT_PART(column,'-',1)::varchar
This splits your field by the hyphen and then gives you the first piece of it. Seems like that's what you're looking for. You might also need to trim the result to take care of any spaces that remain.
CodePudding user response:
some experimenting gives us these part answers, ENDSWITH look promising, but I used longer process in the end:
SELECT
column1,
ENDSWITH(trim(column1), 'ABC') as end_with_abc,
SPLIT(column1, '-') as s,
array_size(s) AS len_s,
trim(get(s, len_s -1)) as last,
charindex('-', REVERSE(column1), 3) as last_token,
length(column1) - last_token as pos,
substr(column1,1,pos) as prefix,
iff(len_s > 1 AND last = 'ABC', prefix, column1) as answer
FROM values
('EARTH PLANET-ABC'),
('MARS PLANET - ABC'),
('JUPITER -ABC'),
('VENUS- ABC'),
('VENUS_BUT_NOT_DEF'),
('VENUS - DEF');
gives:
| COLUMN1 | END_WITH_ABC | S | LEN_S | LAST | LAST_TOKEN | POS | PREFIX | ANSWER |
|---|---|---|---|---|---|---|---|---|
| EARTH PLANET-ABC | TRUE | [ "EARTH PLANET", "ABC" ] | 2 | ABC | 4 | 12 | EARTH PLANET | EARTH PLANET |
| MARS PLANET - ABC | TRUE | [ "MARS PLANET ", " ABC" ] | 2 | ABC | 5 | 12 | MARS PLANET | MARS PLANET |
| JUPITER -ABC | TRUE | [ "JUPITER ", "ABC" ] | 2 | ABC | 4 | 8 | JUPITER | JUPITER |
| VENUS- ABC | TRUE | [ "VENUS", " ABC" ] | 2 | ABC | 5 | 5 | VENUS | VENUS |
| VENUS_BUT_NOT_DEF | FALSE | [ "VENUS_BUT_NOT_DEF" ] | 1 | VENUS_BUT_NOT_DEF | 0 | 17 | VENUS_BUT_NOT_DEF | VENUS_BUT_NOT_DEF |
| VENUS - DEF | FALSE | [ "VENUS ", " DEF" ] | 2 | DEF | 5 | 6 | VENUS | VENUS - DEF |
which can be compressed down to:
SELECT
column1,
SPLIT(column1, '-') as s,
iff(array_size(s) > 1 AND trim(get(s, array_size(s) -1)) = 'ABC', substr(column1, 1, length(column1) - charindex('-', REVERSE(column1), 3)), column1) as answer
FROM values
('EARTH PLANET-ABC'),
('MARS PLANET - ABC'),
('JUPITER -ABC'),
('VENUS- ABC'),
('VENUS_BUT_NOT_DEF'),
('VENUS - DEF');
or if you like super-compressed:
SELECT
column1,
iff(array_size(SPLIT(column1, '-')) > 1 AND trim(get(SPLIT(column1, '-'), array_size(SPLIT(column1, '-')) -1)) = 'ABC', substr(column1, 1, length(column1) - charindex('-', REVERSE(column1), 3)), column1) as answer
FROM values
('EARTH PLANET-ABC'),
('MARS PLANET - ABC'),
('JUPITER -ABC'),
('VENUS- ABC'),
('VENUS_BUT_NOT_DEF'),
('VENUS - DEF');
| COLUMN1 | ANSWER |
|---|---|
| EARTH PLANET-ABC | EARTH PLANET |
| MARS PLANET - ABC | MARS PLANET |
| JUPITER -ABC | JUPITER |
| VENUS- ABC | VENUS |
| VENUS_BUT_NOT_DEF | VENUS_BUT_NOT_DEF |
| VENUS - DEF | VENUS - DEF |
OR
you can solve with REGEXP_SUBSTR:
SELECT
column1 as orig,
COALESCE(regexp_substr(column1, '^(.*)-\\s*ABC\\s*$', 1,1,'e'), column1) as answer
FROM values
('EARTH PLANET-ABC'),
('MARS PLANET - ABC'),
('JUPITER -ABC'),
('VENUS- ABC'),
('VENUS_BUT_NOT_DEF'),
('VENUS - DEF');
giving:
| ORIG | ANSWER |
|---|---|
| EARTH PLANET-ABC | EARTH PLANET |
| MARS PLANET - ABC | MARS PLANET |
| JUPITER -ABC | JUPITER |
| VENUS- ABC | VENUS |
| VENUS_BUT_NOT_DEF | VENUS_BUT_NOT_DEF |
| VENUS - DEF | VENUS - DEF |
