Home > Back-end >  substring a string with conditions
substring a string with conditions

Time:01-05

i have a column that is containing values as below from a tableXX

file_name
my_String1_20220103_144415.csv
my_String2_20220106_155730.csv
my_String3_20220108_153930.csv
my_String4_20220103_144470.csv
my_String5_20220103_144415.csv

My question is how to retrieve a string after the last two '_' succeeded by a point '.'

to have an output as such

file_name
my_String1.csv
my_String2.csv
my_String3.csv
my_String4.csv
my_String5.csv

I tried this query :

select substring(file_name from '(.*)_') as file_name  from tableXX

CodePudding user response:

You can use

select REGEXP_REPLACE(file_name, '^([^_]*_[^_]*).*(\.[^.]*)$', '\1\2') as file_name  from tableXX

The ^([^_]*_[^_]*).*(\.[^.]*)$ regex matches

  • ^ - start of string
  • ([^_]*_[^_]*) - Group 1 (\1 refers to this group value in the replacement pattern): any zero or more chars other than _, a _ and again zero or more non-_s
  • .* - any zero or more chars as many as possible
  • (\.[^.]*) - Group 2 (\2 refers to this group value in the replacement pattern): a . char and then zero or more chars other than . char
  • $ - end of string

Here is a test:

CREATE TABLE tableXX
    (file_name character varying)
;

INSERT INTO tableXX
    (file_name)
VALUES
    ('my_String1_20220103_144415.csv'),
    ('my_String2_20220106_155730.csv'),
    ('my_String3_20220108_153930.csv'),
    ('my_String4_20220103_144470.csv'),
    ('my_String5_20220103_144415.csv')
;

select REGEXP_REPLACE(file_name, '^([^_]*_[^_]*).*(\.[^.]*)$', '\1\2') as file_name  from tableXX

Output:

enter image description here

CodePudding user response:

The part of the strings you want to replace apparently is a datetime. You don't need a RegEx which may be slow.

Using basic string functions (2 solutions):

SELECT
    OVERLAY(file_name placing '' from (LENGTH(file_name) - 19) for 17)
    , REVERSE(OVERLAY(REVERSE(file_name) PLACING '' FROM 4 FOR 16))
FROM tableXX;

OVERLAY does not accept negative integers, this is why I used LENGTH too.

But given that you want to replace this datetime value with its specific format, if you still want to do it using a RegEx I'd suggest this:

SELECT REGEXP_REPLACE(file_name, '_[\d_] .', '.')
FROM tableXX;

Replace the part beginning with an underscore, followed by integers or underscore and ending with a dot, by a single dot.

CodePudding user response:

There is also an option without a regex:

select concat(split_part(file_name, '_', 1), 
              '_', 
              split_part(file_name, '_', 2), 
              '.', 
              split_part(file_name, '.', 2))
from the_table

Not sure which one would be faster. Regular expressions are typically quite slow, but repeated calls with split_part() might not be fast either.

  •  Tags:  
  • Related