i have a column that is containing values as below from a tableXX
| file_name |
|---|
| my_String1_20220103_144415.csv |
| my_String2_20220106_155730.csv |
| my_String3_20220108_153930.csv |
| my_String4_20220103_144470.csv |
| my_String5_20220103_144415.csv |
My question is how to retrieve a string after the last two '_' succeeded by a point '.'
to have an output as such
| file_name |
|---|
| my_String1.csv |
| my_String2.csv |
| my_String3.csv |
| my_String4.csv |
| my_String5.csv |
I tried this query :
select substring(file_name from '(.*)_') as file_name from tableXX
CodePudding user response:
You can use
select REGEXP_REPLACE(file_name, '^([^_]*_[^_]*).*(\.[^.]*)$', '\1\2') as file_name from tableXX
The ^([^_]*_[^_]*).*(\.[^.]*)$ regex matches
^- start of string([^_]*_[^_]*)- Group 1 (\1refers to this group value in the replacement pattern): any zero or more chars other than_, a_and again zero or more non-_s.*- any zero or more chars as many as possible(\.[^.]*)- Group 2 (\2refers to this group value in the replacement pattern): a.char and then zero or more chars other than.char$- end of string
Here is a test:
CREATE TABLE tableXX
(file_name character varying)
;
INSERT INTO tableXX
(file_name)
VALUES
('my_String1_20220103_144415.csv'),
('my_String2_20220106_155730.csv'),
('my_String3_20220108_153930.csv'),
('my_String4_20220103_144470.csv'),
('my_String5_20220103_144415.csv')
;
select REGEXP_REPLACE(file_name, '^([^_]*_[^_]*).*(\.[^.]*)$', '\1\2') as file_name from tableXX
Output:
CodePudding user response:
The part of the strings you want to replace apparently is a datetime. You don't need a RegEx which may be slow.
Using basic string functions (2 solutions):
SELECT
OVERLAY(file_name placing '' from (LENGTH(file_name) - 19) for 17)
, REVERSE(OVERLAY(REVERSE(file_name) PLACING '' FROM 4 FOR 16))
FROM tableXX;
OVERLAYdoes not accept negative integers, this is why I usedLENGTHtoo.
But given that you want to replace this datetime value with its specific format, if you still want to do it using a RegEx I'd suggest this:
SELECT REGEXP_REPLACE(file_name, '_[\d_] .', '.')
FROM tableXX;
Replace the part beginning with an underscore, followed by integers or underscore and ending with a dot, by a single dot.
CodePudding user response:
There is also an option without a regex:
select concat(split_part(file_name, '_', 1),
'_',
split_part(file_name, '_', 2),
'.',
split_part(file_name, '.', 2))
from the_table
Not sure which one would be faster. Regular expressions are typically quite slow, but repeated calls with split_part() might not be fast either.

