I got this query from another post I made which uses REGEXP_SUBSTR() to pull out specific information from a string in oracle. It works good but only for small sets of data. When it comes to tables that have 300,000 records, it is very slow and I was doing some reading that instr substr might be faster. The example query is:
SELECT REGEXP_SUBSTR(value, '(^|\|)\s*24=\s*(.*?)\s*(\||$)', 1, 1, NULL, 2) AS "24",
REGEXP_SUBSTR(value, '(^|\|)\s*35=\s*(.*?)\s*(\||$)', 1, 1, NULL, 2) AS "35",
REGEXP_SUBSTR(value, '(^|\|)\s*47A=\s*(.*?)\s*(\||$)', 1, 1, NULL, 2) AS "47A",
REGEXP_SUBSTR(value, '(^|\|)\s*98A=\s*(.*?)\s*(\||$)', 1, 1, NULL, 2) AS "98A"
FROM table_name
Table example:
CREATE TABLE table_name (value ) AS
SELECT '35= 88234.00 | 47A= Shawn | 98A= This is a comment |' FROM DUAL UNION ALL
SELECT '24= 123.00 | 98A= This is a comment | 47A= Derick |' FROM DUAL
Output of query would be:
| 24 | 35 | 47A | 98A |
|---|---|---|---|
| 88234.00 | Shawn | This is a comment | |
| 123.00 | Derick | This is a comment |
Can someone give me an example of how this same query would look if I was doing instr substr instead?
Thank you.
CodePudding user response:
SELECT CASE
WHEN start_24 > 0
THEN TRIM(
SUBSTR(
value,
start_24 5,
INSTR(value, '|', start_24 5) - (start_24 5)
)
)
END AS "24",
CASE
WHEN start_35 > 0
THEN TRIM(
SUBSTR(
value,
start_35 5,
INSTR(value, '|', start_35 5) - (start_35 5)
)
)
END AS "35",
CASE
WHEN start_47a > 0
THEN TRIM(
SUBSTR(
value,
start_47a 6,
INSTR(value, '|', start_47a 6) - (start_47a 6)
)
)
END AS "47A",
CASE
WHEN start_98a > 0
THEN TRIM(
SUBSTR(
value,
start_98a 6,
INSTR(value, '|', start_98a 6) - (start_98a 6)
)
)
END AS "98A"
FROM (
SELECT value,
INSTR(value, '| 24=') AS start_24,
INSTR(value, '| 35=') AS start_35,
INSTR(value, '| 47A=') AS start_47a,
INSTR(value, '| 98A=') AS start_98a
FROM (
SELECT '| ' || value AS value FROM table_name
)
);
Which, for your sample data, outputs:
24 35 47A 98A 88234.00 Shawn This is a comment 123.00 Derick This is a comment
db<>fiddle here
CodePudding user response:
Given the data in your example it seems you could also use a procedural approach for your data extraction, but I'm sceptical if this could be faster.
The following function get24 for example extracts the columns "24" just using INSTR and SUBSTR.
CREATE OR REPLACE FUNCTION get24(value IN VARCHAR2) RETURN VARCHAR2
IS
i PLS_INTEGER;
s VARCHAR2(32767);
BEGIN
i := INSTR(value, '24= ');
IF (i <> 1) THEN
RETURN NULL;
END IF;
s := SUBSTR(value, i 4);
i := INSTR(s, ' | ');
IF (i = 0) THEN
RETURN NULL;
END IF;
RETURN SUBSTR(s, 1, i - 1);
END;
/
SELECT get24(value) "24" FROM table_name;
You could then also try using a pipelined function and do all the data extraction within the pipelined function.
