I want to capture multiple occurrences from one row with multi-line strings and label it with position number using regexp. I’m looking for the string after C- only.
Data
SELECT
‘A-abc
B-123
C-893-abcdef
D-def
Z-abc
B-123
C-891 zxcvbn
Y-def
A-abc
Q-123
C-892
H-def’ as One_Row_Multi_Line_Data FROM DUAL
Desired Result in two columns:
Item_No Position_Number
893-abcdef 1
891 zxcvbn 2
892 3
CodePudding user response:
The data value is supposed to be a single multi-line string value, not 12 separate rows? That isn't obvious in your question...
yes the data is in one dataset or one row only
You can use a hierarchical query to achieve that:
select regexp_substr(data, '^C-(. )$', 1, level, 'm') item_number,
level as position_number
from your_table
connect by level <= regexp_count(data, '^C-', 1, 'm')
That uses the level from from each iteration to pick out the right matching line from the string, and stops when it reaches the count of the number of lines in that string.
But if there is more than one row in the table to be processed, then that will get confused; which is fixable, but it's probably then simpler to use recursive subquery factoring:
with rcte (data, lines, item_number, position_number) as (
select data,
regexp_count(data, '^C-', 1, 'm'),
regexp_substr(data, '^C-(. )$', 1, 1, 'm'),
1
from your_table
union all
select data,
lines,
regexp_substr(data, '^C-(. )$', 1, r.position_number 1, 'm'),
position_number 1
from rcte r
where r.position_number < r.lines
)
select item_number, position_number
from rcte
The anchor member:
select data,
regexp_count(data, '^C-', 1, 'm'),
regexp_substr(data, '^C-(. )$', 1, 1, 'm'),
1
from your_table
... gets the full data value, the number of relevant lines (via regexp_count()), and first item number occurrence, and sets the initial position number to 1.
The recursive member:
select data,
lines,
regexp_substr(data, '^C-(. )$', 1, r.position_number 1, 'm'),
position_number 1
from rcte r
where r.position_number < r.lines
... adds one to the position number from the previous iteration, and gets the occurrence of the pattern from the passed-along string. That stops when the last position number matches the number of lines, from the where clause.
With that single row, with one multi-line value, both get:
| ITEM_NUMBER | POSITION_NUMBER |
|---|---|
| C-893-abcdef | 1 |
| C-891 zxcvbn | 2 |
| C-892 | 3 |
