Home > Software engineering >  Find space after nth characters and split into new row
Find space after nth characters and split into new row

Time:02-01

I have a large string stored in table as a single line. I need a select query to split the large string to rows after every 100 characters and it should not split in middle of the word. Basically, the query should find a space after 100 characters and split into new line.

I have used this query, it is splitting after 100 lines, but it is breaking in the middle of words.

SELECT  REGEXP_REPLACE ( col_large_string , '(.{100})' , '\1' || CHR (10) ) AS split_to_rows
FROM tab_large_string where string_id = 1;

CodePudding user response:

You do not need (slow) regular expressions and can do it with simple (quicker) string functions.

If you want to replace spaces with newlines then:

WITH bounds ( str, end_pos ) AS (
  SELECT col_large_string,
         INSTR(col_large_string, ' ', 101)
  FROM   tab_large_string
UNION ALL
  SELECT SUBSTR(str, 1, end_pos - 1)
         || CHR(10)
         || SUBSTR(str, end_pos   1),
         INSTR(str, ' ', end_pos   101)
  FROM   bounds
  WHERE  end_pos > 0
)
SELECT str AS split_to_lines
FROM   bounds
WHERE  end_pos = 0;

and if you want to have each line in a new row then:

WITH bounds ( str, start_pos, end_pos ) AS (
  SELECT col_large_string,
         1,
         INSTR(col_large_string, ' ', 101)
  FROM   tab_large_string
UNION ALL
  SELECT str,
         end_pos   1,
         INSTR(str, ' ', end_pos   101)
  FROM   bounds
  WHERE  end_pos > 0
)
SELECT CASE end_pos
       WHEN 0
       THEN SUBSTR(str, start_pos)
       ELSE SUBSTR(str, start_pos, end_pos - start_pos)
       END AS split_to_rows
FROM   bounds;

If you do want to use regular expressions then:

SELECT REGEXP_REPLACE(
         col_large_string,
         '(.{100,}?) ',
         '\1' || CHR (10)
       ) AS split_to_lines
FROM   tab_large_string
WHERE  string_id = 1;

db<>fiddle here

CodePudding user response:

You can use this regular expression:

SELECT  REGEXP_REPLACE ( col_large_string , '((\w \s ){100})' , '\1' || CHR (10) ) AS split_to_rows
FROM tab_large_string where string_id = 1;

\w matches one or more occurrence of word character.
\s matches one or more occurrence of space character.
(\w \s ) matches a word followed by space
(\w \s ){100} then matches (a word followed by space) x100.

  •  Tags:  
  • Related