Using Perl look-ahead assertion to find individual list

Given a list like this:

direct_SQL_statement ::=
  directly_executable_statement semicolon

directly_executable_statement ::=
    direct_SQL_data_statement
  | SQL_schema_statement
  | SQL_transaction_statement
  | SQL_connection_statement
  | SQL_session_statement
  | direct_implementation_defined_statement

direct_SQL_data_statement ::=
    delete_statement__searched
  | direct_select_statement__multiple_rows
  | insert_statement
  | update_statement__searched
  | truncate_table_statement
  | merge_statement
  | temporary_table_declaration

direct_implementation_defined_statement ::=
  "!! See the Syntax Rules."

apostrophe ::=
  "'"
/*
5.2     token and separator

Function

Specify lexical units (tokens and separators) that participate in SQL language.


Format
*/
token ::=
    nondelimiter_token
  | delimiter_token

identifier_part ::=
    identifier_start
  | identifier_extend
/*
identifier_start ::=
  "!! See the Syntax Rules."
identifier_extend ::=
  "!! See the Syntax Rules."
*/
large_object_length_token ::=
  digit  multiplier

Is it possible to use Perl's look-ahead assertion to break it up into a list of individual definitions?

I tried,

perl -0777ne 'print "$&\n^^\n\n" while /(?=\w+\s*::=)\w+\s*::=\s*.+/gs;'

but it just returned the whole thing (as if the look-ahead assertion is not working at all), while

perl -0777ne 'print "$&\n^^\n\n" while /(?=\w+\s*::=)\w+\s*::=\s*.+?/gs;'

comes up just too short:

direct_SQL_statement ::=
  d
^^

directly_executable_statement ::=
    d
^^

direct_SQL_data_statement ::=
    d
^^

direct_implementation_defined_statement ::=
  "
^^

I need to break it up into individual BNF definition chunks to further process, like this for the initial test data:

direct_SQL_statement ::=
  directly_executable_statement semicolon
^^


directly_executable_statement ::=
    direct_SQL_data_statement
  | SQL_schema_statement
  | SQL_transaction_statement
  | SQL_connection_statement
  | SQL_session_statement
  | direct_implementation_defined_statement
^^


direct_SQL_data_statement ::=
    delete_statement__searched
  | direct_select_statement__multiple_rows
  | insert_statement
  | update_statement__searched
  | truncate_table_statement
  | merge_statement
  | temporary_table_declaration
^^


direct_implementation_defined_statement ::=
  "!! See the Syntax Rules."
^^

Notes:

The above output is from the initial test data.
The whole A ::= B thing is called a BNF definition.
The "^^" is only a visual indication that the separation is done properly.
The apostrophe and the following token are different BNF definitions and should be treated as such.
The /* ... */ comments should be filtered out of the output.
Comments may come without empty lines surrounding them. That's the reason I need to rely on the look-ahead assertion instead of paragraph mode.
The question comes as a follow-up to "How can EBNF or BNF be parsed?", whose solution is "W3C EBNF doesn't end a production with a semicolon because a ::= operator comes after the LHS symbol of a new production."
The whole file can be found at github.com/ronsavage/SQL/blob/master/sql-2016.ebnf

CodePudding user response：

The question has since been edited: the input may now contain comments (/* ... */) that need to be omitted. To handle them:

perl -0777 -wnE'say for m{(.*?::=.*?)\n (?: \n+ | (?:/\*.*?\*/) | \z)}gsx' bnf.txt

This captures a line with ::= and all that follows it, up to one of: more newlines, a /*...*/ comment, or the end of the string.

Or, first remove the comments, then split on runs of more than one newline:

perl -0777 -wnE's{ (?: /\* .*? \*/ ) }{\n}gsx; say for split /\n\n+/;' bnf.txt

The original answer, which reads the file in paragraph mode, follows. It no longer seems suitable after the question edit, since a comment may now 'connect' two definitions, which are then no longer separate paragraphs.

If there's always an empty line separating the chunks of interest, then you can process the file in paragraphs:

perl -00 -wne'print' file

This retains the empty line, which you appear to want to keep anyway. If not, it can be removed.

(Then, curiously, you can even do simply perl -00 -pe'1' file)

Otherwise, you can split the whole string on more than one newline:

perl -0777 -wnE'@chunks = split /\n\n+/; say for @chunks' file

or, if you indeed need to just output them

perl -0777 -wnE'say for split /\n\n+/' file

Empty lines between chunks are now removed.

I don't see a reason to go for a lookahead. A plain regex that captures each chunk also works:

perl -0777 -wnE'say for /(.+?::=.*?)\n(?:\n+|\z)/gs' file