Given a list like this:
direct_SQL_statement ::=
directly_executable_statement semicolon
directly_executable_statement ::=
direct_SQL_data_statement
| SQL_schema_statement
| SQL_transaction_statement
| SQL_connection_statement
| SQL_session_statement
| direct_implementation_defined_statement
direct_SQL_data_statement ::=
delete_statement__searched
| direct_select_statement__multiple_rows
| insert_statement
| update_statement__searched
| truncate_table_statement
| merge_statement
| temporary_table_declaration
direct_implementation_defined_statement ::=
"!! See the Syntax Rules."
apostrophe ::=
"'"
/*
5.2 token and separator
Function
Specify lexical units (tokens and separators) that participate in SQL language.
Format
*/
token ::=
nondelimiter_token
| delimiter_token
identifier_part ::=
identifier_start
| identifier_extend
/*
identifier_start ::=
"!! See the Syntax Rules."
identifier_extend ::=
"!! See the Syntax Rules."
*/
large_object_length_token ::=
digit multiplier
Is it possible to use Perl's look-ahead assertion to break it up into a list of individual definitions?
I tried,
perl -0777ne 'print "$&\n^^\n\n" while /(?=\w+\s*::=)\w+\s*::=\s*.+/gs;'
but it just returned the whole thing (as if the look-ahead assertion is not working at all), while
perl -0777ne 'print "$&\n^^\n\n" while /(?=\w+\s*::=)\w+\s*::=\s*.+?/gs;'
comes up just too short:
direct_SQL_statement ::=
d
^^
directly_executable_statement ::=
d
^^
direct_SQL_data_statement ::=
d
^^
direct_implementation_defined_statement ::=
"
^^
I need to break it up into individual BNF definition chunks to further process, like this for the initial test data:
direct_SQL_statement ::=
directly_executable_statement semicolon
^^
directly_executable_statement ::=
direct_SQL_data_statement
| SQL_schema_statement
| SQL_transaction_statement
| SQL_connection_statement
| SQL_session_statement
| direct_implementation_defined_statement
^^
direct_SQL_data_statement ::=
delete_statement__searched
| direct_select_statement__multiple_rows
| insert_statement
| update_statement__searched
| truncate_table_statement
| merge_statement
| temporary_table_declaration
^^
direct_implementation_defined_statement ::=
"!! See the Syntax Rules."
^^
Notes,
- The above output is from the initial test data.
- The whole A ::= B thing is called a BNF definition. The "^^" is only a visual indication that the separation is done properly.
- The apostrophe and the following token are different BNF definitions and should be treated as such. The /* ... */ comment should be filtered out from the output.
- Comments may come without empty lines surrounding them. That's the reason I need to rely on the look-ahead assertion instead of paragraph mode.
- The question comes as a follow up to How can EBNF or BNF be parsed?, of which the solution is "W3C EBNF doesn't end a production with a semicolon because a ::= operator comes after the LHS symbol of a new production."
- The whole file can be found at github.com/ronsavage/SQL/blob/master/sql-2016.ebnf
CodePudding user response:
The question was edited and now contains comments (/* ... */) that need to be omitted:
perl -0777 -wnE'say for m{(.*?::=.*?)\n (?: \n+ | (?:/\*.*?\*/) | \z)}gsx' bnf.txt
This captures a line with ::= and all that follows it up to: more newlines, or /*...*/ (comment), or end-of-string.
Or, first remove the comments and then split on runs of two or more newlines:
perl -0777 -wnE's{ (?: /\* .*? \*/ ) }{\n}gsx; say for split /\n\n+/;' bnf.txt
The original answer follows; it reads the file in paragraph mode. That no longer seems suitable after the question edit, since a comment may now 'connect' two definitions, which are then no longer separate paragraphs.
If there's always an empty line separating the chunks of interest, the file can be processed in paragraphs:
perl -00 -wne'print' file
This retains the empty line, which you appear to want to keep anyway. If not, it can be removed.
(Then, curiously, one can even do simply perl -00 -pe'1' file)
Otherwise, the whole string can be split on runs of two or more newlines:
perl -0777 -wnE'@chunks = split /\n\n+/; say for @chunks' file
or, if you indeed just need to output them:
perl -0777 -wnE'say for split /\n\n+/' file
Empty lines between chunks are now removed.
I don't see a reason to go for a lookahead.
perl -0777 -wnE'say for /(.+?::=.*?)\n(?:\n+|\z)/gs' file
