Parsing a newline-terminated programming language

Time:01-23

Recently I've been trying to develop a custom programming language. The previous languages I (attempted to) make were semicolon-terminated, but the one I'm making now is terminated by a newline, just like Python.

The problem I've run into is that, while every semicolon in a language like C is treated as a terminator of sorts, a newline in Python does not always act as one.

For example:

// incorrect in C
myfunc();;;;otherfunc();

and

# completely fine in Python
myfunc()



otherfunc()

So my question is: how do I parse this? What does the Backus-Naur form of a language like this look like?

CodePudding user response:

I don't know about C, but in many semicolon-terminated languages, ;; is perfectly valid. PHP is one example.

A simple way to express this in the abstract grammar is to allow an empty statement - that is, one made up only of optional whitespace. The parser can then accept this as valid, but emit nothing.

In the PHP parser, one of the productions for statement is this:

';' /* empty statement */ { $$ = NULL; }

The same rule could be used (mutatis mutandis) in a grammar where newline was treated as a significant token, rather than grouped into whitespace.
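As a rough illustration of that idea, here is a minimal sketch in Python of a toy language whose only real statement is a call like myfunc(). Everything in it (the grammar in the comment, the token names, and the tokenize and parse helpers) is made up for this example and is not taken from PHP's or Python's actual parser. The point is only that NEWLINE is a real token, and that a statement consisting of a bare NEWLINE is accepted and produces nothing.

```python
import re

# Hypothetical toy grammar, written BNF-style (illustrative only):
#   program   ::= statement*
#   statement ::= call NEWLINE    a real statement
#               | NEWLINE         the empty statement: accepted, emits nothing
#   call      ::= NAME "(" ")"

TOKEN_RE = re.compile(
    r"(?P<NAME>[A-Za-z_]\w*)"
    r"|(?P<LPAREN>\()"
    r"|(?P<RPAREN>\))"
    r"|(?P<NEWLINE>\n)"
    r"|(?P<SKIP>[ \t]+)"
)


def tokenize(source):
    """Return a list of (kind, text) pairs.

    NEWLINE is kept as a significant token; spaces and tabs are dropped.
    """
    tokens = []
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    # Terminate the last statement even if the input lacks a trailing newline.
    if not tokens or tokens[-1][0] != "NEWLINE":
        tokens.append(("NEWLINE", "\n"))
    return tokens


def parse(source):
    """Return the names of the functions called, in order."""
    tokens = tokenize(source)
    calls = []
    i = 0
    while i < len(tokens):
        kind, text = tokens[i]
        if kind == "NEWLINE":
            # statement ::= NEWLINE  (empty statement, nothing is emitted)
            i += 1
            continue
        # statement ::= NAME "(" ")" NEWLINE
        kinds = [k for k, _ in tokens[i:i + 4]]
        if kinds != ["NAME", "LPAREN", "RPAREN", "NEWLINE"]:
            raise SyntaxError(f"bad statement starting at token {i}")
        calls.append(text)
        i += 4
    return calls


print(parse("myfunc()\n\n\n\notherfunc()\n"))
```

With this sketch, the blank lines between the two calls are each parsed as an empty statement and skipped, while two calls on the same line with no newline between them are rejected. A real language would also need rules for things like line continuation inside brackets, which this toy grammar leaves out.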
