Parser Reference

A parser’s purpose is to recognize a string of tokens conforming to a grammar, and to optionally transform it into something else.

Parser Section

Parser declarations are contained in a parser section signaled by the @parser keyword.

@parser

// Parser declarations

Rules

Parser rules are the fundamental building blocks of the parser. They define transformations of terminals (tokens) and non-terminals (other rules) and can themselves be referenced as terms in other rule productions, including recursively. The parser rule syntax resembles the extended-Backus-Naur form (EBNF).

The Lox rule syntax itself can be expressed concisely in Lox:

rule      = '@start'? ID '=' @list(prod, '|') NL
prod      = term_card+ qualif?
          | '@empty'

term_card = term card?
term      = ID | LITERAL | '@error' | list
list      = '@list' '(' term ',' term ')'
card      = '*' | '*!' | '+' | '?'
qualif    = '@left' '(' NUM ')'
          | '@right' '(' NUM ')'

Let’s take a closer look at the following example rule:

prod = term_card+ qualif?
     | '@empty'

prod is the name of the rule.
term_card+ qualif? and '@empty' are the rule’s productions.
term_card+ and qualif? are terms in the first production.
term_card and qualif reference other rules.
+ is a cardinality, indicating that one or more term_cards are expected. Likewise, ? indicates that zero or one qualif is expected.
'@empty' is a token, referenced by its literal form.

Notes:

Rule names must be valid Go identifiers with the following additional restrictions:
- Must not start with an underscore (_).
- Must not contain consecutive underscores.

Line Continuation

Rules are terminated by an end-of-line. The line-continuation backslash \ can be used to split a rule into multiple lines. The backlash can be omitted when the vertical bar | is the next line’s first token. In this case, the line-continuation is implicit.

For example:

term = ID | \
       LITERAL | \
       '@error' | \ 
       list

Is equivalent to:

term = ID
     | LITERAL
     | '@error'
     | list

It is idiomatic to use the latter form, and to only use the \ when a rule cannot be split at a |.

Start Rule

One, and only one rule, must be marked as the root or start rule using the @start keyword. The start rule is the parser’s goal; when the parser reduces the start rule, parsing completes successfully.

Example:

@start program = statement*

Empty Production

A production that matches the empty string - often represented in formal grammar by the Greek letter episilon (ε) - must use the @empty keyword in place of terms.

Example:

optional_number = NUMBER | @empty

In this example, the rule optional_number can be reduced by matching a NUMBER token, or by matching nothing at all (i.e., the empty string).

Term Cardinality

A term can be annotated with a cardinality modifier, which determines how many instances of the term can be matched. Cardinality is used to specify optional, repeated, or required patterns within a rule.

Optional (?)

? indicates that a term is optional.

Example:

some_rule = some_term?

// Generated by Lox:
some_term? = some_term
           | @empty

One or More (+)

+ indicates that a term can be matched one or more times.

Example:

some_rule = some_term+

// Generated by Lox:
some_term+ = some_term+ some_term
           | some_term

Zero or More (*)

* indicates that a term can be matched zero or more times.

Example:

some_rule = some_term*

// Generated by Lox:
some_term*  = some_term+?
some_term+? = some_term+
            | @empty
some_term+  = some_term+ some_term
            | some_term

List (@list)

@list(elem, sep) indicates that the term will match elem one or more times while separated by sep.

Example:

some_rule = @list(some_term, ',')

// Generated by Lox:
@list(some_term, ',') = @list(some_term, ',') ',' some_term
                      | some_term

@list can be further qualified with a ? to match a list of zero or more elements.

Precedence and Associativity

The following grammar is ambiguous. Attempting to analyze it will fail it with the error: grammar has conflicts:

expr = expr '+' expr
     | expr '*' expr
     | NUMBER

See Parser Conflicts for a general explanation about the subject.

Conflicts related to operator precedence can be resolved using precedence and associatity qualifiers:

expr = expr '+' expr  @left(1)
     | expr '*' expr  @left(2)
     | NUMBER

The @left qualifiers in the grammar tells lox that if it encounters a conflict between expr '+' expr and expr '*' expr, then the latter should take precendence over the former.

Precedence qualifiers in Lox are used to resolve ambiguities within a single rule. Precedence qualifiers have no effect if an ambiguity spans multiple rules, and resolution will likely require grammar refactoring.

When resolving conflicts in operator precedence, it’s also important to consider the associativity of the operators. Associativity determines how operators of the same precedence level are grouped in the absence of parentheses.

Left Associativity (@left)

For left-associative operators like addition and subtraction, when multiple instances of the operator appear in a row, the parser groups them from the left.

Example:

expr = expr '+' expr  @left(1)
     | NUMBER

Given the input 1 + 2 + 3, the parser processes it like this:

(1 + 2) + 3

This is due to the @left qualifier, which causes the parser to reduce the first expr before considering the next.

Right Associativity (@right)

For right-associative operators like exponentiation or assignment, the parser groups the expressions from the right.

Example:

expr = expr '^' expr  @right(1)
     | NUMBER

For the input 2 ^ 3 ^ 4, the parser groups it like this:

2 ^ (3 ^ 4)