A parser’s purpose is to recognize a string of tokens conforming to a grammar, and to optionally transform it into something else.
Parser declarations are contained in a parser section signaled by the @parser
keyword.
@parser
// Parser declarations
Parser rules are the fundamental building blocks of the parser. They define transformations of terminals (tokens) and non-terminals (other rules) and can themselves be referenced as terms in other rule productions, including recursively. The parser rule syntax resembles the extended-Backus-Naur form (EBNF).
The Lox rule syntax itself can be expressed concisely in Lox:
rule = '@start'? ID '=' @list(prod, '|') NL
prod = term_card+ qualif?
| '@empty'
term_card = term card?
term = ID | LITERAL | '@error' | list
list = '@list' '(' term ',' term ')'
card = '*' | '*!' | '+' | '?'
qualif = '@left' '(' NUM ')'
| '@right' '(' NUM ')'
Let’s take a closer look at the following example rule:
prod = term_card+ qualif?
| '@empty'
prod
is the name of the rule.term_card+ qualif?
and '@empty'
are the rule’s productions.term_card+
and qualif?
are terms in the first production.term_card
and qualif
reference other rules.+
is a cardinality, indicating that one or more term_card
s are expected.
Likewise, ?
indicates that zero or one qualif
is expected.'@empty'
is a token, referenced by its literal form.Notes:
_
).Rules are terminated by an end-of-line. The line-continuation backslash \
can
be used to split a rule into multiple lines. The backlash can be omitted when
the vertical bar |
is the next line’s first token. In this case, the
line-continuation is implicit.
For example:
term = ID | \
LITERAL | \
'@error' | \
list
Is equivalent to:
term = ID
| LITERAL
| '@error'
| list
It is idiomatic to use the latter form, and to only use the \
when a rule
cannot be split at a |
.
One, and only one rule, must be marked as the root or start rule using the
@start
keyword. The start rule is the parser’s goal; when the parser reduces
the start rule, parsing completes successfully.
Example:
@start program = statement*
A production that matches the empty string - often represented in formal grammar
by the Greek letter episilon (ε) - must use the @empty
keyword in place of
terms.
Example:
optional_number = NUMBER | @empty
In this example, the rule optional_number
can be reduced by matching a NUMBER
token, or by matching nothing at all (i.e., the empty string).
A term can be annotated with a cardinality modifier, which determines how many instances of the term can be matched. Cardinality is used to specify optional, repeated, or required patterns within a rule.
?
indicates that a term is optional.
Example:
some_rule = some_term?
// Generated by Lox:
some_term? = some_term
| @empty
+
indicates that a term can be matched one or more times.
Example:
some_rule = some_term+
// Generated by Lox:
some_term+ = some_term+ some_term
| some_term
*
indicates that a term can be matched zero or more times.
Example:
some_rule = some_term*
// Generated by Lox:
some_term* = some_term+?
some_term+? = some_term+
| @empty
some_term+ = some_term+ some_term
| some_term
@list(elem, sep)
indicates that the term will match elem
one or more times
while separated by sep
.
Example:
some_rule = @list(some_term, ',')
// Generated by Lox:
@list(some_term, ',') = @list(some_term, ',') ',' some_term
| some_term
@list
can be further qualified with a ?
to match a list of zero or more
elements.
The following grammar is ambiguous. Attempting to analyze it will fail it with
the error: grammar has conflicts
:
expr = expr '+' expr
| expr '*' expr
| NUMBER
See Parser Conflicts for a general explanation about the subject.
Conflicts related to operator precedence can be resolved using precedence and associatity qualifiers:
expr = expr '+' expr @left(1)
| expr '*' expr @left(2)
| NUMBER
The @left
qualifiers in the grammar tells lox
that if it encounters a
conflict between expr '+' expr
and expr '*' expr
, then the latter should
take precendence over the former.
Precedence qualifiers in Lox are used to resolve ambiguities within a single rule. Precedence qualifiers have no effect if an ambiguity spans multiple rules, and resolution will likely require grammar refactoring.
When resolving conflicts in operator precedence, it’s also important to consider the associativity of the operators. Associativity determines how operators of the same precedence level are grouped in the absence of parentheses.
For left-associative operators like addition and subtraction, when multiple instances of the operator appear in a row, the parser groups them from the left.
Example:
expr = expr '+' expr @left(1)
| NUMBER
Given the input 1 + 2 + 3
, the parser processes it like this:
(1 + 2) + 3
This is due to the @left
qualifier, which causes the parser to reduce the
first expr
before considering the next.
For right-associative operators like exponentiation or assignment, the parser groups the expressions from the right.
Example:
expr = expr '^' expr @right(1)
| NUMBER
For the input 2 ^ 3 ^ 4
, the parser groups it like this:
2 ^ (3 ^ 4)