Similar to ANTLR (and unlike Bison/Yacc), Lox separates the grammar from the user-written code. Unlike ANTLR, however, Lox analyzes both the grammar and the user-written code to determine how the parser should be generated. It matches productions (grammar/lox) to actions (user-written/Go) and uses these relations to type-check action parameters and return values. As a consequence, the user-written Go code must fulfill certain requirements before a parser can be generated.
It is important to always use the latest Lox release, especially after
updating the Go toolchain. If lox returns weird/unexplainable Go-related
errors, it is likely that your version of lox is too old.
Lox uses non-idiomatic prefixes to prevent symbol collisions between generated
and user-written Go code. More specifically, it reserves the on_ prefix
(explained below) and the single-underscore _ prefix (e.g. _TokenToString).
Lox reserves the right to add more symbols using the _ prefix in the future.
The parser package must define a type called Token which Lox will use to
represent tokens. Token must also be the same type returned by the lexer. If
your project is using
simplelexer (recommended for
beginners), then Token must be an alias to simplelexer’s Token type:
type Token = simplelexer.Token
The parser package must also define a parser type. The name is not important,
but it must embed the type lox which will be generated by the lox tool:
type myParser struct {
lox
// Other fields.
}
Embedding lox marks the type as the parser type. Lox will look for actions in
this type, and it will also generate methods in this type.
Each grammar production must have a corresponding Go action method to be
executed by the parser when it reduces that production. The production method
must be defined on the parser type. The name of the method must
follow the pattern on_<rule> or on_<rule>__<suffix> where <rule> is the
name of the rule that defines the production and <suffix> is an optional
string to allow multiple action methods for the same rule. The actual value of
<suffix> is not important, and is ignored by Lox.
The action method must return a single result. If a rule has more than one action method, they must all return the same type. Each production term must correspond to an action parameter. The Go type of the rule or token referenced by the term must be assignable to the corresponding parameter type.
For example:
statement = ID '=' expr
| 'call' ID
func (p *myParser) on_statement__assign(id Token, _ Token, e Expr) Statement {
return &AssignStat{id, e}
}
func (p *myParser) on_statement__call(_ Token, id Token) Statement {
return &CallStat{id}
}
The method on_statement__assign matches the production statement = ID '=' expr because:

- on_statement matches the rule statement (the suffix __assign is ignored).
- id Token matches the term ID (all tokens match the Token type).
- _ Token matches the term '=' (the parameter name is not important, just the type).
- e Expr matches the term expr (assuming that there is a rule expr whose actions return Expr).

A production must match a single action method.
For example:
statement = ID '=' expr
func (p *myParser) on_statement__1(id Token, _ Token, e Expr) Statement {
return &AssignStat{id, e}
}
func (p *myParser) on_statement__2(id, _, e any) Statement {
return &AssignStat{id.(Token), e.(Expr)}
}
In this example, lox will produce an error because both on_statement__1 and
on_statement__2 could be used as the action method for the one statement
production.
If your parser type defines a method called _onBounds, the
generated parser will call it once for every reduce artifact with information
defining its lexical boundaries in the form of the start and end tokens. This
can be used to store source location information in the AST, for example.
_onBounds, if specified, must have the following signature:
func (p *yourParser) _onBounds(r any, begin, end Token) {
// ...
}
Where r is the reduce artifact, and begin and end are its boundary tokens.
You must run lox to regenerate the parser after defining _onBounds or it
will not be called.
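As a minimal sketch of how _onBounds might record source locations, assuming AST nodes with hypothetical BeginLine/EndLine fields and a Line field on Token (none of these names come from Lox itself; they are illustrative):

```go
package main

import "fmt"

// Token is a stand-in for the parser package's Token type; the Line
// field is hypothetical and only illustrates carrying source locations.
type Token struct {
	Text string
	Line int
}

// bounded is a hypothetical interface implemented by AST nodes that
// want to record their lexical boundaries.
type bounded interface {
	setBounds(begin, end Token)
}

// AssignStat is an example AST node with illustrative bounds fields.
type AssignStat struct {
	Name      Token
	BeginLine int
	EndLine   int
}

func (s *AssignStat) setBounds(begin, end Token) {
	s.BeginLine = begin.Line
	s.EndLine = end.Line
}

type myParser struct{}

// _onBounds stores the boundary tokens on any reduce artifact that
// opts in via the bounded interface; other artifacts are ignored.
func (p *myParser) _onBounds(r any, begin, end Token) {
	if b, ok := r.(bounded); ok {
		b.setBounds(begin, end)
	}
}

func main() {
	p := &myParser{}
	stat := &AssignStat{Name: Token{Text: "x", Line: 3}}
	p._onBounds(stat, Token{Text: "x", Line: 3}, Token{Text: "1", Line: 4})
	fmt.Println(stat.BeginLine, stat.EndLine) // 3 4
}
```

The type switch keeps _onBounds cheap for artifacts that do not care about locations, since the generated parser calls it for every reduction.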
Check out
Bolox
for an example of how _onBounds can be used to associate source location
information with ASTs for error logging purposes.
Lox-generated files follow the pattern *.gen.go (e.g.
parser.gen.go). Generated code includes the parser, the lexer state machine,
and other supporting types and functions. This section documents the code that
is available for you to reference and use.
Lox generates the lox type, which includes the data structures used to run the
parser. You must define a struct in the same package that embeds this type.
This tells Lox that this struct is the parser. Lox will add methods
implementing the parser to this type.
Lox generates the interface _Lexer as follows:
type _Lexer interface {
ReadToken() (Token, int)
}
_Lexer defines the interface required by the parser. Fun fact: Lox does not
generate an actual lexer; it generates the state machine for a lexer. You are
responsible for providing a _Lexer implementation.
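As a toy sketch of what an implementation could look like, assuming the int returned alongside each Token is the token's terminal type (the sliceLexer type and tokEOF constant here are illustrative, not part of Lox):

```go
package main

import "fmt"

// Token is a stand-in for the parser package's Token type.
type Token struct {
	Type int
	Text string
}

// tokEOF is an illustrative terminal type signaling end of input.
const tokEOF = 0

// sliceLexer is a toy _Lexer implementation that replays a
// pre-tokenized slice; a real lexer would scan source text.
type sliceLexer struct {
	toks []Token
	pos  int
}

// ReadToken returns the next token and its terminal type, matching
// the _Lexer interface shown above.
func (l *sliceLexer) ReadToken() (Token, int) {
	if l.pos >= len(l.toks) {
		return Token{Type: tokEOF}, tokEOF
	}
	t := l.toks[l.pos]
	l.pos++
	return t, t.Type
}

func main() {
	lex := &sliceLexer{toks: []Token{{Type: 1, Text: "x"}, {Type: 2, Text: "="}}}
	for {
		tok, typ := lex.ReadToken()
		if typ == tokEOF {
			break
		}
		fmt.Println(tok.Text, typ)
	}
}
```

In practice you would use simplelexer (or your own scanner) rather than a pre-tokenized slice; the point is only that anything satisfying ReadToken() (Token, int) can drive the generated parser.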
simplelexer is the
_Lexer implementation used by the examples included with Lox, and should be
sufficient for most projects. It is small and simple enough that you could just
copy it into your project instead of using it as an external dependency.