Similar to ANTLR (and unlike Bison/Yacc), Lox separates the grammar from the user-written code. Unlike ANTLR, Lox analyzes both the grammar and the user-written code to inform how the parser should be generated. It matches productions (grammar/lox) to actions (user-written/Go) and uses these relations to type-check action parameters and return values. As a consequence, the user-written Go code must fulfill certain requirements before a parser can be generated.
It is important to always use the latest Lox release, especially after updating the Go toolchain. If lox returns weird/unexplainable Go-related errors, it is likely that your version of lox is too old.
Lox uses non-idiomatic prefixes to prevent symbol collisions between generated and user-written Go code. More specifically, it reserves the on_ prefix (explained below) and the single underscore _ prefix (e.g. _TokenToString). Lox reserves the right to add more symbols using the _ prefix in the future.
The parser package must define a type called Token which Lox will use to represent tokens. Token must also be the same type returned by the lexer. If your project is using simplelexer (recommended for beginners), then Token must be an alias to simplelexer’s Token type:

type Token = simplelexer.Token
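
If you are not using simplelexer, Token can be a type you define yourself, as long as your lexer returns that same type. The sketch below is purely illustrative; Lox does not prescribe any particular fields:

// A hypothetical hand-written Token. The exact fields are up to you; Lox
// only requires that the lexer and the parser package agree on this one type.
type Token struct {
	Type    int    // hypothetical token kind
	Literal string // hypothetical matched source text
	Line    int    // hypothetical source position, useful for diagnostics
	Col     int
}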
The parser package must also define a parser type. The name is not important, but it must embed the type lox, which will be generated by the lox tool:

type myParser struct {
	lox
	// Other fields.
}
Embedding lox marks the type as the parser type. Lox will look for actions in this type, and it will also generate methods on this type.
Each grammar production must have a corresponding Go action method to be executed by the parser when it reduces that production. The action method must be defined on the parser type. The name of the method must follow the pattern on_<rule> or on_<rule>__<suffix>, where <rule> is the name of the rule that defines the production and <suffix> is an optional string that allows multiple action methods for the same rule. The actual value of <suffix> is not important, and is ignored by Lox.
The action method must return a single result. If a rule has more than one action method, they must all return the same type. Each production term must correspond to an action parameter. The Go type of the rule or token referenced by the term must be assignable to its corresponding parameter type.
For example:
statement = ID '=' expr
          | 'call' ID

func (p *myParser) on_statement__assign(id Token, _ Token, e Expr) Statement {
	return &AssignStat{id, e}
}

func (p *myParser) on_statement__call(_ Token, id Token) Statement {
	return &CallStat{id}
}
The method on_statement__assign matches the production statement = ID '=' expr because:

- on_statement matches the rule statement (the suffix __assign is ignored).
- id Token matches the term ID (all tokens match the Token type).
- _ Token matches the term = (the parameter name is not important, just the type).
- e Expr matches the term expr (assuming that there is a rule expr whose actions return Expr).

A production must match a single action method.
For example:
statement = ID '=' expr
func (p *myParser) on_statement__1(id Token, _ Token, e Expr) Statement {
	return &AssignStat{id, e}
}

func (p *myParser) on_statement__2(id, _, e any) Statement {
	return &AssignStat{id.(Token), e.(Expr)}
}
In this example, lox will produce an error because both on_statement__1 and on_statement__2 could be used as the action method for the same statement production.
If your parser type defines a method called _onBounds, the generated parser will call it once for every reduce artifact with information defining its lexical boundaries in the form of the start and end tokens. This can be used to store source location information in the AST, for example. _onBounds, if specified, must have the following signature:

func (p *yourParser) _onBounds(r any, begin, end Token) {
	// ...
}
Where r is the reduce artifact, and begin and end are its boundary tokens.
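
As a rough sketch of one way to use this hook (the boundedNode interface and setBounds method below are hypothetical helpers, not part of Lox):

// boundedNode is a hypothetical interface implemented by AST nodes that
// want to record their source extent.
type boundedNode interface {
	setBounds(begin, end Token)
}

func (p *myParser) _onBounds(r any, begin, end Token) {
	// Record the extent on any reduce artifact that opts in.
	if n, ok := r.(boundedNode); ok {
		n.setBounds(begin, end)
	}
}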
You must run lox to regenerate the parser after defining _onBounds or it will not be called.
Check out Bolox for an example of how _onBounds can be used to associate source location information with ASTs for error logging purposes.
Lox-generated files follow the pattern *.gen.go (e.g. parser.gen.go). Generated code includes the parser, the lexer state machine, and other supporting types and functions. This section documents the code that is available for you to reference and use.
Lox generates the lox type, which includes the data structures used to run the parser. You must define a struct in the same package which embeds this type. This tells Lox that this struct is the parser. Lox will add methods implementing the parser to this type.
Lox generates the interface _Lexer as follows:

type _Lexer interface {
	ReadToken() (Token, int)
}
_Lexer defines the interface required by the parser. Fun fact: Lox does not generate an actual lexer; it generates the state machine for a lexer. You are responsible for providing a _Lexer implementation.
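
As an illustration only, a _Lexer could be as simple as something that replays a pre-tokenized slice. The type below is entirely hypothetical, and the second return value is assumed here to be the token type identifier expected by the generated parser:

// sliceLexer is a hypothetical _Lexer that replays pre-computed tokens.
type sliceLexer struct {
	toks  []Token
	types []int // assumed: the token type identifier for each token
	pos   int
}

func (l *sliceLexer) ReadToken() (Token, int) {
	t, typ := l.toks[l.pos], l.types[l.pos]
	if l.pos+1 < len(l.toks) {
		l.pos++
	}
	return t, typ
}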
simplelexer is the _Lexer implementation used by the examples included with Lox, and should be sufficient for most projects. It is small and simple enough that you could just copy it into your project instead of using it as an external dependency.