Go Reference

Similar to ANTLR (and unlike Bison/Yacc), Lox separates the grammar from the user-written code. Unlike ANTLR, Lox analyzes both the grammar and the user-written code to inform how the parser should be generated. It matches productions (grammar/lox) to actions (user-written/Go) and uses these relations to type-check action parameters and return values. As a consequence, the user-written Go code must fulfill certain requirements before a parser can be generated.

Always use the latest Lox release, especially after updating the Go toolchain. If lox reports unexpected or unexplainable Go-related errors, your version of lox is most likely too old.

Lox uses non-idiomatic prefixes to prevent symbol collisions between generated and user-written Go code. More specifically, it reserves the on_ prefix (explained below) and the single underscore _ prefix (e.g. _TokenToString). Lox reserves the right to add more symbols using the _ prefix in the future.

Token Type

The parser package must define a type called Token which Lox will use to represent tokens. Token must also be the same type returned by the lexer. If your project is using simplelexer (recommended for beginners), then Token must be an alias to simplelexer’s Token type:

type Token = simplelexer.Token

Parser Type

The parser package must also define a parser type. The name is not important, but it must embed the type lox, which is generated by the lox tool:

type myParser struct {
  lox

  // Other fields.
}

Embedding lox marks the type as the parser type. Lox will look for actions in this type, and it will also generate methods in this type.
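Because Lox analyzes your source rather than running it, finding the parser type is a matter of inspecting Go syntax. The following sketch shows the idea with the standard go/parser and go/ast packages: it looks for a struct with an embedded lox field and lists that type's on_ methods. It is an illustration only, not Lox's implementation; Lox goes further and type-checks the action methods it finds. The findParser function and the sample source are hypothetical.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// findParser returns the name of the struct type that embeds lox and the
// names of that type's methods starting with on_. Illustration only: it
// inspects syntax and does no type checking.
func findParser(src string) (string, []string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "parser.go", src, 0)
	if err != nil {
		return "", nil, err
	}

	// Find a struct type with an embedded (unnamed) field of type lox.
	parserType := ""
	ast.Inspect(f, func(n ast.Node) bool {
		ts, ok := n.(*ast.TypeSpec)
		if !ok {
			return true
		}
		if st, ok := ts.Type.(*ast.StructType); ok {
			for _, field := range st.Fields.List {
				if id, ok := field.Type.(*ast.Ident); ok && len(field.Names) == 0 && id.Name == "lox" {
					parserType = ts.Name.Name
				}
			}
		}
		return true
	})
	if parserType == "" {
		return "", nil, fmt.Errorf("no struct embeds lox")
	}

	// Collect on_ methods whose receiver is parserType or *parserType.
	var actions []string
	for _, decl := range f.Decls {
		fd, ok := decl.(*ast.FuncDecl)
		if !ok || fd.Recv == nil || len(fd.Recv.List) == 0 || !strings.HasPrefix(fd.Name.Name, "on_") {
			continue
		}
		recv := fd.Recv.List[0].Type
		if star, ok := recv.(*ast.StarExpr); ok {
			recv = star.X
		}
		if id, ok := recv.(*ast.Ident); ok && id.Name == parserType {
			actions = append(actions, fd.Name.Name)
		}
	}
	return parserType, actions, nil
}

const src = `package calc

type Token struct{}
type lox struct{}

type myParser struct {
	lox
	depth int
}

func (p *myParser) on_statement__assign(id Token) int { return 0 }
func (p *myParser) on_expr(t Token) int { return 0 }
func (p *myParser) helper() {}
`

func main() {
	name, actions, err := findParser(src)
	fmt.Println(name, actions, err) // myParser [on_statement__assign on_expr] <nil>
}
```

The helper method is skipped because it lacks the on_ prefix, which is one reason Lox reserves that prefix.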

Action Methods

Each grammar production must have a corresponding Go action method to be executed by the parser when it reduces that production. The action method must be defined on the parser type. The name of the method must follow the pattern on_<rule> or on_<rule>__<suffix>, where <rule> is the name of the rule that defines the production and <suffix> is an optional string that allows multiple action methods for the same rule. The actual value of <suffix> is not important and is ignored by Lox.
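The naming pattern can be expressed as a small Go function. ruleOf below is a hypothetical helper written for illustration, not part of Lox, and it assumes rule names never contain a double underscore, so the first __ always starts the suffix.

```go
package main

import (
	"fmt"
	"strings"
)

// ruleOf is a hypothetical helper that extracts <rule> from an action method
// name of the form on_<rule> or on_<rule>__<suffix>. The suffix is discarded
// because its value does not matter. Assumes rule names never contain "__".
func ruleOf(method string) (string, bool) {
	rest, ok := strings.CutPrefix(method, "on_")
	if !ok {
		return "", false
	}
	rule, _, _ := strings.Cut(rest, "__")
	if rule == "" {
		return "", false
	}
	return rule, true
}

func main() {
	for _, name := range []string{"on_statement__assign", "on_statement__call", "on_expr", "helper"} {
		rule, ok := ruleOf(name)
		fmt.Println(name, rule, ok)
	}
}
```

Both on_statement__assign and on_statement__call map to the rule statement, while helper is not recognized as an action method at all.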

The action method must return a single result. If a rule has more than one action method, they must all return the same type. Each term of the production must correspond, in order, to one action parameter. The Go type of the rule or token referenced by the term must be assignable to its corresponding parameter type.

For example:

statement = ID '=' expr
          | 'call' ID

func (p *myParser) on_statement__assign(id Token, _ Token, e Expr) Statement {
    return &AssignStat{id, e}
}

func (p *myParser) on_statement__call(_ Token, id Token) Statement {
    return &CallStat{id}
}
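Each parameter receives the value produced for the matching term. As a rough illustration of what happens on a reduction, the sketch below keeps a value stack and, for statement = ID '=' expr, pops three values, passes them to the action in term order, and pushes the result. The stack, reduceAssign, and the AST types are hypothetical; this is a generic LR-style sketch, not Lox's generated code, which may work differently.

```go
package main

import "fmt"

// Stand-in types for illustration only.
type Token struct{ Str string }

type Expr interface{ exprNode() }

type Num struct{ Val int }

func (Num) exprNode() {}

type Statement interface{ stmtNode() }

type AssignStat struct {
	ID  Token
	Val Expr
}

func (*AssignStat) stmtNode() {}

type myParser struct {
	stack []any // hypothetical value stack: one entry per shifted token or reduced rule
}

func (p *myParser) on_statement__assign(id Token, _ Token, e Expr) Statement {
	return &AssignStat{id, e}
}

// reduceAssign is a hypothetical reduction of statement = ID '=' expr: it
// takes the top three values, passes them to the action in term order, and
// replaces them with the action's result.
func (p *myParser) reduceAssign() {
	n := len(p.stack)
	args := p.stack[n-3:]
	result := p.on_statement__assign(args[0].(Token), args[1].(Token), args[2].(Expr))
	p.stack = append(p.stack[:n-3], result)
}

func main() {
	p := &myParser{}
	// Pretend the parser shifted x and =, and already reduced 42 to an Expr.
	p.stack = append(p.stack, Token{"x"}, Token{"="}, Num{42})
	p.reduceAssign()

	s := p.stack[0].(*AssignStat)
	fmt.Println(len(p.stack), s.ID.Str, s.Val.(Num).Val) // 1 x 42
}
```

This is why the number and order of parameters must line up with the production's terms: each type assertion in the reduction expects a specific term in a specific position.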

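The method on_statement__assign matches the production statement = ID '=' expr because its name starts with on_statement, so it belongs to the rule statement; it has one parameter for each of the production's three terms; and the Go type of each term is assignable to the corresponding parameter type (Token for ID and '=', and the type returned by the expr actions, presumably Expr, for expr). By the same reasoning, on_statement__call matches statement = 'call' ID.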

A production must match a single action method.

For example:

statement = ID '=' expr

func (p *myParser) on_statement__1(id Token, _ Token, e Expr) Statement {
    return &AssignStat{id, e}
}

func (p *myParser) on_statement__2(id, _, e any) Statement {
    return &AssignStat{id.(Token), e.(Expr)}
}

In this example, lox reports an error because both on_statement__1 and on_statement__2 could be used as the action method for the single statement production: on_statement__2 takes parameters of type any, to which every term type is assignable.

_onBounds

If your parser type defines a method called _onBounds, the generated parser calls it once for every reduce artifact, passing the artifact together with the start and end tokens that delimit it in the source. This can be used, for example, to store source location information in the AST.

_onBounds, if specified, must have the following signature:

func (p *yourParser) _onBounds(r any, begin, end Token) {
    // ...
}

Here r is the reduce artifact, and begin and end are its boundary tokens.

You must run lox to regenerate the parser after defining _onBounds; otherwise it will not be called.

Check out Bolox for an example of how _onBounds can be used to associate source location information with ASTs for error logging purposes.
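A minimal sketch of the general pattern: record each AST node's boundary tokens in a map on the parser. The Line and Col fields on Token, and the Node and Span types, are hypothetical names for illustration; Bolox's actual approach may differ. Since map keys must be comparable and some artifacts (such as slices) are not, the sketch only records values implementing Node, which here is implemented only by pointer types. _onBounds is called by hand to simulate the generated parser.

```go
package main

import "fmt"

// Token stands in for your Token type. Line and Col are hypothetical fields;
// use whatever position data your lexer actually provides.
type Token struct {
	Str       string
	Line, Col int
}

// Span holds the first and last token of a reduce artifact.
type Span struct{ Begin, End Token }

// In this sketch Node is implemented only by pointer types, so every Node is a valid map key.
type Node interface{ node() }

type AssignStat struct{ Name string }

func (*AssignStat) node() {}

type myParser struct {
	spans map[Node]Span
}

// _onBounds uses the documented signature. Artifacts that are not Nodes
// (for example slices, which cannot be map keys) are ignored.
func (p *myParser) _onBounds(r any, begin, end Token) {
	if n, ok := r.(Node); ok {
		p.spans[n] = Span{begin, end}
	}
}

func main() {
	p := &myParser{spans: map[Node]Span{}}
	stmt := &AssignStat{Name: "x"}

	// Simulate the calls the generated parser would make after reductions.
	p._onBounds(stmt, Token{"x", 1, 1}, Token{"42", 1, 5})
	p._onBounds([]Token{}, Token{}, Token{}) // ignored: not a Node

	sp := p.spans[stmt]
	fmt.Printf("%s: %d:%d-%d:%d\n", stmt.Name, sp.Begin.Line, sp.Begin.Col, sp.End.Line, sp.End.Col)
	// x: 1:1-1:5
}
```

Keying by the node pointer keeps AST types free of position fields; an alternative is to store the span directly in each node.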

Generated Code

Lox-generated files are named after the pattern *.gen.go (e.g. parser.gen.go). Generated code includes the parser, the lexer state machine, and other supporting types and functions. This section documents the generated code that you can reference or use.

lox

Lox generates the lox type, which includes the data structures used to run the parser. You must define a struct in the same package that embeds this type; this tells Lox that the struct is the parser. Lox will add methods implementing the parser to this type.

_Lexer

Lox generates the interface _Lexer as follows:

type _Lexer interface {
	ReadToken() (Token, int)
}

_Lexer defines the interface required by the parser. Fun fact: Lox does not generate an actual lexer; it generates the state machine for a lexer. You are responsible for providing a _Lexer implementation. simplelexer is the _Lexer implementation used by the examples included with Lox, and should be sufficient for most projects. It is small and simple enough that you could copy it into your project instead of using it as an external dependency.
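Any type with a matching ReadToken method satisfies _Lexer. The sketch below replays a fixed slice of tokens, which can be handy in tests. It assumes that the int result is the token type and that end of input is signaled with a hypothetical eofType constant; check the generated code for the real token-type values before relying on this. The _Lexer declaration is copied from above so the example compiles on its own.

```go
package main

import "fmt"

// Token is a stand-in; in a real project it is your parser package's Token.
type Token struct{ Str string }

// _Lexer mirrors the interface Lox generates.
type _Lexer interface {
	ReadToken() (Token, int)
}

const eofType = 0 // hypothetical end-of-input token type

// sliceLexer replays pre-built tokens with their (assumed) token types.
type sliceLexer struct {
	toks  []Token
	types []int
	pos   int
}

func (l *sliceLexer) ReadToken() (Token, int) {
	if l.pos >= len(l.toks) {
		return Token{}, eofType
	}
	t, typ := l.toks[l.pos], l.types[l.pos]
	l.pos++
	return t, typ
}

// Compile-time check that sliceLexer satisfies _Lexer.
var _ _Lexer = (*sliceLexer)(nil)

func main() {
	var lex _Lexer = &sliceLexer{
		toks:  []Token{{"x"}, {"="}, {"42"}},
		types: []int{1, 2, 3}, // hypothetical token-type values
	}
	for {
		tok, typ := lex.ReadToken()
		if typ == eofType {
			break
		}
		fmt.Println(typ, tok.Str)
	}
}
```

The var _ _Lexer line costs nothing at run time but makes the compiler reject the type if its ReadToken signature ever drifts from the interface.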