alecthomas / participle
- Wednesday, July 18, 2018, 08:15:23
Go
A parser library for Go
The goal of this package is to provide a simple, idiomatic and elegant way of defining parsers in Go.
Participle's method of defining grammars should be familiar to any Go
programmer who has used the encoding/json
package: struct field tags define
what and how input is mapped to those same fields. This is not unusual for Go
encoders, but is unusual for a parser.
Participle parsers are recursive descent with one token of lookahead. This means that they do not support left recursion or grammars requiring more than one token of lookahead.
If your grammar contains either of these, it must be eliminated by restructuring your grammar.
Unfortunately, neither of these cases is currently detected automatically by Participle. This, however, is a goal.
A tutorial is available, walking through the creation of an .ini parser.
A grammar is an annotated Go structure that both defines the parser's grammar and serves as the AST output by the parser:
type Grammar struct {
Hello string `"hello" @Ident`
}
Note: Participle also supports named struct tags (e.g. Hello string `parser:"@Ident"`).
A parser is constructed from a grammar and a lexer:
parser, err := participle.Build(&Grammar{})
Once constructed, the parser is applied to input to produce an AST:
ast := &Grammar{}
err := parser.ParseString("hello world", ast)
// ast == &Grammar{Hello: "world"}
@<expr>               Capture expression into the field.
@@                    Recursively capture using the field's own type.
<identifier>          Match named lexer token.
{ ... }               Match 0 or more times.
( ... )               Group.
[ ... ]               Optional.
"..."[:<identifier>]  Match the literal, optionally specifying the exact lexer token type to match.
<expr> <expr> ...     Match expressions.
<expr> | <expr>       Match one of the alternatives.

Notes:
@<expr> is the mechanism for capturing matches into a field. Prefixing any expression in the grammar with @ will capture values matching that expression into the corresponding field.
For example:
// The grammar definition.
type Grammar struct {
Hello string `@Ident`
}
// The source text to parse.
source := "world"
// After parsing, the resulting AST.
result == &Grammar{
Hello: "world",
}
For slice and string fields, each instance of @
will accumulate into the
field (including repeated patterns). Accumulation into other types is not
supported.
A successful capture match into a boolean field will set the field to true.
For integer and floating point types, a successful capture will be parsed
with strconv.ParseInt()
and strconv.ParseFloat()
respectively.
Custom control of how values are captured into fields can be achieved by a
field type implementing the Capture
interface (Capture(values []string) error
).
Participle operates on tokens and thus relies on a lexer to convert character streams to tokens.
Three lexers are provided, varying in speed and flexibility. The fastest lexer
is based on the text/scanner package,
but only allows the tokens provided by that package. Next fastest is the regexp
lexer (lexer.Regexp()). The slowest is currently the EBNF-based lexer, but it has large potential for optimisation through code generation.
To use your own Lexer you will need to implement two interfaces: Definition and Lexer.
The Parser's behaviour can be configured via Options.
There are several examples included in the source. Of particular note is the ini parser, which also includes a custom lexer.
Included below is a parser for the form of EBNF used by exp/ebnf:
package main
import (
	"encoding/json"
	"os"

	"github.com/alecthomas/participle"
)
type Group struct {
Expression *Expression `'(' @@ ')'`
}
type Option struct {
Expression *Expression `'[' @@ ']'`
}
type Repetition struct {
Expression *Expression `'{' @@ '}'`
}
type Literal struct {
Start string `@String`
End string `[ '…' @String ]`
}
type Term struct {
Name string `@Ident |`
Literal *Literal `@@ |`
Group *Group `@@ |`
Option *Option `@@ |`
Repetition *Repetition `@@`
}
type Sequence struct {
Terms []*Term `@@ { @@ }`
}
type Expression struct {
Alternatives []*Sequence `@@ { '|' @@ }`
}
type Expressions []*Expression
type Production struct {
Name string `@Ident '='`
Expressions Expressions `@@ { @@ } '.'`
}
type EBNF struct {
Productions []*Production `{ @@ }`
}
func main() {
	parser, err := participle.Build(&EBNF{})
	if err != nil {
		panic(err)
	}
	ebnf := &EBNF{}
	err = parser.Parse(os.Stdin, ebnf)
	if err != nil {
		panic(err)
	}
	json.NewEncoder(os.Stdout).Encode(ebnf)
}
One of the included examples is a complete Thrift parser (shell-style comments are not supported). This provides a convenient baseline for comparison with the PEG-based pigeon, which is the parser used by go-thrift. Note that the pigeon parser is generated ahead of time, while the Participle parser is constructed at run time.
You can run the benchmarks yourself, but here's the output on my machine:
BenchmarkParticipleThrift-4 10000 221818 ns/op 48880 B/op 1240 allocs/op
BenchmarkGoThriftParser-4 2000 804709 ns/op 170301 B/op 3086 allocs/op
On a real-life codebase of 47K lines of Thrift, Participle takes 200ms and go-thrift takes 630ms, which aligns quite closely with the benchmarks.