mailru / easyjson
- Saturday, 26 March 2016, 02:11:30
Go
Fast JSON serializer for golang.
easyjson marshals and unmarshals golang structs to and from JSON without the use of reflection, by generating marshaller code.
One aim of the library is to keep the generated code simple enough that it can be easily optimized or fixed. Another is to give users ways to customize the generated code that are not available in 'encoding/json', such as generating snake_case names or enabling 'omitempty' behavior by default.
go get github.com/mailru/easyjson/...
easyjson -all <file>.go
This will generate <file>_easyjson.go with marshaller/unmarshaller methods for structs. The GOPATH variable needs to be set up correctly, since the generation invokes a go run on a temporary file (this is a really convenient approach to code generation borrowed from https://github.com/pquerna/ffjson).
Usage of easyjson:
-all
generate un-/marshallers for all structs in a file
-build_tags string
build tags to add to generated file
-leave_temps
do not delete temporary files
-omit_empty
omit empty fields by default
-snake_case
use snake_case names instead of CamelCase by default
-stubs
only generate stubs for marshallers/unmarshallers methods
Using -all will generate (un-)marshallers for all structs in the file. Without it, only structs that have a line beginning with easyjson:json in their doc comment are processed, e.g.:
//easyjson:json
type A struct{}
-snake_case tells easyjson to generate snake_case field names by default (unless explicitly overridden by a field tag). The CamelCase to snake_case conversion algorithm should work in most cases (e.g. HTTPVersion will be converted to http_version). For names like JSONHTTPRPC the conversion returns an unexpected result (jsonhttprpc, without underscores), but such names require a dictionary to convert and may be ambiguous.
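The conversion rule can be illustrated with a minimal sketch. This is not easyjson's own implementation: the function name toSnake and the exact rule (insert an underscore before a capital that follows a lower-case letter or that is followed by one) are assumptions chosen to reproduce the two examples given above.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// toSnake is a hypothetical CamelCase to snake_case converter.
// An underscore is written before an upper-case rune (other than the first)
// when the previous rune is lower-case or the next rune is lower-case.
// A run of capitals with no lower-case neighbour is therefore kept together.
func toSnake(name string) string {
	runes := []rune(name)
	var b strings.Builder
	for i, r := range runes {
		if i > 0 && unicode.IsUpper(r) {
			prevLower := unicode.IsLower(runes[i-1])
			nextLower := i+1 < len(runes) && unicode.IsLower(runes[i+1])
			if prevLower || nextLower {
				b.WriteByte('_')
			}
		}
		b.WriteRune(unicode.ToLower(r))
	}
	return b.String()
}

func main() {
	for _, n := range []string{"HTTPVersion", "UserID", "JSONHTTPRPC"} {
		fmt.Println(n, "->", toSnake(n))
	}
}
```

With this rule an all-capitals name has no place to split, which is why a dictionary would be needed to separate JSON, HTTP and RPC.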
-build_tags will add the corresponding build tag line to the generated file.
easyjson generates MarshalJSON/UnmarshalJSON methods that are compatible with the interfaces from 'encoding/json'. They are usable with the 'json.Marshal' and 'json.Unmarshal' functions; however, actually using those will result in significantly worse performance compared to the custom interfaces.
MarshalEasyJSON / UnmarshalEasyJSON methods are generated for faster parsing using custom Lexer/Writer structs (jlexer.Lexer and jwriter.Writer). The method signatures are defined in the easyjson.Marshaler / easyjson.Unmarshaler interfaces. These interfaces avoid any unnecessary reflection or type assertions during parsing. The methods can be called manually or through the easyjson.Marshal<...> and easyjson.Unmarshal<...> helper functions.
In addition to a function that returns the data as a single slice, the jwriter.Writer struct has methods to return the size and to send the data to an io.Writer. This is aimed at a typical HTTP use case, where you want to know the Content-Length before actually starting to send the data.
There are helpers in the top-level package for marshalling/unmarshalling data through the custom interfaces to and from writers, including a helper for http.ResponseWriter.
If the easyjson.Marshaler / easyjson.Unmarshaler interfaces are implemented by a type involved in JSON parsing, the type will be marshaled/unmarshaled using these methods. The easyjson.Optional interface allows a custom type to integrate with the 'omitempty' logic.
As an example, easyjson includes an easyjson.RawMessage analogous to json.RawMessage.
Also, there are 'optional' wrappers for primitive types in the easyjson/opt package. These are useful when it is necessary to distinguish between a missing value and the default value for the type. The wrappers avoid pointers and extra heap allocations in such cases.
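A minimal sketch of the idea, written against 'encoding/json' so that it runs without generated code. The OptInt type, its V and Defined fields, and the IsDefined method are assumptions made for this illustration and should not be read as the exact definitions in easyjson/opt.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// OptInt is a hypothetical optional integer stored by value: Defined records
// whether the key was present in the input, V holds the decoded value.
type OptInt struct {
	V       int
	Defined bool
}

// UnmarshalJSON runs only when the key is present in the input.
// In this sketch an explicit null is treated the same as a missing key.
func (o *OptInt) UnmarshalJSON(data []byte) error {
	if string(data) == "null" {
		*o = OptInt{}
		return nil
	}
	if err := json.Unmarshal(data, &o.V); err != nil {
		return err
	}
	o.Defined = true
	return nil
}

// IsDefined lets omit-empty style logic ask the value whether to emit it.
func (o OptInt) IsDefined() bool { return o.Defined }

// Settings embeds the wrapper directly, so no pointer field is needed.
type Settings struct {
	Retries OptInt `json:"retries"`
}

func parse(s string) (Settings, error) {
	var out Settings
	err := json.Unmarshal([]byte(s), &out)
	return out, err
}

func main() {
	for _, in := range []string{`{}`, `{"retries":0}`, `{"retries":3}`} {
		s, err := parse(in)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-15s defined=%v value=%d\n", in, s.Retries.IsDefined(), s.Retries.V)
	}
}
```

A plain int field could not tell the first two inputs apart, and a *int field would, at the cost of a heap allocation for every present value.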
The library uses a custom buffer which allocates data in increasing chunks (128-32768 bytes). Chunks of 512 bytes and larger are reused with the help of sync.Pool. The maximum size of a chunk is bounded to reduce redundancy in memory allocation and to make the chunks more reusable in the case of large buffer sizes.
The buffer code is in the easyjson/buffer package; the exact values can be tweaked by a buffer.Init() call before the first serialization.
Most benchmarks were done using a sample 13kB JSON (9kB when serialized back without the whitespace) from https://dev.twitter.com/rest/reference/get/search/tweets. The sample is very close to real-world data, quite structured and contains a variety of different types.
For small request benchmarks, an 80-byte portion of the regular sample was used.
For large request marshalling benchmarks, a struct containing 50 regular samples was used, making a ~500kB output JSON.
Benchmarks are available in the repository and are run on 'make'.
easyjson seems to be 5-6 times faster than the default json serialization for unmarshalling, 3-4 times faster for non-concurrent marshalling. Concurrent marshalling is 6-7x faster if marshalling to a writer.
easyjson uses the same approach for code generation as ffjson, but a significantly different approach to lexing and generated code. This allows easyjson to be 2-3x faster for unmarshalling and 1.5-2x faster for non-concurrent marshalling.
ffjson seems to behave oddly when used concurrently: for large requests, pooling hurts performance instead of boosting it, and it does not scale well. These issues are likely to be fixable, and until they are, comparisons might vary a lot from version to version.
easyjson is similar in performance for small requests and 2-5x faster for large ones if used with a writer.
github.com/ugorji/go/codec library provides compile-time helpers for JSON generation. In this case, helpers are not exactly marshallers as they are encoding-independent.
easyjson is generally ~2x faster for non-concurrent benchmarks and about 3x faster for concurrent encoding (without marshalling to a writer). The unsafe option for generated helpers was used.
As an attempt to measure the marshalling performance of 'go/codec' (as opposed to allocations/memcpy/writer interface invocations), a benchmark was done that resets the length of a byte slice rather than resetting the whole slice to nil. However, the optimization in this exact form may not be applicable in practice, since the memory is not freed between marshalling operations.
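The difference between the two kinds of reset can be shown with a short sketch. The encode function is a hypothetical stand-in for an encoder that appends to a caller-supplied slice; it is not the go/codec API.

```go
package main

import "fmt"

// encode is a hypothetical encoder: it appends n payload bytes to dst and
// returns the extended slice, allocating only if dst lacks capacity.
func encode(dst []byte, n int) []byte {
	for i := 0; i < n; i++ {
		dst = append(dst, 'x')
	}
	return dst
}

func main() {
	var buf []byte

	// The first call has to grow the slice.
	buf = encode(buf, 1024)
	grown := cap(buf)

	// Resetting the length keeps the backing array, so the next call of the
	// same size appends in place without allocating.
	buf = buf[:0]
	buf = encode(buf, 1024)
	fmt.Println(cap(buf) == grown) // true

	// Resetting to nil drops the array: the memory can be collected, but the
	// next call has to allocate again.
	buf = nil
	fmt.Println(cap(buf)) // 0
}
```

The first variant is what makes the benchmark numbers better and also why the memory stays held between operations.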
ujson uses C code for parsing, so it is interesting to see how plain golang compares to that. It is important to note that the resulting object in Python is slower to access, since the library parses JSON objects into dictionaries.
easyjson seems to be slightly faster for unmarshalling (finally!) and 2-3x faster for marshalling.
The data was measured on 4 February, 2016 using current ffjson and golang 1.6. Data for go/codec was added on 4 March 2016, benchmarked on the same machine.

Unmarshalling

lib | json size | MB/s | allocs/op | B/op |
---|---|---|---|---|
standard | regular | 22 | 218 | 10229 |
standard | small | 9.7 | 14 | 720 |
-------- | ----------- | ------ | ----------- | ------- |
easyjson | regular | 125 | 128 | 9794 |
easyjson | small | 67 | 3 | 128 |
-------- | ----------- | ------ | ----------- | ------- |
ffjson | regular | 66 | 141 | 9985 |
ffjson | small | 17.6 | 10 | 488 |
-------- | ----------- | ------ | ----------- | ------- |
codec | regular | 55 | 434 | 19299 |
codec | small | 29 | 7 | 336 |
-------- | ----------- | ------ | ----------- | ------- |
ujson | regular | 103 | N/A | N/A |

Marshalling, one goroutine

lib | json size | MB/s | allocs/op | B/op |
---|---|---|---|---|
standard | regular | 75 | 9 | 23256 |
standard | small | 32 | 3 | 328 |
standard | large | 80 | 17 | 1.2M |
---------- | ----------- | ------ | ----------- | ------- |
easyjson | regular | 213 | 9 | 10260 |
easyjson* | regular | 263 | 8 | 742 |
easyjson | small | 125 | 1 | 128 |
easyjson | large | 212 | 33 | 490k |
easyjson* | large | 262 | 25 | 2879 |
---------- | ----------- | ------ | ----------- | ------- |
ffjson | regular | 122 | 153 | 21340 |
ffjson** | regular | 146 | 152 | 4897 |
ffjson | small | 36 | 5 | 384 |
ffjson** | small | 64 | 4 | 128 |
ffjson | large | 134 | 7317 | 818k |
ffjson** | large | 125 | 7320 | 827k |
---------- | ----------- | ------ | ----------- | ------- |
codec | regular | 80 | 17 | 33601 |
codec*** | regular | 108 | 9 | 1153 |
codec | small | 42 | 3 | 304 |
codec*** | small | 56 | 1 | 48 |
codec | large | 73 | 483 | 2.5M |
codec*** | large | 103 | 451 | 66007 |
---------- | ----------- | ------ | ----------- | ------- |
ujson | regular | 92 | N/A | N/A |

* marshalling to a writer, ** using ffjson.Pool(), *** reusing output slice instead of resetting it to nil

Marshalling, concurrent

lib | json size | MB/s | allocs/op | B/op |
---|---|---|---|---|
standard | regular | 252 | 9 | 23257 |
standard | small | 124 | 3 | 328 |
standard | large | 289 | 17 | 1.2M |
---------- | ----------- | ------- | ----------- | ------- |
easyjson | regular | 792 | 9 | 10597 |
easyjson* | regular | 1748 | 8 | 779 |
easyjson | small | 333 | 1 | 128 |
easyjson | large | 718 | 36 | 548k |
easyjson* | large | 2134 | 25 | 4957 |
---------- | ----------- | ------ | ----------- | ------- |
ffjson | regular | 301 | 153 | 21629 |
ffjson** | regular | 707 | 152 | 5148 |
ffjson | small | 62 | 5 | 384 |
ffjson** | small | 282 | 4 | 128 |
ffjson | large | 438 | 7330 | 1.0M |
ffjson** | large | 131 | 7319 | 820k |
---------- | ----------- | ------ | ----------- | ------- |
codec | regular | 183 | 17 | 33603 |
codec*** | regular | 671 | 9 | 1157 |
codec | small | 147 | 3 | 304 |
codec*** | small | 299 | 1 | 48 |
codec | large | 190 | 483 | 2.5M |
codec*** | large | 752 | 451 | 77574 |

* marshalling to a writer, ** using ffjson.Pool(), *** reusing output slice instead of resetting it to nil