openacid / slim
- пятница, 15 марта 2019 г. в 00:16:50
Go
Unbelievably space efficient data structures in Golang.
Slim is collection of surprisingly space efficient data types, with corresponding serialisation APIs to persisting them on-disk or for transport.
SlimIndex: is a common index structure, building on top of SlimTrie.
SlimTrie is the underlying index data structure, evolved from trie.
Features:
Minimised: requires only 6 bytes per key(even less than an 8-byte pointer!!).
Stable: memory consumption is stable in various scenarios. The Worst case converges to average consumption tightly. See benchmark.
Unlimited key length:
You can have VERY long keys, without bothering yourself with any
waste of memory(and money).
Do not waste your life writing another prefix compression:).
(aws-s3 limits key length to 1024 bytes).
Memory consumption only relates to key count, not to key length.
Ordered: like btree, keys are stored in alphabetic order, but faster to access. Range-scan is supported!(under development)
Fast:
time complexity for a get is O(log(n) + k); n: key count; k: key length.
With comparison to btree, which is O(log(n) * k)(tree-like).
And golang-map is O(k)(hash-table-like).
Ready for transport:
SlimTrie has no gap between its in-memory layout and its on-disk
layout or transport presentation.
Unlike other data-structure such as the popular red-black-tree,
which is designed for in-memory use only and requires additional work to
persist or transport it.
Loosely coupled design:
index logic and data storage is completely separated.
Piece of cake using SlimTrie to index huge data.
Comparison of SlimTrie and native golang-map.
| Key count | Key length | SlimTrie: byte/key | Map: byte/key |
|---|---|---|---|
| 13107 | 512 | 7.3 | 542.2 |
| 16384 | 512 | 7.0 | 554.5 |
| 21845 | 512 | 7.2 | 544.9 |
| Key count | Key length | SlimTrie: byte/key | Map: byte/key |
|---|---|---|---|
| 16384 | 512 | 7.0 | 554.5 |
| 16384 | 1024 | 7.0 | 1066.5 |
Memory overhead per key is stable with different key length(k) and key count(n):
SlimTrie memory does not increase when key become longer.
Time(in nano second) spent on a get operation with SlimTrie, golang-map and btree by google.
Smaller is better.
| Key count | Key length | SlimTrie | Map | Btree |
|---|---|---|---|---|
| 1 | 1024 | 86.3 | 5.0 | 36.9 |
| 10 | 1024 | 90.7 | 59.7 | 99.5 |
| 100 | 1024 | 123.3 | 60.1 | 240.6 |
| 1000 | 1024 | 157.0 | 63.5 | 389.6 |
| 1000 | 512 | 152.6 | 40.0 | 363.0 |
| 1000 | 256 | 152.3 | 28.8 | 332.3 |
It is about 2.6 times faster than the btree by google.
Time(in nano second) spent on a get with different key count(n) and key length(k):
See: benchmark-get-md.
0.1.0(Current): index structure.package main
import (
"fmt"
"strings"
"github.com/openacid/slim/index"
)
type Data string
func (d Data) Read(offset int64, key string) (string, bool) {
kv := strings.Split(string(d)[offset:], ",")[0:2]
if kv[0] == key {
return kv[1], true
}
return "", false
}
func main() {
// `data` is a sample external data.
// In order to let SlimIndex be able to read data, `data` should have
// a `Read` method:
// Read(offset int64, key string) (string, bool)
data := Data("Aaron,1,Agatha,1,Al,2,Albert,3,Alexander,5,Alison,8")
// keyOffsets is a prebuilt index that stores key and its offset in data accordingly.
keyOffsets := []index.OffsetIndexItem{
{Key: "Aaron", Offset: 0},
{Key: "Agatha", Offset: 8},
{Key: "Al", Offset: 17},
{Key: "Albert", Offset: 22},
{Key: "Alexander", Offset: 31},
{Key: "Alison", Offset: 43},
}
// Create an index
st, err := index.NewSlimIndex(keyOffsets, data)
if err != nil {
fmt.Println(err)
}
// Lookup by SlimIndex
v, found := st.Get2("Alison")
fmt.Printf("key: %q\n found: %t\n value: %q\n", "Alison", found, v)
v, found = st.Get2("foo")
fmt.Printf("key: %q\n found: %t\n value: %q\n", "foo", found, v)
// Output:
// key: "Alison"
// found: true
// value: "8"
// key: "foo"
// found: false
// value: ""
}Install
go get github.com/openacid/slimAll dependency packages are included in vendor/ dir.
Prerequisites
For users (who'd like to build cool stuff with slim):
Nothing.
For contributors (who'd like to make slim better):
dep:
for dependency management.protobuf:
for re-compiling *.proto file if on-disk data structure changes.Max OS X:
brew install dep protobufOn other platforms you can read more: dep-install, protoc-install.
Feedback and Contributions are greatly appreciated.
At this stage, the maintainers are most interested in feedback centred on:
slim supports well, or doesn't support at all?Let us know by filing an issue, describing what you did or wanted to do, what you expected to happen, and what actually happened:
Or other type of issue.
See also the list of contributors who participated in this project.
This project is licensed under the MIT License - see the LICENSE file for details.