cswinter / LocustDB
- Wednesday, July 11, 2018 at 02:07:36
Rust
Massively parallel, high performance analytics database that will rapidly devour all of your data.
An experimental analytics database aiming to set a new standard for query performance on commodity hardware. See "How to Analyze Billions of Records per Second on a Single Desktop PC" for an overview of current capabilities.
git clone https://github.com/cswinter/LocustDB.git
cd LocustDB
RUSTFLAGS="-Ccodegen-units=1" CARGO_INCREMENTAL=0 cargo +nightly run --release --bin repl -- test_data/nyc-taxi.csv.gz
Instead of test_data/nyc-taxi.csv.gz, you can also pass a path to any other .csv or gzipped .csv.gz file. The first line of the file must contain the name of each column. The data type of each column is inferred automatically, but loading might break for columns that contain a mixture of numbers, strings, and empty entries.
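Since mixed-type columns are the stated failure mode, it can help to scan a file for them before loading it. The following is a minimal sketch, not part of LocustDB: the names Kind, classify, and mixed_columns are hypothetical, and the scan assumes a plain comma-separated file without quoted fields. It only distinguishes integers, text, and empty entries, so floating-point values are counted as text, and it says nothing about how LocustDB itself infers types.

```rust
use std::collections::BTreeSet;

/// Kinds of values this naive scan distinguishes (hypothetical, not LocustDB's types).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Kind {
    Empty,
    Integer,
    Text,
}

/// Classify a single field after trimming surrounding whitespace.
fn classify(field: &str) -> Kind {
    let f = field.trim();
    if f.is_empty() {
        Kind::Empty
    } else if f.parse::<i64>().is_ok() {
        Kind::Integer
    } else {
        Kind::Text
    }
}

/// Return the names of columns whose rows contain more than one kind of value.
/// Assumption: fields are separated by bare commas and never quoted.
fn mixed_columns(csv: &str) -> Vec<String> {
    let mut lines = csv.lines();
    // The first line is the header holding the column names.
    let header: Vec<&str> = match lines.next() {
        Some(h) => h.split(',').map(|s| s.trim()).collect(),
        None => return Vec::new(),
    };
    // One set of observed kinds per column.
    let mut kinds: Vec<BTreeSet<Kind>> = vec![BTreeSet::new(); header.len()];
    for line in lines.filter(|l| !l.trim().is_empty()) {
        // Fields beyond the header width are ignored.
        for (i, field) in line.split(',').enumerate().take(header.len()) {
            kinds[i].insert(classify(field));
        }
    }
    header
        .iter()
        .zip(kinds.iter())
        .filter(|(_, k)| k.len() > 1)
        .map(|(name, _)| name.to_string())
        .collect()
}

fn main() {
    // Made-up sample data: passenger_count mixes an integer, an empty entry, and text.
    let sample = "trip_id,passenger_count,vendor\n1,2,CMT\n2,,VTS\n3,two,CMT\n";
    println!("{:?}", mixed_columns(sample));
}
```

Columns reported by such a scan are the ones to clean up or inspect first if loading the file fails.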
You can pass the magic strings nyc100m or nyc to load, respectively, the first 5 files (100m records) or the full 1.46 billion taxi rides dataset, which you will need to download first (for the full dataset, you will need about 120GB of disk space and 60GB of RAM).
cargo +nightly test
RUSTFLAGS="-Ccodegen-units=1" CARGO_INCREMENTAL=0 cargo +nightly bench
A vision for LocustDB.
Query performance for analytics workloads is best-in-class on commodity hardware, both for data cached in memory and for data read from disk.
LocustDB automatically achieves spectacular compression ratios, has minimal indexing overhead, and requires fewer machines to store the same amount of data than any other system. The trade-off between performance and storage efficiency is configurable.
New data is available for queries within seconds.
LocustDB scales seamlessly from a single machine to large clusters.
LocustDB should be usable with minimal configuration or schema-setup as:
Until LocustDB is production-ready, the following are distractions at best, if not wholly incompatible with the main goals.
LocustDB does not efficiently execute queries inserting or operating on small amounts of data.
LocustDB does not run on GPUs.