dpc / rdedup
- Friday, May 6, 2016, 03:12:24
Rust
Data deduplication with compression and public key encryption.
Warning: beta quality software ahead
rdedup is a tool providing data deduplication with compression and public key encryption, written in the Rust programming language. The primary use case is storing deduplicated and encrypted backups.
I use rdup to create a backup archive, and syncthing to replicate my backups across many systems. Some of them are more trusted (desktops with disk-level encryption, firewalls, stored in the vault etc.), and some not so much (semi-personal laptops, phones etc.)
As my backups tend to contain a lot of shared data (even backups taken on different systems), it makes perfect sense to deduplicate them.
However, I don't want one physically or remotely compromised host to give access to the data inside all the backups from all my systems. Existing deduplication software like ddar or zbackup provides encryption, but only symmetric (zbackup issue, ddar issue), which means you have to share the same key across all your hosts, and one compromised system gives access to all your backup data.
To fill the missing piece in my master backup plan, I've decided to write it myself using my beloved Rust programming language.
rdedup works very much like zbackup and other deduplication software, with a little twist:
When storing data, rdedup will split it into smaller pieces (chunks) using a rolling sum, and store each chunk under a unique id (its sha256 digest) in a specially formatted directory: the repo. The whole backup is then described by an index: a list of digests.
The index is stored internally just like the data itself. Applied recursively, this reduces each backup to one unique digest, which is saved under the given name.
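To make the recursive index idea concrete, here is a minimal sketch, not rdedup's actual code. It makes several loudly stated assumptions: fixed-size chunks replace rolling-sum chunking, a 64-bit `DefaultHasher` value stands in for the sha256 digest, an in-memory `HashMap` stands in for the repo directory, and the names `put`, `store_data` and `load_data` are hypothetical. The sketch also returns the number of index levels next to the root digest, because it has no other way to tell index chunks from data chunks.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::convert::TryInto;
use std::hash::{Hash, Hasher};

// Assumption: a 64-bit hash stands in for the sha256 digest.
type Digest = u64;
// Assumption: an in-memory map stands in for the repo directory.
type Store = HashMap<Digest, Vec<u8>>;

const CHUNK: usize = 64; // fixed-size chunks, a simplification of rolling-sum chunking
const DIGEST_LEN: usize = 8; // size of one serialized Digest

/// Store one chunk under its digest (a no-op if it is already there).
fn put(store: &mut Store, chunk: &[u8]) -> Digest {
    let mut hasher = DefaultHasher::new();
    chunk.hash(&mut hasher);
    let id = hasher.finish();
    store.entry(id).or_insert_with(|| chunk.to_vec());
    id
}

/// Chunk the data, then chunk the resulting list of digests, and so on,
/// until everything fits in a single chunk. Returns that chunk's digest
/// and the number of index levels above the data.
fn store_data(store: &mut Store, data: &[u8]) -> (Digest, u32) {
    let mut level = data.to_vec();
    let mut depth = 0;
    while level.len() > CHUNK {
        let mut index = Vec::new();
        for chunk in level.chunks(CHUNK) {
            index.extend_from_slice(&put(store, chunk).to_le_bytes());
        }
        level = index; // the index is now treated as data itself
        depth += 1;
    }
    (put(store, &level), depth)
}

/// Reverse of `store_data`: expand `depth` index levels back into data.
fn load_data(store: &Store, root: Digest, depth: u32) -> Option<Vec<u8>> {
    let mut level = store.get(&root)?.clone();
    for _ in 0..depth {
        let mut next = Vec::new();
        for raw in level.chunks(DIGEST_LEN) {
            let id = Digest::from_le_bytes(raw.try_into().ok()?);
            next.extend_from_slice(store.get(&id)?);
        }
        level = next;
    }
    Some(level)
}
```

Calling `store_data` on input that fits in one chunk yields depth 0 and the digest of the data itself. Larger inputs gain one index level each time the digest list outgrows a single chunk, and `load_data` walks the same levels in reverse.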
When restoring data, rdedup will read the index, then restore the data by reading each chunk listed in it.
Thanks to the rolling sum chunking scheme, when frequently saving similar data, a lot of common chunks will be reused, saving space.
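The store, restore and reuse steps described above can be sketched roughly as follows. This is an illustration under assumptions, not rdedup's implementation: the `WINDOW`, `MIN_CHUNK` and `DIVISOR` values are made up, a 64-bit `DefaultHasher` value stands in for the sha256 digest, the hypothetical `Repo` struct lives in memory, nothing is compressed or encrypted, and the index is kept directly under the name instead of being stored as chunks.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Illustrative parameters, not rdedup's real ones.
const WINDOW: usize = 16; // bytes covered by the rolling sum
const MIN_CHUNK: usize = 32; // minimum chunk length (the last chunk may be shorter)
const DIVISOR: u32 = 64; // cut when sum % DIVISOR == DIVISOR - 1

// Assumption: a 64-bit hash stands in for the sha256 digest.
type Digest = u64;

fn digest(data: &[u8]) -> Digest {
    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    hasher.finish()
}

/// Split `data` where the sum of the last WINDOW bytes hits a magic value.
/// Boundaries depend on nearby content, not on absolute offsets, so an
/// insertion only disturbs the chunks around it.
fn chunk(data: &[u8]) -> Vec<&[u8]> {
    let mut chunks = Vec::new();
    let mut start = 0;
    let mut sum: u32 = 0;
    for (i, &byte) in data.iter().enumerate() {
        sum += u32::from(byte);
        if i >= WINDOW {
            sum -= u32::from(data[i - WINDOW]); // slide the window forward
        }
        if i + 1 - start >= MIN_CHUNK && sum % DIVISOR == DIVISOR - 1 {
            chunks.push(&data[start..=i]);
            start = i + 1;
        }
    }
    if start < data.len() {
        chunks.push(&data[start..]); // trailing remainder
    }
    chunks
}

#[derive(Default)]
struct Repo {
    chunks: HashMap<Digest, Vec<u8>>,    // chunk id -> chunk contents
    names: HashMap<String, Vec<Digest>>, // name -> index (list of digests)
}

impl Repo {
    /// Store `data` under `name`; returns how many chunks were actually new.
    fn store(&mut self, name: &str, data: &[u8]) -> usize {
        let mut new_chunks = 0;
        let mut index = Vec::new();
        for piece in chunk(data) {
            let id = digest(piece);
            if !self.chunks.contains_key(&id) {
                self.chunks.insert(id, piece.to_vec());
                new_chunks += 1;
            }
            index.push(id);
        }
        self.names.insert(name.to_string(), index);
        new_chunks
    }

    /// Rebuild the data by reading every chunk listed in the index.
    fn load(&self, name: &str) -> Option<Vec<u8>> {
        let mut out = Vec::new();
        for id in self.names.get(name)? {
            out.extend_from_slice(self.chunks.get(id)?);
        }
        Some(out)
    }
}
```

Storing one input and then a copy with a few bytes inserted in the middle should report only a handful of new chunks for the second `store` call: the chunks before the insertion are identical, and the boundaries after it line up again once the window has moved past the inserted bytes.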
What makes rdedup unique is that every time a new repo directory is created, a pair of keys (public and secret) is generated. The public key is saved in the storage directory in plain text, while the secret key is encrypted with a key derived from a passphrase.
Every time rdedup saves a new chunk file, its data is encrypted using the public key, so it can only be decrypted using the corresponding secret key. This way new data can always be added, with full deduplication, while only restoring data requires providing the passphrase to unlock the secret key.
Nice little detail: rdedup supports removing old names and no longer needed chunks (garbage collection) without the passphrase. Only the data chunks are encrypted, making operations like garbage collection safe even on untrusted machines.
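Why garbage collection needs no passphrase can be shown with a small mark-and-sweep sketch. The layout here is hypothetical, not rdedup's on-disk format: a `Chunk` is either a plaintext `Index` (a list of digests) or an opaque encrypted `Data` payload, ids are plain `u64` values standing in for sha256 digests, and the `Repo` struct lives in memory. The point is that `gc` only follows names and index chunks and never reads a data payload.

```rust
use std::collections::{HashMap, HashSet};

// Assumption: a 64-bit id stands in for the sha256 digest.
type Digest = u64;

// Hypothetical chunk layout used only for this sketch.
enum Chunk {
    /// Plaintext list of the digests this index refers to.
    Index(Vec<Digest>),
    /// Encrypted payload; gc treats it as opaque bytes.
    Data(Vec<u8>),
}

#[derive(Default)]
struct Repo {
    names: HashMap<String, Digest>, // name -> root digest
    chunks: HashMap<Digest, Chunk>, // chunk id -> chunk
}

impl Repo {
    /// Remove a name. The chunks it referred to stay until `gc` runs.
    fn rm(&mut self, name: &str) -> bool {
        self.names.remove(name).is_some()
    }

    /// Mark every chunk reachable from a stored name, then sweep the rest.
    /// Returns the number of chunks removed.
    fn gc(&mut self) -> usize {
        let mut reachable = HashSet::new();
        let mut todo: Vec<Digest> = self.names.values().copied().collect();
        while let Some(id) = todo.pop() {
            if !reachable.insert(id) {
                continue; // already visited
            }
            // Only index chunks are inspected; data payloads are never read.
            if let Some(Chunk::Index(children)) = self.chunks.get(&id) {
                todo.extend(children.iter().copied());
            }
        }
        let before = self.chunks.len();
        self.chunks.retain(|id, _| reachable.contains(id));
        before - self.chunks.len()
    }
}
```

A chunk shared by two names survives as long as at least one of them remains, which is why removing a name and collecting garbage are separate steps.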
The secret key is encrypted with crypto secretbox using a random nonce and a key derived from the passphrase using password hashing and a random salt.
If you have cargo installed:
cargo install rdedup
If not, I highly recommend installing rustup (think pip, npm for Rust, only better).
See rdedup -h for help.
Supported commands:
rdedup init - create a repo directory with a keypair used for encryption.
rdedup ls - list all stored names.
rdedup store <name> - store data read from standard input under the given name.
rdedup load <name> - load data stored under the given name and write it to standard output.
rdedup rm <name> - remove the given name. This by itself does not remove the data.
rdedup gc - remove any data that is no longer reachable.
In combination with rdup this can be used to store and restore your backup like this:
rdup -x /dev/null "$HOME" | rdedup store home
rdedup load home | rdup-up "$HOME.restored"