Warning: alpha/prototype quality software ahead
rdedup is a tool providing data deduplication with compression and public key encryption, written in the Rust programming language. It's useful for backups.
I use rdup to make backups, and also use syncthing to duplicate my backups over a lot of systems. Some of them are more trusted (desktops with disk-level encryption, firewalls, stored in the vault etc.), and some not so much (semi-personal laptops, phones etc.)
As my backups tend to contain a lot of shared data (even backups taken on different systems), it makes perfect sense to deduplicate them.
However, I'm paranoid and I don't want a single physically or remotely compromised host to give access to the data inside all my backups from all my systems. Existing deduplication software like ddar or zbackup provides encryption, but only symmetric (zbackup issue, ddar issue), which means you have to share the same key on all your hosts, and one compromised system compromises all your backups.
To fill the missing piece in my master backup plan, I've decided to write it myself using my beloved Rust programming language. That's how rdedup started.
rdedup works very much like zbackup and other deduplication software, with a couple of twists:
- Thanks to public key cryptography, making backups does not require the secret key or its passphrase; only reading the data back does.
- Everything should be synchronization friendly. Even simple Dropbox/Syncthing should work fine for data replication.
rdedup uses its own format to turn a given directory into deduplication storage.
When saving data, rdedup will split it into smaller pieces (chunks) using a rolling sum algorithm, and store each chunk under a unique name (its sha256 digest). Then the whole backup will be described as an index: a list of chunk ids. The index will be stored internally just like the data itself. Applied recursively, this reduces each backup to one unique id, which is written to a name file.
When restoring data, rdedup will read the index, then restore the data by reading the chunks listed in the index.
Thanks to this chunking scheme, when similar data is saved frequently, a lot of common chunks will be reused, saving space.
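The save and restore steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not rdedup's actual code or on-disk format: the window size, the cut mask, the in-memory dict standing in for the storage directory, and the function names `split_chunks`, `store` and `load` are all assumptions made for the example. Compression and encryption are left out, and the index is kept as a single entry instead of being reduced recursively.

```python
import hashlib
import random

WINDOW = 16          # bytes covered by the rolling sum (illustrative value)
MASK = (1 << 6) - 1  # cut where the low 6 bits are all ones (illustrative value)


def split_chunks(data: bytes):
    """Yield chunks of data, cutting where a rolling sum hits a fixed pattern."""
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]  # drop the byte that left the window
        if i + 1 - start >= WINDOW and (rolling & MASK) == MASK:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]  # whatever is left after the last cut


def store(storage: dict, data: bytes) -> str:
    """Save data into storage and return the id of its index."""
    ids = []
    for chunk in split_chunks(data):
        chunk_id = hashlib.sha256(chunk).hexdigest()
        storage.setdefault(chunk_id, chunk)  # an already known chunk is reused
        ids.append(chunk_id)
    index = "\n".join(ids).encode()
    index_id = hashlib.sha256(index).hexdigest()
    storage.setdefault(index_id, index)  # the index is stored like any chunk
    return index_id


def load(storage: dict, index_id: str) -> bytes:
    """Read the index, then concatenate the chunks it lists."""
    ids = storage[index_id].decode().splitlines()
    return b"".join(storage[chunk_id] for chunk_id in ids)


# Two similar "backups": the second has a few bytes inserted in the middle.
rng = random.Random(0)
first = bytes(rng.getrandbits(8) for _ in range(20000))
second = first[:10000] + b"one inserted line" + first[10000:]

storage = {}
first_id = store(storage, first)
size_after_first = len(storage)
second_id = store(storage, second)
print("entries after the first backup:", size_after_first)
print("entries added by the second:   ", len(storage) - size_after_first)
```

Because the cut points depend only on the last few bytes seen, an insertion in the middle of the data changes the chunks around it and leaves the others identical, so the second backup adds only a handful of new entries.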
What makes rdedup unique is that every time a new storage directory is created, a pair of keys (public and secret) is generated. The public key is saved in the storage directory itself in plain text, while the secret key is protected with a passphrase.
Every time rdedup saves a new chunk of data, it's encrypted with the public key so it can only be decrypted using the corresponding secret key. This way new backups can be created, with full deduplication, while only accessing the data requires the secret key.
The nice part is: removing old data does not require entering the passphrase. Only the data itself is encrypted, making operations like garbage collecting old chunks possible on untrusted machines.
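To see why garbage collection needs no secret key, here is a minimal mark-and-sweep sketch in Python. It is not rdedup's implementation: the three dicts standing in for the name files, the indexes and the chunk files, as well as the function name `collect_garbage`, are hypothetical. The only thing it assumes about the real storage is what is stated above, namely that the lists of chunk ids can be read without decrypting anything.

```python
def collect_garbage(names: dict, indexes: dict, chunks: dict) -> int:
    """Delete everything no remaining name refers to; return chunks removed."""
    # Mark: walk every remaining name and note the chunk ids its index lists.
    reachable = set()
    for index_id in names.values():
        reachable.update(indexes[index_id])

    # Sweep: drop indexes that no name points to any more...
    for index_id in set(indexes) - set(names.values()):
        del indexes[index_id]

    # ...and chunks that no remaining index lists. The chunk bodies are
    # never read, so they can stay encrypted the whole time.
    unreachable = set(chunks) - reachable
    for chunk_id in unreachable:
        del chunks[chunk_id]
    return len(unreachable)


# Two backups sharing chunk "b"; chunk bodies are opaque ciphertext here.
names = {"home": "index-1", "etc": "index-2"}
indexes = {"index-1": ["a", "b"], "index-2": ["b", "c"]}
chunks = {"a": b"<ciphertext>", "b": b"<ciphertext>", "c": b"<ciphertext>"}

del names["etc"]  # removing a name does not remove any data by itself
removed = collect_garbage(names, indexes, chunks)
print(removed, sorted(chunks))
```

After the name "etc" is removed, only chunk "c" becomes unreachable; "b" survives because the "home" backup still lists it.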
- bup method is used to split files into chunks
- sha256 is used to identify chunks
- libsodium's sealed boxes are used for encryption/decryption:
  - ephemeral keys are used for sealing
  - chunk digest is used as nonce
If you have cargo installed:
cargo install rdedup
If not, I highly recommend installing rustup, which provides cargo (think pip or npm for Rust, only better).
rdedup init will create a backup subdirectory in the current directory and generate a keypair used for encryption.
rdedup store <name> will save any data given on standard input under the given name.
rdedup load <name> will write to standard output the data previously stored under the given name.
In combination with rdup this can be used to store and restore your backup like this:
rdup -x /dev/null "$HOME" | rdedup store home
rdedup load home | rdup-up "$HOME.restored"
rdedup ls will list the names of all the stored data.
rdedup rm <name> will remove the given name. This by itself does not remove the data.