Badger is an embeddable, persistent, simple and fast key-value (KV) store written in pure Go. It's meant to be a performant alternative to non-Go-based key-value stores like RocksDB.
We are currently gearing up for a v1.0 release. We recently introduced transactions which involved a major API change.To use the previous version of the APIs, please use tag v0.8. This tag can be specified via the Go dependency tool you're using.
To start using Badger, install Go 1.8 or above and run go get
:
$ go get github.com/dgraph-io/badger/...
This will retrieve the library and install the badger_info
command line
utility into your $GOBIN
path.
The top-level object in Badger is a DB
. It represents multiple files on disk
in specific directories, which contain the data for a single store.
To open your database, use the badger.Open()
function, with the appropriate
options. The Dir
and ValueDir
options are mandatory and must be
specified by the client. They can be set to the same value to simplify things.
package main
import (
"log"
"github.com/dgraph-io/badger"
)
func main() {
// Open the Badger store located in the /tmp/badger directory.
// It will be created if it doesn't exist.
opts := badger.DefaultOptions
opts.Dir := "/tmp/badger"
opts.ValueDir := "/tmp/badger"
db, err := badger.Open(opts)
if err != nil {
log.Fatal(err)
}
defer db.Close()
…
}
Please note that Badger obtains a lock on the directories so multiple processes cannot open the same database at the same time.
To start a read-only transaction, you can use the DB.View()
method:
err := db.View(func(tx *badger.Txn) error {
...
return nil
})
You cannot perform any writes or deletes within this transaction. Badger ensures that you get a consistent view of the database within this closure. Any writes that happen elsewhere after the transaction has started, will not be seen by calls made within the closure.
To start a read-write transaction, you can use the DB.Update()
method:
err := db.Update(func(tx *badger.Txn) error {
...
return nil
})
All database operations are allowed inside a read-write transaction.
Always check the return error as it will report an ErrConflict
in case of
conflict or other errors, for e.g. due to disk failures. If you return an error
within your closure it will be passed through.
The DB.View()
and DB.Update()
methods are wrappers around the
DB.NewTransaction()
and Txn.Commit()
methods (or Txn.Discard()
in case of
read-only transactions). These helper methods will start the transaction,
execute a function, and then safely discard your transaction if an error is
returned. This is the recommended way to use Badger transactions.
However, sometimes you may want to manually create and commit your
transactions. You can use the DB.NewTransaction()
function directly but
please be sure to commit or discard the transaction.
// Start a writable transaction.
txn, err := db.NewTransaction(true)
if err != nil {
return err
}
defer tx.Discard()
// Use the transaction...
err := txn.Set([]byte("answer"), []byte("42"), 0)
if err != nil {
return err
}
// Commit the transaction and check for error.
if err := txn.Commit(nil); err != nil {
return err
}
The first argument to DB.NewTransaction()
is a boolean stating if the transaction
should be writable.
Badger allows an optional callback to the Txn.Commit()
method. Normally, the
callback can be set to nil
, and the method will return after all the writes
have succeeded. However, if this callback is provided, the Txn.Commit()
method returns as soon as it has checked for any conflicts. The actual writing
to the disk happens asynchronously, and the callback is invoked once the
writing has finished, or an error has occurred. This can improve the throughput
of the application in some cases. But it also means that a transaction is not
durable until the callback has been invoked with a nil
error value.
To save a key/value pair to a bucket, use the Txn.Set()
method:
err := db.Update(func(txn *badger.Txn) error {
err := txn.Set([]byte("answer"), []byte("42"), 0)
return err
})
This will set the value of the "answer"
key to "42"
. To retrieve this
value, we can use the Txn.Get()
method:
err := db.View(func(txn *badger.Txn) error {
item, err := txn.Get([]byte("answer"))
if err != nil {
return err
}
val, err := item.Value()
if err != nil {
return err
}
fmt.Printf("The answer is: %s\n", val)
return nil
})
Txn.Get()
returns ErrKeyNotFound
if the value is not found.
Please note that values returned from Get()
are only valid while the
transaction is open. If you need to use a value outside of the transaction
then you must use copy()
to copy it to another byte slice.
Use the Txn.Delete()
method to delete a key from the bucket.
To iterate over keys, we can use an Iterator
, which can be obtained using the
Txn.NewIterator()
method.
err := db.View(func(txn *.Tx) error {
opts := DefaultIteratorOptions
opts.PrefetchSize = 10
it := txn.NewIterator(opts)
for it.Rewind(); it.Valid(); it.Next() {
item := it.Item()
k := item.Key()
v, err := item.Value()
if err != nil {
return err
}
fmt.Printf("key=%s, value=%s\n", k, v)
}
return nil
})
The iterator allows you to move to a specific point in the list of keys and move forward or backward through the keys one at a time.
By default, Badger prefetches the values of the next 100 items. You can adjust
that with the IteratorOptions.PrefetchSize
field. However, setting it to
a value higher than GOMAXPROCS (which we recommend to be 128 or higher)
shouldn’t give any additional benefits. You can also turn off the fetching of
values altogether. See section below on key-only iteration.
To iterate over a key prefix, you can combine Seek()
and ValidForPrefix()
:
db.View(func(txn *badger.Txn) error {
it := txn.NewIterator(&DefaultIteratorOptions)
prefix := []byte("1234")
for it.Seek(prefix); it.ValidForPrefix(prefix); it.Next() {
item := it.Item()
k := item.Key()
v, err := item.Value()
if err != nil {
return err
}
fmt.Printf("key=%s, value=%s\n", k, v)
}
return nil
})
Badger supports a unique mode of iteration called key-only iteration. It is
several order of magnitudes faster than regular iteration, because it involves
access to the LSM-tree only, which is usually resident entirely in RAM. To
enable key-only iteration, you need to set the IteratorOptions.PrefetchValues
field to false
. This can also be used to do sparse reads for selected keys
during an iteration, by calling item.Value()
only when required.
err := db.View(func(txn *.Tx) error {
opts := DefaultIteratorOptions
opts.PrefetchValues = false
it := txn.NewIterator(opts)
for it.Rewind(); it.Valid(); it.Next() {
item := it.Item()
k := item.Key()
fmt.Printf("key=%s\n", k)
}
return nil
})
Database backup is an open issue for v1.0 and will be coming soon.
Badger records metrics using the expvar package, which is included in the Go standard library. All the metrics are documented in y/metrics.go file.
expvar
package adds a handler in to the default HTTP server (which has to be
started explicitly), and serves up the metrics at the /debug/vars
endpoint.
These metrics can then be collected by a system like Prometheus, to get
better visibility into what Badger is doing.
- Introducing Badger: A fast key-value store written natively in Go
- Make Badger crash resilient with ALICE
- Badger vs LMDB vs BoltDB: Benchmarking key-value databases in Go
- Concurrent ACID Transactions in Badger
Badger was written with these design goals in mind:
- Write a key-value store in pure Go.
- Use latest research to build the fastest KV store for data sets spanning terabytes.
- Optimize for SSDs.
Badger’s design is based on a paper titled WiscKey: Separating Keys from Values in SSD-conscious Storage.
Feature | Badger | RocksDB | BoltDB |
---|---|---|---|
Design | LSM tree with value log | LSM tree only | B+ tree |
High RW Performance | Yes | Yes | No |
Designed for SSDs | Yes (with latest research 1) | Not specifically 2 | No |
Embeddable | Yes | Yes | Yes |
Sorted KV access | Yes | Yes | Yes |
Pure Go (no Cgo) | Yes | No | Yes |
Transactions | Yes, ACID, concurrent with SSI3 | Yes (but non-ACID) | Yes, ACID |
Snapshots | Yes | Yes | Yes |
1 The WISCKEY paper (on which Badger is based) saw big wins with separating values from keys, significantly reducing the write amplification compared to a typical LSM tree.
2 RocksDB is an SSD optimized version of LevelDB, which was designed specifically for rotating disks. As such RocksDB's design isn't aimed at SSDs.
3 SSI: Serializable Snapshot Isolation. For more details, see the blog post Concurrent ACID Transactions in Badger
We have run comprehensive benchmarks against RocksDB, Bolt and LMDB. The benchmarking code, and the detailed logs for the benchmarks can be found in the badger-bench repo. More explanation, including graphs can be found the blog posts (linked above).
Below is a list of public, open source projects that use Badger:
- Dgraph - Distributed graph database
- go-ipfs - Go client for the InterPlanetary File System (IPFS), a new hypermedia distribution protocol.
If you are using Badger in a project please send a pull request to add it to the list.
- My writes are really slow. Why?
Are you creating a new transaction for every single key update? This will lead
to very low throughput. To get best write performance, batch up multiple writes
inside a transaction using single DB.Update()
call. You could also have
multiple such DB.Update()
calls being made concurrently from multiple
goroutines.
- I don't see any disk write. Why?
If you're using Badger with SyncWrites=false
, then your writes might not be written to value log
and won't get synced to disk immediately. Writes to LSM tree are done inmemory first, before they
get compacted to disk. The compaction would only happen once MaxTableSize
has been reached. So, if
you're doing a few writes and then checking, you might not see anything on disk. Once you Close
the store, you'll see these writes on disk.
- Which instances should I use for Badger?
We recommend using instances which provide local SSD storage, without any limit on the maximum IOPS. In AWS, these are storage optimized instances like i3. They provide local SSDs which clock 100K IOPS over 4KB blocks easily.
- Are there any Go specific settings that I should use?
We highly recommend setting a high number for GOMAXPROCS, which allows Go to observe the full IOPS throughput provided by modern SSDs. In Dgraph, we have set it to 128. For more details, see this thread.
- Please use discuss.dgraph.io for questions, feature requests and discussions.
- Please use Github issue tracker for filing bugs or feature requests.
- Join .
- Follow us on Twitter @dgraphlabs.