Skip to content

raphaelsc/badger

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Badger GoDoc Go Report Card Build Status Appveyor Coverage Status

Badger mascot

Badger is an embeddable, persistent, simple and fast key-value (KV) store written in pure Go. It's meant to be a performant alternative to non-Go-based key-value stores like RocksDB.

Project Status

We are currently gearing up for a v1.0 release. We recently introduced transactions which involved a major API change.To use the previous version of the APIs, please use tag v0.8. This tag can be specified via the Go dependency tool you're using.

Table of Contents

Getting Started

Installing

To start using Badger, install Go 1.8 or above and run go get:

$ go get github.com/dgraph-io/badger/...

This will retrieve the library and install the badger_info command line utility into your $GOBIN path.

Opening a database

The top-level object in Badger is a DB. It represents multiple files on disk in specific directories, which contain the data for a single store.

To open your database, use the badger.Open() function, with the appropriate options. The Dir and ValueDir options are mandatory and must be specified by the client. They can be set to the same value to simplify things.

package main

import (
	"log"

	"github.com/dgraph-io/badger"
)

func main() {
  // Open the Badger store located in the /tmp/badger directory.
  // It will be created if it doesn't exist.
  opts := badger.DefaultOptions
  opts.Dir := "/tmp/badger"
  opts.ValueDir := "/tmp/badger"
  db, err := badger.Open(&opts)
  if err != nil {
	  log.Fatal(err)
  }
  defer db.Close()
	…
}

Please note that Badger obtains a lock on the directories so multiple processes cannot open the same database at the same time.

Transactions

Read-only transactions

To start a read-only transaction, you can use the DB.View() method:

err := db.View(func(tx *badger.Txn) error {
	...
	return nil
})

You cannot perform any writes or deletes within this transaction. Badger ensures that you get a consistent view of the database within this closure. Any writes that happen elsewhere after the transaction has started, will not be seen by calls made within the closure.

Read-write transactions

To start a read-write transaction, you can use the DB.Update() method:

err := db.Update(func(tx *badger.Txn) error {
	...
	return nil
})

All database operations are allowed inside a read-write transaction.

Always check the return error as it will report an ErrConflict in case of conflict or other errors, for e.g. due to disk failures. If you return an error within your closure it will be passed through.

Managing transactions manually

The DB.View() and DB.Update() methods are wrappers around the DB.NewTransaction() and Txn.Commit() methods (or Txn.Discard() in case of read-only transactions). These helper methods will start the transaction, execute a function, and then safely discard your transaction if an error is returned. This is the recommended way to use Badger transactions.

However, sometimes you may want to manually create and commit your transactions. You can use the DB.NewTransaction() function directly but please be sure to commit or discard the transaction.

// Start a writable transaction.
txn, err := db.NewTransaction(true)
if err != nil {
    return err
}
defer tx.Discard()

// Use the transaction...
err := txn.Set([]byte("answer"), []byte("42"), 0)
if err != nil {
    return err
}

// Commit the transaction and check for error.
if err := txn.Commit(nil); err != nil {
    return err
}

The first argument to DB.NewTransaction() is a boolean stating if the transaction should be writable.

Badger allows an optional callback to the Txn.Commit() method. Normally, the callback can be set to nil, and the method will return after all the writes have succeeded. However, if this callback is provided, the Txn.Commit() method returns as soon as it has checked for any conflicts. The actual writing to the disk happens asynchronously, and the callback is invoked once the writing has finished, or an error has occurred. This can improve the throughput of the application in some cases. But it also means that a transaction is not durable until the callback has been invoked with a nil error value.

Using key/value pairs

To save a key/value pair to a bucket, use the Txn.Set() method:

err := db.Update(func(txn *badger.Txn) error {
  err := txn.Set([]byte("answer"), []byte("42"), 0)
  return err
})

This will set the value of the "answer" key to "42". To retrieve this value, we can use the Txn.Get() method:

err := db.View(func(txn *badger.Txn) error {
  item, err := txn.Get([]byte("answer"))
  if err != nil {
    return err
  }
  val, err := item.Value()
  if err != nil {
    return err
  }
  fmt.Printf("The answer is: %s\n", val)
  return nil
})

Txn.Get() returns ErrKeyNotFound if the value is not found.

Please note that values returned from Get() are only valid while the transaction is open. If you need to use a value outside of the transaction then you must use copy() to copy it to another byte slice.

Use the Txn.Delete() method to delete a key from the bucket.

Iterating over keys

To iterate over keys, we can use an Iterator, which can be obtained using the Txn.NewIterator() method.

err := db.View(func(txn *.Tx) error {
  opts := DefaultIteratorOptions
  opts.PrefetchSize = 10
  it := txn.NewIterator(&opts)
  for it.Rewind(); it.Valid(); it.Next() {
    item := it.Item()
    k := item.Key()
    v, err := item.Value()
    if err != nil {
      return err
    }
    fmt.Printf("key=%s, value=%s\n", k, v)
  }
  return nil
})

The iterator allows you to move to a specific point in the list of keys and move forward or backward through the keys one at a time.

By default, Badger prefetches the values of the next 100 items. You can adjust that with the IteratorOptions.PrefetchSize field. However, setting it to a value higher than GOMAXPROCS (which we recommend to be 128 or higher) shouldn’t give any additional benefits. You can also turn off the fetching of values altogether. See section below on key-only iteration.

Prefix scans

To iterate over a key prefix, you can combine Seek() and ValidForPrefix():

db.View(func(txn *badger.Txn) error {
  it := txn.NewIterator(&DefaultIteratorOptions)
  prefix := []byte("1234")
  for it.Seek(prefix); it.ValidForPrefix(prefix); it.Next() {
    item := it.Item()
    k := item.Key()
    v, err := item.Value()
    if err != nil {
      return err
    }
    fmt.Printf("key=%s, value=%s\n", k, v)
  }
  return nil
})

Key-only iteration

Badger supports a unique mode of iteration called key-only iteration. It is several order of magnitudes faster than regular iteration, because it involves access to the LSM-tree only, which is usually resident entirely in RAM. To enable key-only iteration, you need to set the IteratorOptions.PrefetchValues field to false. This can also be used to do sparse reads for selected keys during an iteration, by calling item.Value() only when required.

err := db.View(func(txn *.Tx) error {
  opts := DefaultIteratorOptions
  opts.PrefetchValues = false
  it := txn.NewIterator(&opts)
  for it.Rewind(); it.Valid(); it.Next() {
    item := it.Item()
    k := item.Key()
    fmt.Printf("key=%s\n", k)
  }
  return nil
})

Database backup

Database backup is an open issue for v1.0 and will be coming soon.

Statistics

Badger records metrics using the expvar package, which is included in the Go standard library. All the metrics are documented in y/metrics.go file.

expvar package adds a handler in to the default HTTP server (which has to be started explicitly), and serves up the metrics at the /debug/vars endpoint. These metrics can then be collected by a system like Prometheus, to get better visibility into what Badger is doing.

Resources

Blog Posts

  1. Introducing Badger: A fast key-value store written natively in Go
  2. Make Badger crash resilient with ALICE
  3. Badger vs LMDB vs BoltDB: Benchmarking key-value databases in Go
  4. Concurrent ACID Transactions in Badger

Design

Badger was written with these design goals in mind:

  • Write a key-value store in pure Go.
  • Use latest research to build the fastest KV store for data sets spanning terabytes.
  • Optimize for SSDs.

Badger’s design is based on a paper titled WiscKey: Separating Keys from Values in SSD-conscious Storage.

Comparisons

Feature Badger RocksDB BoltDB
Design LSM tree with value log LSM tree only B+ tree
High RW Performance Yes Yes No
Designed for SSDs Yes (with latest research 1) Not specifically 2 No
Embeddable Yes Yes Yes
Sorted KV access Yes Yes Yes
Pure Go (no Cgo) Yes No Yes
Transactions Yes, ACID, concurrent with SSI3 Yes (but non-ACID) Yes, ACID
Snapshots Yes Yes Yes

1 The WISCKEY paper (on which Badger is based) saw big wins with separating values from keys, significantly reducing the write amplification compared to a typical LSM tree.

2 RocksDB is an SSD optimized version of LevelDB, which was designed specifically for rotating disks. As such RocksDB's design isn't aimed at SSDs.

3 SSI: Serializable Snapshot Isolation. For more details, see the blog post Concurrent ACID Transactions in Badger

Benchmarks

We have run comprehensive benchmarks against RocksDB, Bolt and LMDB. The benchmarking code, and the detailed logs for the benchmarks can be found in the badger-bench repo. More explanation, including graphs can be found the blog posts (linked above).

Other Projects Using Badger

Below is a list of public, open source projects that use Badger:

  • Dgraph - Distributed graph database
  • go-ipfs - Go client for the InterPlanetary File System (IPFS), a new hypermedia distribution protocol.

If you are using Badger in a project please send a pull request to add it to the list.

Frequently Asked Questions

  • My writes are really slow. Why? Are you creating a new transaction for every single key update? This will lead to very low throughput. To get best write performance, batch up multiple writes inside a transaction using single DB.Update() call. You could also have multiple such DB.Update() calls being made concurrently from multiple goroutines.

  • I don't see any disk write. Why?

If you're using Badger with SyncWrites=false, then your writes might not be written to value log and won't get synced to disk immediately. Writes to LSM tree are done inmemory first, before they get compacted to disk. The compaction would only happen once MaxTableSize has been reached. So, if you're doing a few writes and then checking, you might not see anything on disk. Once you Close the store, you'll see these writes on disk.

  • Which instances should I use for Badger?

We recommend using instances which provide local SSD storage, without any limit on the maximum IOPS. In AWS, these are storage optimized instances like i3. They provide local SSDs which clock 100K IOPS over 4KB blocks easily.

  • Are there any Go specific settings that I should use?

We highly recommend setting a high number for GOMAXPROCS, which allows Go to observe the full IOPS throughput provided by modern SSDs. In Dgraph, we have set it to 128. For more details, see this thread.

Contact

Packages

No packages published

Languages

  • Go 99.8%
  • Shell 0.2%