Skip to content

A distributed transactional key-value storage engine in Rust, with horizontal scalability, strong consistency, and high availability.

Notifications You must be signed in to change notification settings

YumingxuanGuo/featherdb

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

FeatherDB

Version: 0.5.0

Introduction

FeatherDB is an disk-based, concurrent, transactional, relational, unreliable, distributed database management system in Rust, written for educational purposes.

What's New: Replica Cluster

In the latest release, FeatherDB has utilized the Raft consensus algorithm to maintain a cluster of replica servers. This is a fundamental step in turning our database from a single-server system into a multi-server, distributed system. This transition provides enhanced consistency and reliability to FeatherDB, ensuring a better and more robust user experience.

Usage

To start the FeatherDB server, run the following command:

$ ./run.sh
|db-a|  FeatherDB server listening on 127.0.0.1:9501...
|kv-a|  FeatherKV server listening on 127.0.0.1:9601...
|kv-b|  FeatherKV server listening on 127.0.0.1:9602...
|kv-c|  FeatherKV server listening on 127.0.0.1:9603...

Now there are three FeatherKV servers running in the background, and one FeatherDB server running in the foreground. Users can connect to the FeatherDB server and execute SQL queries by running the following command in a new shell:

$ cargo run --bin client_db
Connected to featherDB. Enter !help for instructions.
featherDB>

You can now execute SQL queries in the client shell. For example:

featherDB> BEGIN;
  Began transaction 1
featherDB> CREATE TABLE my_table (id INTEGER PRIMARY KEY, name STRING);
  Created table my_table
featherDB> INSERT INTO my_table (id, name) VALUES (1, 'Gman'), (2, 'Alex');
  Created 2 rows
featherDB> SELECT * FROM my_table;
  1|Gman
  2|Alex
featherDB> UPDATE my_table SET name = 'Joe' WHERE id = 2;
  Updated 1 rows
featherDB> SELECT * FROM my_table WHERE id = 2;
  2|Joe
featherDB> DELETE FROM my_table WHERE id = 2;
  Deleted 1 rows
featherDB> SELECT * FROM my_table;
  1|Gman
featherDB> COMMIT;
  Committed transaction 1

Relational Model

FeatherDB has introduced a relational model, supporting for SQL queries. This includes basic CRUD operations as well as more advanced queries such as joins, filters, and transactions. Additionally, we have included a command-line REPL client interface that connects to the FeatherDB server, allowing users to execute SQL queries and view the results.

Relational Model Breakdown

The SQL engine in FeatherDB utilizes MVCC to provide a SQL interface to clients. The lifecycle of a SQL query can be summarized as follows:

Query → Lexer → Parser → Planner → Optimizer → Executor → Storage Engine

Parsing

The SQL session processes plain-text SQL queries and returns the results. The first step in this process involves parsing the query into an abstract syntax tree (AST), which represents the query's semantics.

Planning

The SQL planner takes the AST generated by the parser and constructs a SQL execution plan. This plan is an abstract representation of the steps required to execute the query.

Optimizing

Initially, the planner generates a naïve execution plan that focuses on producing a correct outcome rather than a fast one. This plan is then optimized by a series of optimizers, including ConstantFolder, FilterPushdown, and so on.

Execution

Each SQL plan node has a corresponding executor. Executors are given a transaction to access the SQL storage engine and return a result set containing the query results.

Finally, the result set is returned to the client, completing the process.

Reference: toyDB.

MVCC-Based Transactions

FeatherDB has introduced support for concurrent ACID transactions based on Multi-Version Concurrency Control (MVCC). We offer two isolation levels: Snapshot Isolation (SI) and Serializable Snapshot Isolation (SSI).

Snapshot Isolation (SI)

Snapshot Isolation enables multiple transactions to execute concurrently by creating a distinct snapshot of the database for each transaction. This approach avoids the need for locks and ensures that write operations do not block read operations, reducing contention between transactions and enhancing overall system performance. Furthermore, SI versions all data, allowing queries on historical data. To provide ACID transactions, commits are atomic: a transaction operates on a consistent snapshot of the key/value store from the beginning of the transaction, and any write conflicts lead to serialization errors that require retries.

Reference: toyDB.

Serializable Snapshot Isolation (SSI)

While SI eliminates most inconsistencies, including phantom reads, it is not fully serializable, as it may exhibit write skew anomalies. To address this, we have introduced a separate mechanism to detect transactions with consecutive read-write dependencies, a necessary but insufficient condition for write skews. If a transaction is detected, it is aborted, rolled back, and retried. This approach conservatively eliminates write skews and ensures serializability, although some benign transactions might be aborted. However, we accept the overhead of false positives due to their infrequency.

Reference: Serializable Isolation for Snapshot Databases.

Key-Value Storage

FeatherDB has evolved from a purely in-memory solution to a disk-based, persistent storage system. We've chosen the LSM-tree as the primary key-value storage engine due to its excellent write performance.

We provide five methods as the interface:

/// Sets a value for a key, replacing the existing value if any.
fn set(&self, key: &[u8], value: Vec<u8>) -> Result<()>;

/// Gets a value for a key, if it exists.
fn get(&self, key: &[u8]) -> Result<Option<Vec<u8>>>;

/// Deletes a key, doing nothing if it does not exist.
fn delete(&self, key: &[u8]) -> Result<()>;

/// Iterates over an ordered range of key/value pairs.
fn scan(&self, range: Range) -> Result<KvScan>;

/// Flushes any buffered data to the underlying storage medium.
fn flush(&self) -> Result<()>;

The LSM-tree comprises two distinct data structures: a memtable for in-memory data and an SSTable (Sorted String Table) for on-disk data. Once data is flushed to disk as SSTables, it becomes immutable; mutable operations are only performed in the memtables.

We employ a lock-free skiplist as the underlying implementation for the memtable, ensuring thread-safety in concurrent operations. Additionally, all interface methods take immutable references, allowing us to access the storage with a simple Arc<LsmTree> instead of Arc<RwLock<LsmTree>>.

We also offer iterator functionality in the scan method, enabling efficient key-range traversals and laying the foundation for potential future SQL query support.

What's Next

We're constantly working to improve FeatherDB and have several exciting developments planned for the near future and beyond:

  • Separation of SQL and storage: We're working on dividing the project into two layers for a clearer architecture and better modularity: FeatherDB for the SQL layer, and FeatherKV for the storage layer.

  • Advanced SQL Queries: We're working on adding support for more advanced SQL queries, such as aggregations, projections, and limits, to enhance the capabilities of FeatherDB.

  • Advanced SQL Optimizations: We're also focusing on implementing more advanced SQL optimizations, including IndexLookup, NoopCleaner, and JoinType, to improve query performance and efficiency.

  • Data Sharding: To improve scalability, we will introduce sharding to distribute data across multiple nodes, allowing FeatherDB to handle larger datasets.

In the longer term, we have the following enhancements in mind:

  • Write-ahead-log (WAL): To ensure durability and data integrity, we will introduce a WAL mechanism that guarantees all operations will eventually be performed, even after unexpected crashes.

  • Leveled Compaction: As an essential part of the LSM-tree, we will implement level compaction to minimize storage overhead for outdated and deleted entries, optimizing storage utilization.

  • Bloom Filter: To enhance key searching performance, we will incorporate Bloom filters as an optimization technique, reducing unnecessary disk I/O operations.

Stay tuned for these upcoming features and improvements in FeatherDB. Be sure to follow our GitHub repository to stay up-to-date with our latest developments and releases.

About

A distributed transactional key-value storage engine in Rust, with horizontal scalability, strong consistency, and high availability.

Topics

Resources

Stars

Watchers

Forks

Packages

No packages published