Merge branch 'skyzh:main' into main
Trojanking123 authored Mar 19, 2024
2 parents a8e06f3 + afad25b commit a85c573
Showing 12 changed files with 130 additions and 18 deletions.
92 changes: 92 additions & 0 deletions mini-lsm-book/src/lsm-tutorial/week2-00-triangle.svg
(New SVG image; content not rendered in the diff view.)
2 changes: 1 addition & 1 deletion mini-lsm-book/src/week2-01-compaction.md
@@ -74,7 +74,7 @@ There are some things that you might need to think about.
In this task, you will need to modify,

```
- src/iterators/concat.rs
+ src/iterators/concat_iterator.rs
```

Now that you have created sorted runs in your system, it is possible to do a simple optimization over the read path. You do not always need to create merge iterators for your SSTs. If SSTs belong to one sorted run, you can create a concat iterator that simply iterates the keys in each SST in order, because SSTs in one sorted run do not contain overlapping key ranges and they are sorted by their first key. We do not want to create all SST iterators in advance (because it will lead to one block read), and therefore we only store SST objects in this iterator.
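To make the idea concrete, here is a minimal, self-contained sketch of a concat iterator over one sorted run. It is an illustration only: it models each SST as an in-memory sorted `Vec` of key-value pairs rather than the repo's `SsTable`/`SstConcatIterator` types, and all names here are placeholders.

```rust
/// A sketch of the concat-iterator idea: walk the SSTs of one sorted run in
/// order, only ever reading from the current SST, because the SSTs do not
/// overlap and are sorted by their first key.
struct ConcatIterator {
    ssts: Vec<Vec<(String, String)>>, // SSTs of one sorted run, ordered by first key
    current_sst: usize,               // index of the SST we are currently reading
    current_idx: usize,               // position inside the current SST
}

impl ConcatIterator {
    fn create_and_seek_to_first(ssts: Vec<Vec<(String, String)>>) -> Self {
        let mut iter = ConcatIterator { ssts, current_sst: 0, current_idx: 0 };
        iter.skip_exhausted_ssts();
        iter
    }

    /// Advance to the next non-empty SST if the current one is exhausted.
    fn skip_exhausted_ssts(&mut self) {
        while self.current_sst < self.ssts.len()
            && self.current_idx >= self.ssts[self.current_sst].len()
        {
            self.current_sst += 1;
            self.current_idx = 0;
        }
    }

    fn is_valid(&self) -> bool {
        self.current_sst < self.ssts.len()
    }

    fn key(&self) -> &str {
        &self.ssts[self.current_sst][self.current_idx].0
    }

    fn next(&mut self) {
        self.current_idx += 1;
        self.skip_exhausted_ssts();
    }
}

fn main() {
    // One sorted run: SSTs do not overlap and are ordered by their first key.
    let run = vec![
        vec![("a".to_string(), "1".to_string()), ("b".to_string(), "2".to_string())],
        vec![("c".to_string(), "3".to_string())],
    ];
    let mut iter = ConcatIterator::create_and_seek_to_first(run);
    while iter.is_valid() {
        println!("{}", iter.key());
        iter.next();
    }
}
```

The key property is that at any moment only one SST is being read; exhausting one SST simply advances the cursor to the next SST in the run.
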
10 changes: 5 additions & 5 deletions mini-lsm-book/src/week2-02-simple.md
@@ -30,11 +30,11 @@ src/compact/simple_leveled.rs

Simple leveled compaction is similar to the original LSM paper's compaction strategy. It maintains a number of levels for the LSM tree. When a level (>= L1) is too large, it will merge all of this level's SSTs with the next level. The compaction strategy is controlled by 3 parameters, as defined in `SimpleLeveledCompactionOptions` (a sketch of the options struct follows the list below).

- * `size_ratio_percent`: lower level number of files / upper level number of files. In reality, we should compute the actual size of the files. However, we simplified the equation to use number of files to make it easier to do the simulation. When the ratio is too high (upper level has too many files), we should trigger a compaction.
+ * `size_ratio_percent`: lower level number of files / upper level number of files. In reality, we should compute the actual size of the files. However, we simplified the equation to use number of files to make it easier to do the simulation. When the ratio is too low (upper level has too many files), we should trigger a compaction.
* `level0_file_num_compaction_trigger`: when the number of SSTs in L0 is larger than or equal to this number, trigger a compaction of L0 and L1.
* `max_levels`: the number of levels (excluding L0) in the LSM tree.
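
For reference, the three knobs above might be grouped roughly as follows. This is only a sketch with doc comments restating the list; the authoritative definition is the `SimpleLeveledCompactionOptions` struct in `src/compact/simple_leveled.rs`.

```rust
/// A sketch of the options described above; see src/compact/simple_leveled.rs
/// for the actual definition used by the course code.
pub struct SimpleLeveledCompactionOptions {
    /// Target lower-level file count as a percentage of the upper-level file count.
    pub size_ratio_percent: usize,
    /// Number of SSTs in L0 that triggers an L0->L1 compaction.
    pub level0_file_num_compaction_trigger: usize,
    /// Number of levels (excluding L0) in the LSM tree.
    pub max_levels: usize,
}
```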

- Assume size_ratio_percent=200, max_levels=3, level0_file_num_compaction_trigger=2, let us take a look at the below example.
+ Assume size_ratio_percent=200 (the lower level should have 2x the number of files of the upper level), max_levels=3, and level0_file_num_compaction_trigger=2, and let us take a look at the example below.

Assume the engine flushes two L0 SSTs. This reaches the `level0_file_num_compaction_trigger`, and your controller should trigger an L0->L1 compaction.

@@ -51,7 +51,7 @@ L2 (0): []
L3 (0): []
```

- Now, L2 is empty while L1 has two files. The size ratio for L1 and L2 is `L2/L1=0/2=0 < size_ratio`. Therefore, we will trigger a L1+L2 compaction to push the data lower to L2. The same applies to L2 and these two SSTs will be placed at the bottom-most level after 2 compactions.
+ Now, L2 is empty while L1 has two files. The size ratio percent for L1 and L2 is `(L2/L1) * 100 = (0/2) * 100 = 0 < size_ratio_percent (200)`. Therefore, we will trigger an L1+L2 compaction to push the data lower to L2. The same applies to L2, and these two SSTs will be placed at the bottom-most level after 2 compactions.

```
--- After Compaction ---
@@ -75,7 +75,7 @@ L2 (2): [13, 14]
L3 (2): [7, 8]
```

- At this point, `L3/L2=1 < size_ratio`. Therefore, we need to trigger a compaction between L2 and L3.
+ At this point, `L3/L2 = (1 / 1) * 100 = 100 < size_ratio_percent (200)`. Therefore, we need to trigger a compaction between L2 and L3.

```
--- After Compaction ---
@@ -100,7 +100,7 @@ L2 (2): [23, 24]
L3 (4): [15, 16, 17, 18]
```

- Because `L3/L2 = 2 >= size_ratio`, we do not need to merge L2 and L3 and will end up with the above state. Simple leveled compaction strategy always compact a full level, and keep a fanout size between levels, so that the lower level is always some multiplier times larger than the upper level.
+ Because `L3/L2 = (4 / 2) * 100 = 200 >= size_ratio_percent (200)`, we do not need to merge L2 and L3 and will end up with the above state. The simple leveled compaction strategy always compacts a full level and keeps a fanout size between levels, so that the lower level is always some multiplier times larger than the upper level.
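
Putting the two triggers together, the decision logic can be sketched as below. This is a simplified stand-in, not the repo's `generate_compaction_task` signature: the hypothetical `pick_compaction` works on plain per-level file counts (index 0 is L0) and just returns which adjacent pair of levels to compact.

```rust
/// A simplified sketch of the two triggers described above, operating on plain
/// file counts per level (index 0 = L0, index i = Li) instead of the real LSM
/// state. Returns the (upper, lower) pair of levels to compact, if any.
fn pick_compaction(
    files_per_level: &[usize],
    size_ratio_percent: usize,
    level0_file_num_compaction_trigger: usize,
) -> Option<(usize, usize)> {
    // Trigger 1: too many L0 SSTs -> compact L0 into L1.
    if files_per_level[0] >= level0_file_num_compaction_trigger {
        return Some((0, 1));
    }
    // Trigger 2: for each adjacent pair (L_upper, L_lower), compact when the
    // lower/upper file-count ratio (in percent) falls below size_ratio_percent.
    for upper in 1..files_per_level.len() - 1 {
        let lower = upper + 1;
        if files_per_level[upper] == 0 {
            continue; // nothing to push down from an empty level
        }
        let ratio_percent = files_per_level[lower] * 100 / files_per_level[upper];
        if ratio_percent < size_ratio_percent {
            return Some((upper, lower));
        }
    }
    None
}

fn main() {
    // Mirrors the example above after the first L0->L1 compaction:
    // L0 is empty, L1 holds 2 SSTs, L2 and L3 are empty.
    let files_per_level = [0usize, 2, 0, 0];
    // (L2/L1) * 100 = 0 < size_ratio_percent (200), so L1+L2 should compact.
    assert_eq!(pick_compaction(&files_per_level, 200, 2), Some((1, 2)));
}
```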

We have already initialized the LSM state to have `max_level` levels. You should first implement `generate_compaction_task`, which generates a compaction task based on the above 3 criteria. After that, implement `apply_compaction_result`. We recommend you implement the L0 trigger first, run a compaction simulation, then implement the size ratio trigger, and run a compaction simulation again. To run the compaction simulation,

4 changes: 2 additions & 2 deletions mini-lsm-book/src/week2-03-tiered.md
@@ -76,13 +76,13 @@ L67 (1): [67]
L40 (27): [39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 13, 14, 15, 16, 17, 18, 19, 20, 21]
```

- The `num_iters` in the compaction simulator is set to 3. However, there are far more than 3 iters in the LSM state, which incurs large read amplification.
+ The `num_tiers` in the compaction simulator is set to 3. However, there are far more than 3 tiers in the LSM state, which incurs large read amplification.

The current trigger only reduces space amplification. We will need to add new triggers to the compaction algorithm to reduce read amplification.

### Task 1.2: Triggered by Size Ratio

- The next trigger is the size ratio trigger. For all tiers, if there is a tier `n` that `size of all previous tiers / this tier >= (1 + size_ratio) * 100%`, we will compact all `n` tiers. We only do this compaction with there are more than `min_merge_width` tiers to be merged.
+ The next trigger is the size ratio trigger. For all tiers, if there is a tier `n` such that `size of all previous tiers / this tier >= (100 + size_ratio) * 100%`, we will compact all `n` tiers. We only do this compaction when there are more than `min_merge_width` tiers to be merged.
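
As an illustration, the check can be sketched as below, reading the threshold as `(100 + size_ratio)` percent. The function name, the plain slice of tier sizes, and the exact `min_merge_width` comparison are assumptions made for this sketch; consult the reference tiered compaction controller for the real logic.

```rust
/// A sketch of the size-ratio trigger described above. `tier_sizes` holds the
/// tiers in the order the trigger scans them, so "all previous tiers" means the
/// prefix of the slice before the current tier. Returns Some(n) when the first
/// `n` tiers should be compacted together, or None if the trigger does not fire.
fn size_ratio_trigger(
    tier_sizes: &[usize],
    size_ratio: usize,
    min_merge_width: usize,
) -> Option<usize> {
    let mut previous_size = 0usize;
    for (idx, &this_tier) in tier_sizes.iter().enumerate() {
        // previous_size / this_tier >= (100 + size_ratio)%, written without division.
        let fires = this_tier > 0 && previous_size * 100 >= (100 + size_ratio) * this_tier;
        // Only merge when enough tiers are involved; the exact comparison
        // (> vs >=) should be checked against the reference implementation.
        if fires && idx + 1 >= min_merge_width {
            return Some(idx + 1);
        }
        previous_size += this_tier;
    }
    None
}

fn main() {
    // With size_ratio = 100, the trigger fires once the previous tiers hold at
    // least 2x the size of the current tier and enough tiers are involved.
    assert_eq!(size_ratio_trigger(&[4, 4, 3], 100, 2), Some(3));
}
```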

With this trigger, you will observe the following in the compaction simulator:

2 changes: 2 additions & 0 deletions mini-lsm-book/src/week2-overview.md
@@ -57,6 +57,8 @@ The ratio of memtables flushed to the disk versus total data written to the disk

A good compaction strategy can balance read amplification, write amplification, and space amplification (we will talk about them soon). In a general-purpose LSM storage engine, it is generally impossible to find a strategy that achieves the lowest amplification in all 3 of these factors, unless there is some specific data pattern that the engine can exploit. The good thing about LSM is that we can theoretically analyze the amplifications of a compaction strategy, and all these things happen in the background. We can choose compaction strategies and dynamically change some of their parameters to adjust our storage engine to the optimal state. Compaction strategies are all about tradeoffs, and an LSM-based storage engine enables us to select what to trade at runtime.

![compaction tradeoffs](./lsm-tutorial/week2-00-triangle.svg)

One typical workload in the industry looks like this: the user first batch-ingests data into the storage engine, usually at gigabytes per second, when they launch a product. Then, the system goes live and users start doing small transactions over the system. In the first phase, the engine should be able to quickly ingest data, so we can use a compaction strategy that minimizes write amplification to accelerate this process. Then, we adjust the parameters of the compaction algorithm to optimize it for read amplification, and do a full compaction to reorder existing data, so that the system can run stably when it goes live.

If the workload is like a time-series database, it is possible that the user always populates and truncates data by time. Therefore, even without compaction, this append-only data can still have low amplification on the disk. So, in real life, you should watch for patterns or specific requirements from the users, and use this information to optimize your system.
4 changes: 2 additions & 2 deletions mini-lsm-mvcc/src/compact.rs
@@ -249,7 +249,7 @@ impl LsmStorageInner {
upper_ssts.push(snapshot.sstables.get(id).unwrap().clone());
}
let upper_iter = SstConcatIterator::create_and_seek_to_first(upper_ssts)?;
- let mut lower_ssts = Vec::with_capacity(upper_level_sst_ids.len());
+ let mut lower_ssts = Vec::with_capacity(lower_level_sst_ids.len());
for id in lower_level_sst_ids.iter() {
lower_ssts.push(snapshot.sstables.get(id).unwrap().clone());
}
@@ -267,7 +267,7 @@
)?));
}
let upper_iter = MergeIterator::create(upper_iters);
- let mut lower_ssts = Vec::with_capacity(upper_level_sst_ids.len());
+ let mut lower_ssts = Vec::with_capacity(lower_level_sst_ids.len());
for id in lower_level_sst_ids.iter() {
lower_ssts.push(snapshot.sstables.get(id).unwrap().clone());
}
4 changes: 2 additions & 2 deletions mini-lsm-mvcc/src/lsm_storage.rs
@@ -173,7 +173,7 @@ pub struct MiniLsm {
pub(crate) inner: Arc<LsmStorageInner>,
/// Notifies the L0 flush thread to stop working. (In week 1 day 6)
flush_notifier: crossbeam_channel::Sender<()>,
- /// The handle for the compaction thread. (In week 1 day 6)
+ /// The handle for the flush thread. (In week 1 day 6)
flush_thread: Mutex<Option<std::thread::JoinHandle<()>>>,
/// Notifies the compaction thread to stop working. (In week 2)
compaction_notifier: crossbeam_channel::Sender<()>,
@@ -513,7 +513,7 @@ impl LsmStorageInner {
let l0_iter = MergeIterator::create(l0_iters);
let mut level_iters = Vec::with_capacity(snapshot.levels.len());
for (_, level_sst_ids) in &snapshot.levels {
- let mut level_ssts = Vec::with_capacity(snapshot.levels[0].1.len());
+ let mut level_ssts = Vec::with_capacity(level_sst_ids.len());
for table in level_sst_ids {
let table = snapshot.sstables[table].clone();
if keep_table(key, &table) {
5 changes: 4 additions & 1 deletion mini-lsm-starter/src/bin/mini-lsm-cli.rs
@@ -121,7 +121,10 @@ impl ReplHandler {
self.lsm.force_full_compaction()?;
println!("full compaction success");
}
- Command::Quit | Command::Close => std::process::exit(0),
+ Command::Quit | Command::Close => {
+     self.lsm.close()?;
+     std::process::exit(0);
+ }
};

self.epoch += 1;
2 changes: 1 addition & 1 deletion mini-lsm-starter/src/lsm_storage.rs
@@ -198,7 +198,7 @@ pub struct MiniLsm {
pub(crate) inner: Arc<LsmStorageInner>,
/// Notifies the L0 flush thread to stop working. (In week 1 day 6)
flush_notifier: crossbeam_channel::Sender<()>,
- /// The handle for the compaction thread. (In week 1 day 6)
+ /// The handle for the flush thread. (In week 1 day 6)
flush_thread: Mutex<Option<std::thread::JoinHandle<()>>>,
/// Notifies the compaction thread to stop working. (In week 2)
compaction_notifier: crossbeam_channel::Sender<()>,
4 changes: 2 additions & 2 deletions mini-lsm/src/compact.rs
@@ -203,7 +203,7 @@ impl LsmStorageInner {
upper_ssts.push(snapshot.sstables.get(id).unwrap().clone());
}
let upper_iter = SstConcatIterator::create_and_seek_to_first(upper_ssts)?;
- let mut lower_ssts = Vec::with_capacity(upper_level_sst_ids.len());
+ let mut lower_ssts = Vec::with_capacity(lower_level_sst_ids.len());
for id in lower_level_sst_ids.iter() {
lower_ssts.push(snapshot.sstables.get(id).unwrap().clone());
}
@@ -221,7 +221,7 @@
)?));
}
let upper_iter = MergeIterator::create(upper_iters);
- let mut lower_ssts = Vec::with_capacity(upper_level_sst_ids.len());
+ let mut lower_ssts = Vec::with_capacity(lower_level_sst_ids.len());
for id in lower_level_sst_ids.iter() {
lower_ssts.push(snapshot.sstables.get(id).unwrap().clone());
}
4 changes: 2 additions & 2 deletions mini-lsm/src/lsm_storage.rs
@@ -174,7 +174,7 @@ pub struct MiniLsm {
pub(crate) inner: Arc<LsmStorageInner>,
/// Notifies the L0 flush thread to stop working. (In week 1 day 6)
flush_notifier: crossbeam_channel::Sender<()>,
- /// The handle for the compaction thread. (In week 1 day 6)
+ /// The handle for the flush thread. (In week 1 day 6)
flush_thread: Mutex<Option<std::thread::JoinHandle<()>>>,
/// Notifies the compaction thread to stop working. (In week 2)
compaction_notifier: crossbeam_channel::Sender<()>,
@@ -503,7 +503,7 @@ impl LsmStorageInner {
let l0_iter = MergeIterator::create(l0_iters);
let mut level_iters = Vec::with_capacity(snapshot.levels.len());
for (_, level_sst_ids) in &snapshot.levels {
- let mut level_ssts = Vec::with_capacity(snapshot.levels[0].1.len());
+ let mut level_ssts = Vec::with_capacity(level_sst_ids.len());
for table in level_sst_ids {
let table = snapshot.sstables[table].clone();
if keep_table(key, &table) {
15 changes: 15 additions & 0 deletions mini-lsm/src/tests/week1_day5.rs
@@ -187,6 +187,21 @@ fn test_task2_storage_scan() {
.unwrap(),
vec![(Bytes::from("2"), Bytes::from("2333"))],
);
check_lsm_iter_result_by_key(
&mut storage
.scan(Bound::Included(b"0"), Bound::Included(b"1"))
.unwrap(),
vec![
(Bytes::from_static(b"0"), Bytes::from_static(b"2333333")),
(Bytes::from("00"), Bytes::from("2333")),
],
);
check_lsm_iter_result_by_key(
&mut storage
.scan(Bound::Excluded(b"0"), Bound::Included(b"1"))
.unwrap(),
vec![(Bytes::from("00"), Bytes::from("2333"))],
);
}

#[test]
