<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>Yiheng Tong</title>
<link href="https://tong1heng.github.io/atom.xml" rel="self"/>
<link href="https://tong1heng.github.io/"/>
<updated>2023-03-19T11:45:54.496Z</updated>
<id>https://tong1heng.github.io/</id>
<author>
<name>Yiheng Tong</name>
</author>
<generator uri="https://hexo.io/">Hexo</generator>
<entry>
<title>Fast Databases with Fast Durability and Recovery Through Multicore Parallelism</title>
<link href="https://tong1heng.github.io/2022/12/07/RUC/OSDI14-zheng/"/>
<id>https://tong1heng.github.io/2022/12/07/RUC/OSDI14-zheng/</id>
<published>2022-12-07T10:00:00.000Z</published>
<updated>2023-03-19T11:45:54.496Z</updated>
<content type="html"><![CDATA[<p>Starting from an efficient multicore database system, we show that naive logging and checkpoints make normal-case execution slower, but that frequent disk synchronization allows us to keep up with many workloads with only a modest reduction in throughput. We design throughout for parallelism: during logging, during checkpointing, and during recovery. The result is fast.</p><span id="more"></span><center><embed src="/slides/OSDI14-zheng.pdf" width="850" height="600"></center>]]></content>
<summary type="html"><p>Starting from an efficient multicore database system, we show that naive logging and checkpoints make normal-case execution slower, but that frequent disk synchronization allows us to keep up with many workloads with only a modest reduction in throughput. We design throughout for parallelism: during logging, during checkpointing, and during recovery. The result is fast.</p></summary>
<category term="RUC" scheme="https://tong1heng.github.io/categories/RUC/"/>
<category term="paper" scheme="https://tong1heng.github.io/tags/paper/"/>
</entry>
<entry>
<title>The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases</title>
<link href="https://tong1heng.github.io/2022/11/14/RUC/ICDE13-ART/"/>
<id>https://tong1heng.github.io/2022/11/14/RUC/ICDE13-ART/</id>
<published>2022-11-14T07:04:23.000Z</published>
<updated>2022-11-14T07:11:49.782Z</updated>
<content type="html"><![CDATA[<p>We present ART, an adaptive radix tree (trie) for efficient indexing in main memory. Its lookup performance surpasses highly tuned, read-only search trees, while supporting very efficient insertions and deletions as well. At the same time, ART is very space efficient and solves the problem of excessive worst-case space consumption, which plagues most radix trees, by adaptively choosing compact and efficient data structures for internal nodes.</p><span id="more"></span><center><embed src="/slides/ICDE13-ART.pdf" width="850" height="600"></center>]]></content>
<summary type="html"><p>We present ART, an adaptive radix tree (trie) for efficient indexing in main memory. Its lookup performance surpasses highly tuned, read-only search trees, while supporting very efficient insertions and deletions as well. At the same time, ART is very space efficient and solves the problem of excessive worst-case space consumption, which plagues most radix trees, by adaptively choosing compact and efficient data structures for internal nodes.</p></summary>
<category term="RUC" scheme="https://tong1heng.github.io/categories/RUC/"/>
<category term="paper" scheme="https://tong1heng.github.io/tags/paper/"/>
</entry>
<entry>
<title>RUC, I'm Coming!</title>
<link href="https://tong1heng.github.io/2022/09/29/Life/ruc/"/>
<id>https://tong1heng.github.io/2022/09/29/Life/ruc/</id>
<published>2022-09-28T16:01:41.000Z</published>
<updated>2022-11-15T03:28:23.372Z</updated>
<content type="html"><![CDATA[<p>Here I come!</p><span id="more"></span>]]></content>
<summary type="html"><p>Here I come!</p></summary>
<category term="Life" scheme="https://tong1heng.github.io/categories/Life/"/>
<category term="personal" scheme="https://tong1heng.github.io/tags/personal/"/>
</entry>
<entry>
<title>SwapKV: A Hotness Aware In-memory Key-Value Store for Hybrid Memory Systems</title>
<link href="https://tong1heng.github.io/2022/04/16/Embedded/TKDE21-SwapKV/"/>
<id>https://tong1heng.github.io/2022/04/16/Embedded/TKDE21-SwapKV/</id>
<published>2022-04-16T13:58:20.000Z</published>
<updated>2023-03-19T11:49:45.239Z</updated>
<content type="html"><![CDATA[<p>This paper presents SwapKV, which strives to retain both the advantages of DRAM and PMEM, aiming to achieve both high performance and large capacity simultaneously.</p><span id="more"></span><center><embed src="/slides/TKDE21-SwapKV.pdf" width="850" height="600"></center>]]></content>
<summary type="html"><p>This paper presents SwapKV, which strives to retain both the advantages of DRAM and PMEM, aiming to achieve both high performance and large capacity simultaneously.</p></summary>
<category term="Embedded" scheme="https://tong1heng.github.io/categories/Embedded/"/>
<category term="paper" scheme="https://tong1heng.github.io/tags/paper/"/>
</entry>
<entry>
<title>Enabling Low Tail Latency on Multicore Key-Value Stores</title>
<link href="https://tong1heng.github.io/2022/04/05/Embedded/p1091-lersch/"/>
<id>https://tong1heng.github.io/2022/04/05/Embedded/p1091-lersch/</id>
<published>2022-04-05T13:18:16.000Z</published>
<updated>2022-11-14T07:40:58.775Z</updated>
<content type="html"><![CDATA[<p>We present RStore to enable low and predictable latency (i.e. low tail latency) and efficient use of hardware resources such as CPU, memory and storage through the following design points:</p><ul><li>Asynchronous execution</li><li>Hybrid DRAM+NVM architecture</li><li>Log-structured storage</li><li>User-space networking</li></ul><span id="more"></span><center><embed src="/slides/p1091-lersch.pdf" width="850" height="600"></center>]]></content>
<summary type="html"><p>We present RStore to enable low and predictable latency (i.e. low tail latency) and efficient use of hardware resources such as CPU, memory and storage through the following design points:</p>
<ul>
<li>Asynchronous execution</li>
<li>Hybrid DRAM+NVM architecture</li>
<li>Log-structured storage</li>
<li>User-space networking</li>
</ul></summary>
<category term="Embedded" scheme="https://tong1heng.github.io/categories/Embedded/"/>
<category term="paper" scheme="https://tong1heng.github.io/tags/paper/"/>
</entry>
<entry>
<title>DFS Techniques</title>
<link href="https://tong1heng.github.io/2022/03/29/Embedded/dfs_techniques/"/>
<id>https://tong1heng.github.io/2022/03/29/Embedded/dfs_techniques/</id>
<published>2022-03-29T11:14:02.000Z</published>
<updated>2023-03-19T11:38:16.543Z</updated>
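<content type="html"><![CDATA[<p>Introduces two key techniques of distributed file systems.</p><span id="more"></span><center><embed src="/slides/dfs_techniques.pdf" width="850" height="600"></center><h2 id="speech"><a class="markdownIt-Anchor" href="#speech"></a> Speech</h2><p>Next, I will introduce two more core techniques of distributed file systems (DFS).</p><ul><li><p>The first is Scalability&Usability, that is, the scalability and availability of a DFS.</p><ol><li>Highly available metadata<ul><li>DFS metadata mainly includes the file namespace, the locations of each file's replicas, the replicas' version numbers, and so on.</li><li>A DFS stores metadata in one of two main ways. Centralized storage keeps all metadata on a single metadata server, which manages it uniformly; distributed storage spreads the metadata across multiple nodes. Of the two, centralized storage is more common: it separates metadata from data, gives the whole system relatively high throughput, and is easier to implement.</li><li>Take GFS as an example. GFS metadata has three main parts: <strong>the namespace, the mapping from files to chunks, and the chunk locations.</strong> The metadata lives on the single GFS master. Files are divided into chunks of 64 MB each; every chunk is replicated on multiple chunkservers (3 by default), and a chunkserver stores each chunk as a Linux file on its local disk.</li><li>How a read uses the metadata:<ul><li>The application calls the interface provided by the GFS client, specifying the file name, offset, and length to read.</li><li>The GFS client translates the offset into a chunk index according to a fixed rule and sends it to the master.</li><li>The master replies to the GFS client with the chunk id and the locations of the chunk's replicas.</li><li>The GFS client sends a read request, containing the chunk id and a byte range, to the nearest chunkserver that holds a replica.</li><li>The chunkserver reads the corresponding file and sends its contents to the GFS client.</li></ul></li></ul></li><li>Namespace delegation<br />The DFS namespace mainly refers to the DFS's unified management of the file directory tree. A distributed file system has to handle files being created concurrently in the same directory. To prevent conflicts, a locking mechanism guarantees mutually exclusive access to the namespace.<ul><li>Locks are either read locks or write locks, corresponding to read and write operations.</li><li>e.g.<ul><li>For an operation on /d1/d2/…/dn/leaf,</li><li>it needs read locks on /d1, /d1/d2, /d1/d2/…/dn,</li><li>and either a read lock or a write lock on /d1/d2/…/dn/leaf.</li></ul></li><li>Namespace locks allow concurrent mutations in the same directory. For example, when several files are created concurrently in one directory, each creation acquires a read lock on the directory and a write lock on its own file, so the creations do not conflict. The read lock on the directory prevents it from being deleted, renamed, or snapshotted during the creation. Create requests for the same file are serialized by the write lock, so the file is simply created twice, one after the other. Because the namespace has many nodes, allocating read-write locks for all of them up front would waste resources, so locks are allocated lazily and deleted once they are no longer in use. Lock acquisition is not arbitrary either: to prevent deadlock, an operation must acquire its locks in a fixed order, first by level in the namespace tree and then lexicographically within the same level.</li></ul></li><li>Scalability<br />A DFS is highly scalable; the issues to watch are:<ul><li>How to balance load across servers</li><li>How to ensure that a newly added node does not crash under a short-term load spike</li><li>How to update the metadata</li></ul></li></ol></li><li><p>The second is fault tolerance. A DFS achieves fault tolerance through replication, and the replicas must be kept consistent with one another.</p><ol><li>Checkpointing (crash consistency of metadata): the metadata is kept in the master's memory. The operation log records the history of critical metadata changes and is the persistent record of the metadata. It is replicated on multiple remote machines, and the master replies to a client only after the log record has been flushed to the local disk and to the remote machines.</li><li>Leases: keep replicas consistent while data is being modified.<ul><li>The master designates the primary replica and the secondary replicas; the designation expires after 60 s and is then made again.</li><li>Write flow:<ul><li>The client asks the master for the chunk's replica information and which replica is the primary.</li><li>The master replies, and the client caches this information locally.</li><li>The client pushes the data to all replicas along a chain.</li><li>The client tells the primary to commit.</li><li>After committing successfully itself, the primary tells all secondaries to commit.</li><li>The secondaries report their commit results to the primary.</li><li>The primary reports the commit result to the client.</li></ul></li></ul></li><li>Data consistency:<ul><li>Two states: consistent and defined. The goal is to apply the same serialized sequence of operations on all replicas so that the file region is defined.</li><li>Handshakes detect servers that have failed or gone down</li><li>Checksums verify data integrity</li><li>Version numbers control data consistency</li><li>Which replica to return to the client</li></ul></li></ol></li></ul>]]></content>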
<summary type="html"><p>Introduces two key techniques of distributed file systems.</p></summary>
<category term="Embedded" scheme="https://tong1heng.github.io/categories/Embedded/"/>
<category term="distributed" scheme="https://tong1heng.github.io/tags/distributed/"/>
</entry>
<entry>
<title>Less is More: De-amplifying I/Os for Key-value Stores with a Log-assisted LSM-tree</title>
<link href="https://tong1heng.github.io/2022/03/11/Embedded/ICDE21-huang/"/>
<id>https://tong1heng.github.io/2022/03/11/Embedded/ICDE21-huang/</id>
<published>2022-03-11T11:35:56.000Z</published>
<updated>2022-11-14T07:40:52.540Z</updated>
<content type="html"><![CDATA[<p>We present a novel scheme, called Log-assisted LSM-tree (L2SM), which adopts a small-size, multi-level log structure to isolate selected key-value items that have a disruptive effect on the tree structure, accumulates and absorbs the repeated updates in a highly efficient manner, and removes obsolete and deleted key-value items at an early stage.</p><span id="more"></span><center><embed src="/slides/ICDE21-huang.pdf" width="850" height="600"></center>]]></content>
<summary type="html"><p>We present a novel scheme, called Log-assisted LSM-tree (L2SM), which adopts a small-size, multi-level log structure to isolate selected key-value items that have a disruptive effect on the tree structure, accumulates and absorbs the repeated updates in a highly efficient manner, and removes obsolete and deleted key-value items at an early stage.</p></summary>
<category term="Embedded" scheme="https://tong1heng.github.io/categories/Embedded/"/>
<category term="paper" scheme="https://tong1heng.github.io/tags/paper/"/>
</entry>
<entry>
<title>Snappy Algorithm</title>
<link href="https://tong1heng.github.io/2021/10/18/Embedded/snappy/"/>
<id>https://tong1heng.github.io/2021/10/18/Embedded/snappy/</id>
<published>2021-10-18T14:18:42.000Z</published>
<updated>2022-11-14T07:41:05.353Z</updated>
<content type="html"><![CDATA[<p>How should you read the Snappy source code? This post gives pseudocode for Snappy compression and decompression to help you understand the algorithm.</p><span id="more"></span><h2 id="introduction"><a class="markdownIt-Anchor" href="#introduction"></a> Introduction</h2><p>Snappy is a compression/decompression library. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. For instance, compared to the fastest mode of zlib, Snappy is an order of magnitude faster for most inputs, but the resulting compressed files are anywhere from 20% to 100% bigger. (For more information, see “Performance”, below.)</p><p>Snappy has the following properties:</p><ul><li>Fast: Compression speeds at 250 MB/sec and beyond, with no assembler code. See “Performance” below.</li><li>Stable: Over the last few years, Snappy has compressed and decompressed petabytes of data in Google’s production environment. The Snappy bitstream format is stable and will not change between versions.</li><li>Robust: The Snappy decompressor is designed not to crash in the face of corrupted or malicious input.</li><li>Free and open source software: Snappy is licensed under a BSD-type license. 
For more information, see the included COPYING file.</li></ul><p>Snappy has previously been called “Zippy” in some Google presentations and the like.</p><h2 id="snappy-in-rocksdb"><a class="markdownIt-Anchor" href="#snappy-in-rocksdb"></a> Snappy in RocksDB</h2><ul><li><p>How to link: <a href="https://github.com/facebook/rocksdb/blob/main/build_tools/build_detect_platform">https://github.com/facebook/rocksdb/blob/main/build_tools/build_detect_platform</a></p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> ! test $ROCKSDB_DISABLE_SNAPPY; then</span><br><span class="line"> # Test whether Snappy library is installed</span><br><span class="line"> <span class="meta"># http:<span class="comment">//code.google.com/p/snappy/</span></span></span><br><span class="line"> $CXX $PLATFORM_CXXFLAGS -x c++ - -o /dev/null <span class="number">2</span>>/dev/null <<EOF</span><br><span class="line"> <span class="meta">#<span class="keyword">include</span> <span class="string"><snappy.h></span></span></span><br><span class="line"> <span class="function"><span class="type">int</span> <span class="title">main</span><span class="params">()</span> </span>{}</span><br></pre></td></tr></table></figure></li><li><p>Where to use: <a href="https://github.com/facebook/rocksdb/blob/main/util/compression.h">https://github.com/facebook/rocksdb/blob/main/util/compression.h</a></p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span 
class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">inline</span> <span class="type">bool</span> <span class="title">Snappy_Compress</span><span class="params">(<span class="type">const</span> CompressionInfo& <span class="comment">/*info*/</span>, <span class="type">const</span> <span class="type">char</span>* input,</span></span></span><br><span class="line"><span class="params"><span class="function"> <span class="type">size_t</span> length, ::std::string* output)</span> </span>{</span><br><span class="line"><span class="meta">#<span class="keyword">ifdef</span> SNAPPY</span></span><br><span class="line"> output-><span class="built_in">resize</span>(snappy::<span class="built_in">MaxCompressedLength</span>(length));</span><br><span class="line"> <span class="type">size_t</span> outlen;</span><br><span class="line"> snappy::<span 
class="built_in">RawCompress</span>(input, length, &(*output)[<span class="number">0</span>], &outlen);</span><br><span class="line"> output-><span class="built_in">resize</span>(outlen);</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">true</span>;</span><br><span class="line"><span class="meta">#<span class="keyword">else</span></span></span><br><span class="line"> (<span class="type">void</span>)input;</span><br><span class="line"> (<span class="type">void</span>)length;</span><br><span class="line"> (<span class="type">void</span>)output;</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">false</span>;</span><br><span class="line"><span class="meta">#<span class="keyword">endif</span></span></span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">inline</span> CacheAllocationPtr <span class="title">Snappy_Uncompress</span><span class="params">(</span></span></span><br><span class="line"><span class="params"><span class="function"> <span class="type">const</span> <span class="type">char</span>* input, <span class="type">size_t</span> length, <span class="type">size_t</span>* uncompressed_size,</span></span></span><br><span class="line"><span class="params"><span class="function"> MemoryAllocator* allocator = <span class="literal">nullptr</span>)</span> </span>{</span><br><span class="line"><span class="meta">#<span class="keyword">ifdef</span> SNAPPY</span></span><br><span class="line"> <span class="type">size_t</span> uncompressed_length = <span class="number">0</span>;</span><br><span class="line"> <span class="keyword">if</span> (!snappy::<span class="built_in">GetUncompressedLength</span>(input, length, &uncompressed_length)) {</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">nullptr</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span 
class="line"> CacheAllocationPtr output = <span class="built_in">AllocateBlock</span>(uncompressed_length, allocator);</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (!snappy::<span class="built_in">RawUncompress</span>(input, length, output.<span class="built_in">get</span>())) {</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">nullptr</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> *uncompressed_size = uncompressed_length;</span><br><span class="line"></span><br><span class="line"> <span class="keyword">return</span> output;</span><br><span class="line"><span class="meta">#<span class="keyword">else</span></span></span><br><span class="line"> (<span class="type">void</span>)input;</span><br><span class="line"> (<span class="type">void</span>)length;</span><br><span class="line"> (<span class="type">void</span>)uncompressed_size;</span><br><span class="line"> (<span class="type">void</span>)allocator;</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">nullptr</span>;</span><br><span class="line"><span class="meta">#<span class="keyword">endif</span></span></span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>RocksDB mainly calls two Snappy interfaces: <code>RawCompress</code> and <code>RawUncompress</code>.</p></li></ul><h2 id="snappy"><a class="markdownIt-Anchor" href="#snappy"></a> Snappy</h2><p>Source: <a href="https://github.com/google/snappy/">https://github.com/google/snappy/</a></p><p>We first look at the format, then analyze Snappy's compression and decompression, starting from RawCompress and RawUncompress respectively.</p><h3 id="format"><a class="markdownIt-Anchor" href="#format"></a> Format</h3><p><code>format_description.txt</code> describes the encoding format.</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span 
class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span 
class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br><span class="line">103</span><br><span class="line">104</span><br><span class="line">105</span><br><span class="line">106</span><br><span class="line">107</span><br><span class="line">108</span><br><span class="line">109</span><br><span class="line">110</span><br></pre></td><td class="code"><pre><span class="line">Snappy compressed format description</span><br><span class="line">Last revised: 2011-10-05</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">This is not a formal specification, but should suffice to explain most</span><br><span class="line">relevant parts of how the Snappy format works. 
It is originally based on</span><br><span class="line">text by Zeev Tarantov.</span><br><span class="line"></span><br><span class="line">Snappy is a LZ77-type compressor with a fixed, byte-oriented encoding.</span><br><span class="line">There is no entropy encoder backend nor framing layer -- the latter is</span><br><span class="line">assumed to be handled by other parts of the system.</span><br><span class="line"></span><br><span class="line">This document only describes the format, not how the Snappy compressor nor</span><br><span class="line">decompressor actually works. The correctness of the decompressor should not</span><br><span class="line">depend on implementation details of the compressor, and vice versa.</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">1. Preamble</span><br><span class="line"></span><br><span class="line">The stream starts with the uncompressed length (up to a maximum of 2^32 - 1),</span><br><span class="line">stored as a little-endian varint. Varints consist of a series of bytes,</span><br><span class="line">where the lower 7 bits are data and the upper bit is set iff there are</span><br><span class="line">more bytes to be read. In other words, an uncompressed length of 64 would</span><br><span class="line">be stored as 0x40, and an uncompressed length of 2097150 (0x1FFFFE)</span><br><span class="line">would be stored as 0xFE 0xFF 0x7F.</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">2. The compressed stream itself</span><br><span class="line"></span><br><span class="line">There are two types of elements in a Snappy stream: Literals and</span><br><span class="line">copies (backreferences). There is no restriction on the order of elements,</span><br><span class="line">except that the stream naturally cannot start with a copy. 
(Having</span><br><span class="line">two literals in a row is never optimal from a compression point of</span><br><span class="line">view, but nevertheless fully permitted.) Each element starts with a tag byte,</span><br><span class="line">and the lower two bits of this tag byte signal what type of element will</span><br><span class="line">follow:</span><br><span class="line"></span><br><span class="line"> 00: Literal</span><br><span class="line"> 01: Copy with 1-byte offset</span><br><span class="line"> 10: Copy with 2-byte offset</span><br><span class="line"> 11: Copy with 4-byte offset</span><br><span class="line"></span><br><span class="line">The interpretation of the upper six bits are element-dependent.</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">2.1. Literals (00)</span><br><span class="line"></span><br><span class="line">Literals are uncompressed data stored directly in the byte stream.</span><br><span class="line">The literal length is stored differently depending on the length</span><br><span class="line">of the literal:</span><br><span class="line"></span><br><span class="line"> - For literals up to and including 60 bytes in length, the upper</span><br><span class="line"> six bits of the tag byte contain (len-1). The literal follows</span><br><span class="line"> immediately thereafter in the bytestream.</span><br><span class="line"> - For longer literals, the (len-1) value is stored after the tag byte,</span><br><span class="line"> little-endian. The upper six bits of the tag byte describe how</span><br><span class="line"> many bytes are used for the length; 60, 61, 62 or 63 for</span><br><span class="line"> 1-4 bytes, respectively. The literal itself follows after the</span><br><span class="line"> length.</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">2.2. 
Copies</span><br><span class="line"></span><br><span class="line">Copies are references back into previous decompressed data, telling</span><br><span class="line">the decompressor to reuse data it has previously decoded.</span><br><span class="line">They encode two values: The _offset_, saying how many bytes back</span><br><span class="line">from the current position to read, and the _length_, how many bytes</span><br><span class="line">to copy. Offsets of zero can be encoded, but are not legal;</span><br><span class="line">similarly, it is possible to encode backreferences that would</span><br><span class="line">go past the end of the block (offset > current decompressed position),</span><br><span class="line">which is also nonsensical and thus not allowed.</span><br><span class="line"></span><br><span class="line">As in most LZ77-based compressors, the length can be larger than the offset,</span><br><span class="line">yielding a form of run-length encoding (RLE). For instance,</span><br><span class="line">"xababab" could be encoded as</span><br><span class="line"></span><br><span class="line"> <literal: "xab"> <copy: offset=2 length=4></span><br><span class="line"></span><br><span class="line">Note that since the current Snappy compressor works in 32 kB</span><br><span class="line">blocks and does not do matching across blocks, it will never produce</span><br><span class="line">a bitstream with offsets larger than about 32768. However, the</span><br><span class="line">decompressor should not rely on this, as it may change in the future.</span><br><span class="line"></span><br><span class="line">There are several different kinds of copy elements, depending on</span><br><span class="line">the amount of bytes to be copied (length), and how far back the</span><br><span class="line">data to be copied is (offset).</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">2.2.1. 
Copy with 1-byte offset (01)</span><br><span class="line"></span><br><span class="line">These elements can encode lengths between [4..11] bytes and offsets</span><br><span class="line">between [0..2047] bytes. (len-4) occupies three bits and is stored</span><br><span class="line">in bits [2..4] of the tag byte. The offset occupies 11 bits, of which the</span><br><span class="line">upper three are stored in the upper three bits ([5..7]) of the tag byte,</span><br><span class="line">and the lower eight are stored in a byte following the tag byte.</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">2.2.2. Copy with 2-byte offset (10)</span><br><span class="line"></span><br><span class="line">These elements can encode lengths between [1..64] and offsets from</span><br><span class="line">[0..65535]. (len-1) occupies six bits and is stored in the upper</span><br><span class="line">six bits ([2..7]) of the tag byte. The offset is stored as a</span><br><span class="line">little-endian 16-bit integer in the two bytes following the tag byte.</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">2.2.3. 
Copy with 4-byte offset (11)</span><br><span class="line"></span><br><span class="line">These are like the copies with 2-byte offsets (see previous subsection),</span><br><span class="line">except that the offset is stored as a 32-bit integer instead of a</span><br><span class="line">16-bit integer (and thus will occupy four bytes).</span><br></pre></td></tr></table></figure><h3 id="compress"><a class="markdownIt-Anchor" href="#compress"></a> Compress</h3><ul><li><p><code>RawCompress</code></p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="type">void</span> <span class="title">RawCompress</span><span class="params">(<span class="type">const</span> <span class="type">char</span>* input, <span class="type">size_t</span> input_length, <span class="type">char</span>* compressed,</span></span></span><br><span class="line"><span class="params"><span class="function"> <span class="type">size_t</span>* compressed_length)</span> </span>{</span><br><span class="line"> <span class="function">ByteArraySource <span class="title">reader</span><span class="params">(input, input_length)</span></span>;</span><br><span class="line"> <span class="function">UncheckedByteArraySink <span class="title">writer</span><span class="params">(compressed)</span></span>;</span><br><span class="line"> <span class="built_in">Compress</span>(&reader, &writer);</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Compute how many bytes were added</span></span><br><span class="line"> *compressed_length = (writer.<span class="built_in">CurrentDestination</span>() - 
compressed);</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>First, create the <code>reader</code> and <code>writer</code> from the arguments, then call <code>Compress</code> to perform the compression, and finally compute <code>compressed_length</code>.</p><p>Next, let's look at the structure of <code>reader</code> and <code>writer</code>.</p></li><li><p><code>ByteArraySource</code></p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// A Source is an interface that yields a sequence of 
bytes</span></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">Source</span> {</span><br><span class="line"> <span class="keyword">public</span>:</span><br><span class="line"> <span class="built_in">Source</span>() { }</span><br><span class="line"> <span class="keyword">virtual</span> ~<span class="built_in">Source</span>();</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Return the number of bytes left to read from the source</span></span><br><span class="line"> <span class="function"><span class="keyword">virtual</span> <span class="type">size_t</span> <span class="title">Available</span><span class="params">()</span> <span class="type">const</span> </span>= <span class="number">0</span>;</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Peek at the next flat region of the source. Does not reposition</span></span><br><span class="line"> <span class="comment">// the source. The returned region is empty iff Available()==0.</span></span><br><span class="line"> <span class="comment">//</span></span><br><span class="line"> <span class="comment">// Returns a pointer to the beginning of the region and store its</span></span><br><span class="line"> <span class="comment">// length in *len.</span></span><br><span class="line"> <span class="comment">//</span></span><br><span class="line"> <span class="comment">// The returned region is valid until the next call to Skip() or</span></span><br><span class="line"> <span class="comment">// until this object is destroyed, whichever occurs first.</span></span><br><span class="line"> <span class="comment">//</span></span><br><span class="line"> <span class="comment">// The returned region may be larger than Available() (for example</span></span><br><span class="line"> <span class="comment">// if this ByteSource is a view on a substring of a larger source).</span></span><br><span class="line"> <span class="comment">// The 
caller is responsible for ensuring that it only reads the</span></span><br><span class="line"> <span class="comment">// Available() bytes.</span></span><br><span class="line"> <span class="function"><span class="keyword">virtual</span> <span class="type">const</span> <span class="type">char</span>* <span class="title">Peek</span><span class="params">(<span class="type">size_t</span>* len)</span> </span>= <span class="number">0</span>;</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Skip the next n bytes. Invalidates any buffer returned by</span></span><br><span class="line"> <span class="comment">// a previous call to Peek().</span></span><br><span class="line"> <span class="comment">// REQUIRES: Available() >= n</span></span><br><span class="line"> <span class="function"><span class="keyword">virtual</span> <span class="type">void</span> <span class="title">Skip</span><span class="params">(<span class="type">size_t</span> n)</span> </span>= <span class="number">0</span>;</span><br><span class="line"></span><br><span class="line"> <span class="keyword">private</span>:</span><br><span class="line"> <span class="comment">// No copying</span></span><br><span class="line"> <span class="built_in">Source</span>(<span class="type">const</span> Source&);</span><br><span class="line"> <span class="type">void</span> <span class="keyword">operator</span>=(<span class="type">const</span> Source&);</span><br><span class="line">};</span><br><span class="line"></span><br><span class="line"><span class="comment">// A Source implementation that yields the contents of a flat array</span></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">ByteArraySource</span> : <span class="keyword">public</span> Source {</span><br><span class="line"> <span class="keyword">public</span>:</span><br><span class="line"> <span class="built_in">ByteArraySource</span>(<span class="type">const</span> <span 
class="type">char</span>* p, <span class="type">size_t</span> n) : <span class="built_in">ptr_</span>(p), <span class="built_in">left_</span>(n) { }</span><br><span class="line"> ~<span class="built_in">ByteArraySource</span>() <span class="keyword">override</span>;</span><br><span class="line"> <span class="function"><span class="type">size_t</span> <span class="title">Available</span><span class="params">()</span> <span class="type">const</span> <span class="keyword">override</span></span>;</span><br><span class="line"> <span class="function"><span class="type">const</span> <span class="type">char</span>* <span class="title">Peek</span><span class="params">(<span class="type">size_t</span>* len)</span> <span class="keyword">override</span></span>;</span><br><span class="line"> <span class="function"><span class="type">void</span> <span class="title">Skip</span><span class="params">(<span class="type">size_t</span> n)</span> <span class="keyword">override</span></span>;</span><br><span class="line"> <span class="keyword">private</span>:</span><br><span class="line"> <span class="type">const</span> <span class="type">char</span>* ptr_;</span><br><span class="line"> <span class="type">size_t</span> left_;</span><br><span class="line">};</span><br></pre></td></tr></table></figure><ul><li><code>Available</code>: returns the number of bytes left to read.</li><li><code>Peek</code>: returns a pointer to the next contiguous region of readable bytes and stores its length in <code>*len</code>. The returned buffer must remain valid until the next call to <code>Skip</code>.</li><li><code>Skip</code>: tells the Source that the next <code>n</code> bytes are no longer needed and advances past them.</li></ul></li><li><p><code>UncheckedByteArraySink</code></p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span 
class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span 
class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// A Sink is an interface that consumes a sequence of bytes.</span></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">Sink</span> {</span><br><span class="line"> <span class="keyword">public</span>:</span><br><span class="line"> <span class="built_in">Sink</span>() { }</span><br><span class="line"> <span class="keyword">virtual</span> ~<span class="built_in">Sink</span>();</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Append "bytes[0,n-1]" to this.</span></span><br><span class="line"> <span class="function"><span class="keyword">virtual</span> <span class="type">void</span> <span class="title">Append</span><span class="params">(<span class="type">const</span> <span class="type">char</span>* bytes, <span class="type">size_t</span> n)</span> </span>= <span class="number">0</span>;</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Returns a writable buffer of the specified length for appending.</span></span><br><span class="line"> <span class="comment">// May return a pointer to the caller-owned scratch buffer which</span></span><br><span class="line"> <span class="comment">// must have at least the indicated length. 
The returned buffer is</span></span><br><span class="line"> <span class="comment">// only valid until the next operation on this Sink.</span></span><br><span class="line"> <span class="comment">//</span></span><br><span class="line"> <span class="comment">// After writing at most "length" bytes, call Append() with the</span></span><br><span class="line"> <span class="comment">// pointer returned from this function and the number of bytes</span></span><br><span class="line"> <span class="comment">// written. Many Append() implementations will avoid copying</span></span><br><span class="line"> <span class="comment">// bytes if this function returned an internal buffer.</span></span><br><span class="line"> <span class="comment">//</span></span><br><span class="line"> <span class="comment">// If a non-scratch buffer is returned, the caller may only pass a</span></span><br><span class="line"> <span class="comment">// prefix of it to Append(). That is, it is not correct to pass an</span></span><br><span class="line"> <span class="comment">// interior pointer of the returned array to Append().</span></span><br><span class="line"> <span class="comment">//</span></span><br><span class="line"> <span class="comment">// The default implementation always returns the scratch buffer.</span></span><br><span class="line"> <span class="function"><span class="keyword">virtual</span> <span class="type">char</span>* <span class="title">GetAppendBuffer</span><span class="params">(<span class="type">size_t</span> length, <span class="type">char</span>* scratch)</span></span>;</span><br><span class="line"></span><br><span class="line"> <span class="comment">// For higher performance, Sink implementations can provide custom</span></span><br><span class="line"> <span class="comment">// AppendAndTakeOwnership() and GetAppendBufferVariable() methods.</span></span><br><span class="line"> <span class="comment">// These methods can reduce the number of copies done during</span></span><br><span 
class="line"> <span class="comment">// compression/decompression.</span></span><br><span class="line"></span><br><span class="line"> <span class="comment">// Append "bytes[0,n-1] to the sink. Takes ownership of "bytes"</span></span><br><span class="line"> <span class="comment">// and calls the deleter function as (*deleter)(deleter_arg, bytes, n)</span></span><br><span class="line"> <span class="comment">// to free the buffer. deleter function must be non NULL.</span></span><br><span class="line"> <span class="comment">//</span></span><br><span class="line"> <span class="comment">// The default implementation just calls Append and frees "bytes".</span></span><br><span class="line"> <span class="comment">// Other implementations may avoid a copy while appending the buffer.</span></span><br><span class="line"> <span class="function"><span class="keyword">virtual</span> <span class="type">void</span> <span class="title">AppendAndTakeOwnership</span><span class="params">(</span></span></span><br><span class="line"><span class="params"><span class="function"> <span class="type">char</span>* bytes, <span class="type">size_t</span> n, <span class="type">void</span> (*deleter)(<span class="type">void</span>*, <span class="type">const</span> <span class="type">char</span>*, <span class="type">size_t</span>),</span></span></span><br><span class="line"><span class="params"><span class="function"> <span class="type">void</span> *deleter_arg)</span></span>;</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Returns a writable buffer for appending and writes the buffer's capacity to</span></span><br><span class="line"> <span class="comment">// *allocated_size. 
Guarantees *allocated_size >= min_size.</span></span><br><span class="line"> <span class="comment">// May return a pointer to the caller-owned scratch buffer which must have</span></span><br><span class="line"> <span class="comment">// scratch_size >= min_size.</span></span><br><span class="line"> <span class="comment">//</span></span><br><span class="line"> <span class="comment">// The returned buffer is only valid until the next operation</span></span><br><span class="line"> <span class="comment">// on this ByteSink.</span></span><br><span class="line"> <span class="comment">//</span></span><br><span class="line"> <span class="comment">// After writing at most *allocated_size bytes, call Append() with the</span></span><br><span class="line"> <span class="comment">// pointer returned from this function and the number of bytes written.</span></span><br><span class="line"> <span class="comment">// Many Append() implementations will avoid copying bytes if this function</span></span><br><span class="line"> <span class="comment">// returned an internal buffer.</span></span><br><span class="line"> <span class="comment">//</span></span><br><span class="line"> <span class="comment">// If the sink implementation allocates or reallocates an internal buffer,</span></span><br><span class="line"> <span class="comment">// it should use the desired_size_hint if appropriate. If a caller cannot</span></span><br><span class="line"> <span class="comment">// provide a reasonable guess at the desired capacity, it should set</span></span><br><span class="line"> <span class="comment">// desired_size_hint = 0.</span></span><br><span class="line"> <span class="comment">//</span></span><br><span class="line"> <span class="comment">// If a non-scratch buffer is returned, the caller may only pass</span></span><br><span class="line"> <span class="comment">// a prefix to it to Append(). 
That is, it is not correct to pass an</span></span><br><span class="line"> <span class="comment">// interior pointer to Append().</span></span><br><span class="line"> <span class="comment">//</span></span><br><span class="line"> <span class="comment">// The default implementation always returns the scratch buffer.</span></span><br><span class="line"> <span class="function"><span class="keyword">virtual</span> <span class="type">char</span>* <span class="title">GetAppendBufferVariable</span><span class="params">(</span></span></span><br><span class="line"><span class="params"><span class="function"> <span class="type">size_t</span> min_size, <span class="type">size_t</span> desired_size_hint, <span class="type">char</span>* scratch,</span></span></span><br><span class="line"><span class="params"><span class="function"> <span class="type">size_t</span> scratch_size, <span class="type">size_t</span>* allocated_size)</span></span>;</span><br><span class="line"></span><br><span class="line"> <span class="keyword">private</span>:</span><br><span class="line"> <span class="comment">// No copying</span></span><br><span class="line"> <span class="built_in">Sink</span>(<span class="type">const</span> Sink&);</span><br><span class="line"> <span class="type">void</span> <span class="keyword">operator</span>=(<span class="type">const</span> Sink&);</span><br><span class="line">};</span><br><span class="line"></span><br><span class="line"><span class="comment">// A Sink implementation that writes to a flat array without any bound checks.</span></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">UncheckedByteArraySink</span> : <span class="keyword">public</span> Sink {</span><br><span class="line"> <span class="keyword">public</span>:</span><br><span class="line"> <span class="function"><span class="keyword">explicit</span> <span class="title">UncheckedByteArraySink</span><span class="params">(<span class="type">char</span>* dest)</span> : 
dest_(dest) {</span> }</span><br><span class="line"> ~<span class="built_in">UncheckedByteArraySink</span>() <span class="keyword">override</span>;</span><br><span class="line"> <span class="function"><span class="type">void</span> <span class="title">Append</span><span class="params">(<span class="type">const</span> <span class="type">char</span>* data, <span class="type">size_t</span> n)</span> <span class="keyword">override</span></span>;</span><br><span class="line"> <span class="function"><span class="type">char</span>* <span class="title">GetAppendBuffer</span><span class="params">(<span class="type">size_t</span> len, <span class="type">char</span>* scratch)</span> <span class="keyword">override</span></span>;</span><br><span class="line"> <span class="function"><span class="type">char</span>* <span class="title">GetAppendBufferVariable</span><span class="params">(</span></span></span><br><span class="line"><span class="params"><span class="function"> <span class="type">size_t</span> min_size, <span class="type">size_t</span> desired_size_hint, <span class="type">char</span>* scratch,</span></span></span><br><span class="line"><span class="params"><span class="function"> <span class="type">size_t</span> scratch_size, <span class="type">size_t</span>* allocated_size)</span> <span class="keyword">override</span></span>;</span><br><span class="line"> <span class="function"><span class="type">void</span> <span class="title">AppendAndTakeOwnership</span><span class="params">(</span></span></span><br><span class="line"><span class="params"><span class="function"> <span class="type">char</span>* bytes, <span class="type">size_t</span> n, <span class="type">void</span> (*deleter)(<span class="type">void</span>*, <span class="type">const</span> <span class="type">char</span>*, <span class="type">size_t</span>),</span></span></span><br><span class="line"><span class="params"><span class="function"> <span class="type">void</span> *deleter_arg)</span> <span 
class="keyword">override</span></span>;</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Return the current output pointer so that a caller can see how</span></span><br><span class="line"> <span class="comment">// many bytes were produced.</span></span><br><span class="line"> <span class="comment">// Note: this is not a Sink method.</span></span><br><span class="line"> <span class="function"><span class="type">char</span>* <span class="title">CurrentDestination</span><span class="params">()</span> <span class="type">const</span> </span>{ <span class="keyword">return</span> dest_; }</span><br><span class="line"> <span class="keyword">private</span>:</span><br><span class="line"> <span class="type">char</span>* dest_;</span><br><span class="line">};</span><br></pre></td></tr></table></figure><ul><li><code>Append</code>: writes the bytes <code>bytes[0,n-1]</code> to the sink.</li><li><code>GetAppendBuffer</code>: hands out a writable buffer of <code>length</code> bytes, which must remain valid until <code>Append</code> is called. An implementation may also simply return <code>scratch</code>, the caller-owned buffer allocated by the surrounding code.</li></ul></li><li><p><code>Compress</code></p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span 
class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="type">size_t</span> <span class="title">Compress</span><span class="params">(Source* reader, Sink* writer)</span> </span>{</span><br><span class="line"> <span class="type">size_t</span> written = <span class="number">0</span>;</span><br><span class="line"> <span class="type">size_t</span> N = reader-><span class="built_in">Available</span>();</span><br><span class="line"> <span class="type">const</span> <span class="type">size_t</span> uncompressed_size = N;</span><br><span class="line"> <span class="type">char</span> ulength[Varint::kMax32];</span><br><span 
class="line"> <span class="type">char</span>* p = Varint::<span class="built_in">Encode32</span>(ulength, N);</span><br><span class="line"> writer-><span class="built_in">Append</span>(ulength, p - ulength);</span><br><span class="line"> written += (p - ulength);</span><br><span class="line"></span><br><span class="line"> <span class="function">internal::WorkingMemory <span class="title">wmem</span><span class="params">(N)</span></span>;</span><br><span class="line"></span><br><span class="line"> <span class="keyword">while</span> (N > <span class="number">0</span>) {</span><br><span class="line"> <span class="comment">// Get next block to compress (without copying if possible)</span></span><br><span class="line"> <span class="type">size_t</span> fragment_size;</span><br><span class="line"> <span class="type">const</span> <span class="type">char</span>* fragment = reader-><span class="built_in">Peek</span>(&fragment_size);</span><br><span class="line"> <span class="built_in">assert</span>(fragment_size != <span class="number">0</span>); <span class="comment">// premature end of input</span></span><br><span class="line"> <span class="type">const</span> <span class="type">size_t</span> num_to_read = std::<span class="built_in">min</span>(N, kBlockSize);</span><br><span class="line"> <span class="type">size_t</span> bytes_read = fragment_size;</span><br><span class="line"></span><br><span class="line"> <span class="type">size_t</span> pending_advance = <span class="number">0</span>;</span><br><span class="line"> <span class="keyword">if</span> (bytes_read >= num_to_read) {</span><br><span class="line"> <span class="comment">// Buffer returned by reader is large enough</span></span><br><span class="line"> pending_advance = num_to_read;</span><br><span class="line"> fragment_size = num_to_read;</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> <span class="type">char</span>* scratch = wmem.<span 
class="built_in">GetScratchInput</span>();</span><br><span class="line"> std::<span class="built_in">memcpy</span>(scratch, fragment, bytes_read);</span><br><span class="line"> reader-><span class="built_in">Skip</span>(bytes_read);</span><br><span class="line"></span><br><span class="line"> <span class="keyword">while</span> (bytes_read < num_to_read) {</span><br><span class="line"> fragment = reader-><span class="built_in">Peek</span>(&fragment_size);</span><br><span class="line"> <span class="type">size_t</span> n = std::<span class="built_in">min</span><<span class="type">size_t</span>>(fragment_size, num_to_read - bytes_read);</span><br><span class="line"> std::<span class="built_in">memcpy</span>(scratch + bytes_read, fragment, n);</span><br><span class="line"> bytes_read += n;</span><br><span class="line"> reader-><span class="built_in">Skip</span>(n);</span><br><span class="line"> }</span><br><span class="line"> <span class="built_in">assert</span>(bytes_read == num_to_read);</span><br><span class="line"> fragment = scratch;</span><br><span class="line"> fragment_size = num_to_read;</span><br><span class="line"> }</span><br><span class="line"> <span class="built_in">assert</span>(fragment_size == num_to_read);</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Get encoding table for compression</span></span><br><span class="line"> <span class="type">int</span> table_size;</span><br><span class="line"> <span class="type">uint16_t</span>* table = wmem.<span class="built_in">GetHashTable</span>(num_to_read, &table_size);</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Compress input_fragment and append to dest</span></span><br><span class="line"> <span class="type">const</span> <span class="type">int</span> max_output = <span class="built_in">MaxCompressedLength</span>(num_to_read);</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Need a scratch buffer 
for the output, in case the byte sink doesn't</span></span><br><span class="line"> <span class="comment">// have room for us directly.</span></span><br><span class="line"></span><br><span class="line"> <span class="comment">// Since we encode kBlockSize regions followed by a region</span></span><br><span class="line"> <span class="comment">// which is <= kBlockSize in length, a previously allocated</span></span><br><span class="line"> <span class="comment">// scratch_output[] region is big enough for this iteration.</span></span><br><span class="line"> <span class="type">char</span>* dest = writer-><span class="built_in">GetAppendBuffer</span>(max_output, wmem.<span class="built_in">GetScratchOutput</span>());</span><br><span class="line"> <span class="type">char</span>* end = internal::<span class="built_in">CompressFragment</span>(fragment, fragment_size, dest, table,</span><br><span class="line"> table_size);</span><br><span class="line"> writer-><span class="built_in">Append</span>(dest, end - dest);</span><br><span class="line"> written += (end - dest);</span><br><span class="line"></span><br><span class="line"> N -= num_to_read;</span><br><span class="line"> reader-><span class="built_in">Skip</span>(pending_advance);</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="built_in">Report</span>(<span class="string">"snappy_compress"</span>, written, uncompressed_size);</span><br><span class="line"></span><br><span class="line"> <span class="keyword">return</span> written;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><ul><li><p>The header is the length of the original (uncompressed) input, encoded as a variable-length integer by <code>Varint::Encode32</code>.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span 
class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">inline</span> <span class="type">char</span>* <span class="title">Varint::Encode32</span><span class="params">(<span class="type">char</span>* sptr, <span class="type">uint32_t</span> v)</span> </span>{</span><br><span class="line"> <span class="comment">// Operate on characters as unsigneds</span></span><br><span class="line"> <span class="type">uint8_t</span>* ptr = <span class="built_in">reinterpret_cast</span><<span class="type">uint8_t</span>*>(sptr);</span><br><span class="line"> <span class="type">static</span> <span class="type">const</span> <span class="type">uint8_t</span> B = <span class="number">128</span>;</span><br><span class="line"> <span class="keyword">if</span> (v < (<span class="number">1</span> << <span class="number">7</span>)) {</span><br><span class="line"> *(ptr++) = <span class="built_in">static_cast</span><<span class="type">uint8_t</span>>(v);</span><br><span class="line"> } <span class="keyword">else</span> <span class="keyword">if</span> (v < (<span class="number">1</span> << <span class="number">14</span>)) {</span><br><span class="line"> *(ptr++) = <span class="built_in">static_cast</span><<span class="type">uint8_t</span>>(v | B);</span><br><span class="line"> *(ptr++) = <span class="built_in">static_cast</span><<span 
class="type">uint8_t</span>>(v >> <span class="number">7</span>);</span><br><span class="line"> } <span class="keyword">else</span> <span class="keyword">if</span> (v < (<span class="number">1</span> << <span class="number">21</span>)) {</span><br><span class="line"> *(ptr++) = <span class="built_in">static_cast</span><<span class="type">uint8_t</span>>(v | B);</span><br><span class="line"> *(ptr++) = <span class="built_in">static_cast</span><<span class="type">uint8_t</span>>((v >> <span class="number">7</span>) | B);</span><br><span class="line"> *(ptr++) = <span class="built_in">static_cast</span><<span class="type">uint8_t</span>>(v >> <span class="number">14</span>);</span><br><span class="line"> } <span class="keyword">else</span> <span class="keyword">if</span> (v < (<span class="number">1</span> << <span class="number">28</span>)) {</span><br><span class="line"> *(ptr++) = <span class="built_in">static_cast</span><<span class="type">uint8_t</span>>(v | B);</span><br><span class="line"> *(ptr++) = <span class="built_in">static_cast</span><<span class="type">uint8_t</span>>((v >> <span class="number">7</span>) | B);</span><br><span class="line"> *(ptr++) = <span class="built_in">static_cast</span><<span class="type">uint8_t</span>>((v >> <span class="number">14</span>) | B);</span><br><span class="line"> *(ptr++) = <span class="built_in">static_cast</span><<span class="type">uint8_t</span>>(v >> <span class="number">21</span>);</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> *(ptr++) = <span class="built_in">static_cast</span><<span class="type">uint8_t</span>>(v | B);</span><br><span class="line"> *(ptr++) = <span class="built_in">static_cast</span><<span class="type">uint8_t</span>>((v>><span class="number">7</span>) | B);</span><br><span class="line"> *(ptr++) = <span class="built_in">static_cast</span><<span class="type">uint8_t</span>>((v>><span class="number">14</span>) | B);</span><br><span 
class="line"> *(ptr++) = <span class="built_in">static_cast</span><<span class="type">uint8_t</span>>((v>><span class="number">21</span>) | B);</span><br><span class="line"> *(ptr++) = <span class="built_in">static_cast</span><<span class="type">uint8_t</span>>(v >> <span class="number">28</span>);</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> <span class="built_in">reinterpret_cast</span><<span class="type">char</span>*>(ptr);</span><br><span class="line">}</span><br></pre></td></tr></table></figure></li><li><p>Obtain <code>fragment</code> and <code>fragment_size</code> for the next block of input.</p></li><li><p>Call <code>CompressFragment</code> on that fragment.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span 
class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span 
class="line">102</span><br><span class="line">103</span><br><span class="line">104</span><br><span class="line">105</span><br><span class="line">106</span><br><span class="line">107</span><br><span class="line">108</span><br><span class="line">109</span><br><span class="line">110</span><br><span class="line">111</span><br><span class="line">112</span><br><span class="line">113</span><br><span class="line">114</span><br><span class="line">115</span><br><span class="line">116</span><br><span class="line">117</span><br><span class="line">118</span><br><span class="line">119</span><br><span class="line">120</span><br><span class="line">121</span><br><span class="line">122</span><br><span class="line">123</span><br><span class="line">124</span><br><span class="line">125</span><br><span class="line">126</span><br><span class="line">127</span><br><span class="line">128</span><br><span class="line">129</span><br><span class="line">130</span><br><span class="line">131</span><br><span class="line">132</span><br><span class="line">133</span><br><span class="line">134</span><br><span class="line">135</span><br><span class="line">136</span><br><span class="line">137</span><br><span class="line">138</span><br><span class="line">139</span><br><span class="line">140</span><br><span class="line">141</span><br><span class="line">142</span><br><span class="line">143</span><br><span class="line">144</span><br><span class="line">145</span><br><span class="line">146</span><br><span class="line">147</span><br><span class="line">148</span><br><span class="line">149</span><br><span class="line">150</span><br><span class="line">151</span><br><span class="line">152</span><br><span class="line">153</span><br><span class="line">154</span><br><span class="line">155</span><br><span class="line">156</span><br><span class="line">157</span><br><span class="line">158</span><br><span class="line">159</span><br><span class="line">160</span><br><span class="line">161</span><br><span 
class="line">162</span><br><span class="line">163</span><br><span class="line">164</span><br><span class="line">165</span><br><span class="line">166</span><br><span class="line">167</span><br><span class="line">168</span><br><span class="line">169</span><br><span class="line">170</span><br><span class="line">171</span><br><span class="line">172</span><br><span class="line">173</span><br><span class="line">174</span><br><span class="line">175</span><br><span class="line">176</span><br><span class="line">177</span><br><span class="line">178</span><br><span class="line">179</span><br><span class="line">180</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Flat array compression that does not emit the "uncompressed length"</span></span><br><span class="line"><span class="comment">// prefix. Compresses "input" string to the "*op" buffer.</span></span><br><span class="line"><span class="comment">//</span></span><br><span class="line"><span class="comment">// REQUIRES: "input" is at most "kBlockSize" bytes long.</span></span><br><span class="line"><span class="comment">// REQUIRES: "op" points to an array of memory that is at least</span></span><br><span class="line"><span class="comment">// "MaxCompressedLength(input.size())" in size.</span></span><br><span class="line"><span class="comment">// REQUIRES: All elements in "table[0..table_size-1]" are initialized to zero.</span></span><br><span class="line"><span class="comment">// REQUIRES: "table_size" is a power of two</span></span><br><span class="line"><span class="comment">//</span></span><br><span class="line"><span class="comment">// Returns an "end" pointer into "op" buffer.</span></span><br><span class="line"><span class="comment">// "end - op" is the compressed size of "input".</span></span><br><span class="line"><span class="keyword">namespace</span> internal {</span><br><span class="line"><span class="function"><span class="type">char</span>* <span 
class="title">CompressFragment</span><span class="params">(<span class="type">const</span> <span class="type">char</span>* input, <span class="type">size_t</span> input_size, <span class="type">char</span>* op,</span></span></span><br><span class="line"><span class="params"><span class="function"> <span class="type">uint16_t</span>* table, <span class="type">const</span> <span class="type">int</span> table_size)</span> </span>{</span><br><span class="line"> <span class="comment">// "ip" is the input pointer, and "op" is the output pointer.</span></span><br><span class="line"> <span class="type">const</span> <span class="type">char</span>* ip = input;</span><br><span class="line"> <span class="built_in">assert</span>(input_size <= kBlockSize);</span><br><span class="line"> <span class="built_in">assert</span>((table_size & (table_size - <span class="number">1</span>)) == <span class="number">0</span>); <span class="comment">// table must be power of two</span></span><br><span class="line"> <span class="type">const</span> <span class="type">uint32_t</span> mask = table_size - <span class="number">1</span>;</span><br><span class="line"> <span class="type">const</span> <span class="type">char</span>* ip_end = input + input_size;</span><br><span class="line"> <span class="type">const</span> <span class="type">char</span>* base_ip = ip;</span><br><span class="line"></span><br><span class="line"> <span class="type">const</span> <span class="type">size_t</span> kInputMarginBytes = <span class="number">15</span>;</span><br><span class="line"> <span class="keyword">if</span> (<span class="built_in">SNAPPY_PREDICT_TRUE</span>(input_size >= kInputMarginBytes)) {</span><br><span class="line"> <span class="type">const</span> <span class="type">char</span>* ip_limit = input + input_size - kInputMarginBytes;</span><br><span class="line"></span><br><span class="line"> <span class="keyword">for</span> (<span class="type">uint32_t</span> preload = LittleEndian::<span 
class="built_in">Load32</span>(ip + <span class="number">1</span>);;) {</span><br><span class="line"> <span class="comment">// Bytes in [next_emit, ip) will be emitted as literal bytes. Or</span></span><br><span class="line"> <span class="comment">// [next_emit, ip_end) after the main loop.</span></span><br><span class="line"> <span class="type">const</span> <span class="type">char</span>* next_emit = ip++;</span><br><span class="line"> <span class="type">uint64_t</span> data = LittleEndian::<span class="built_in">Load64</span>(ip);</span><br><span class="line"> <span class="comment">// The body of this loop calls EmitLiteral once and then EmitCopy one or</span></span><br><span class="line"> <span class="comment">// more times. (The exception is that when we're close to exhausting</span></span><br><span class="line"> <span class="comment">// the input we goto emit_remainder.)</span></span><br><span class="line"> <span class="comment">//</span></span><br><span class="line"> <span class="comment">// In the first iteration of this loop we're just starting, so</span></span><br><span class="line"> <span class="comment">// there's nothing to copy, so calling EmitLiteral once is</span></span><br><span class="line"> <span class="comment">// necessary. 
And we only start a new iteration when the</span></span><br><span class="line"> <span class="comment">// current iteration has determined that a call to EmitLiteral will</span></span><br><span class="line"> <span class="comment">// precede the next call to EmitCopy (if any).</span></span><br><span class="line"> <span class="comment">//</span></span><br><span class="line"> <span class="comment">// Step 1: Scan forward in the input looking for a 4-byte-long match.</span></span><br><span class="line"> <span class="comment">// If we get close to exhausting the input then goto emit_remainder.</span></span><br><span class="line"> <span class="comment">//</span></span><br><span class="line"> <span class="comment">// Heuristic match skipping: If 32 bytes are scanned with no matches</span></span><br><span class="line"> <span class="comment">// found, start looking only at every other byte. If 32 more bytes are</span></span><br><span class="line"> <span class="comment">// scanned (or skipped), look at every third byte, etc.. When a match is</span></span><br><span class="line"> <span class="comment">// found, immediately go back to looking at every byte. This is a small</span></span><br><span class="line"> <span class="comment">// loss (~5% performance, ~0.1% density) for compressible data due to more</span></span><br><span class="line"> <span class="comment">// bookkeeping, but for non-compressible data (such as JPEG) it's a huge</span></span><br><span class="line"> <span class="comment">// win since the compressor quickly "realizes" the data is incompressible</span></span><br><span class="line"> <span class="comment">// and doesn't bother looking for matches everywhere.</span></span><br><span class="line"> <span class="comment">//</span></span><br><span class="line"> <span class="comment">// The "skip" variable keeps track of how many bytes there are since the</span></span><br><span class="line"> <span class="comment">// last match; dividing it by 32 (ie. 
right-shifting by five) gives the</span></span><br><span class="line"> <span class="comment">// number of bytes to move ahead for each iteration.</span></span><br><span class="line"> <span class="type">uint32_t</span> skip = <span class="number">32</span>;</span><br><span class="line"></span><br><span class="line"> <span class="type">const</span> <span class="type">char</span>* candidate;</span><br><span class="line"> <span class="keyword">if</span> (ip_limit - ip >= <span class="number">16</span>) {</span><br><span class="line"> <span class="keyword">auto</span> delta = ip - base_ip;</span><br><span class="line"> <span class="keyword">for</span> (<span class="type">int</span> j = <span class="number">0</span>; j < <span class="number">4</span>; ++j) {</span><br><span class="line"> <span class="keyword">for</span> (<span class="type">int</span> k = <span class="number">0</span>; k < <span class="number">4</span>; ++k) {</span><br><span class="line"> <span class="type">int</span> i = <span class="number">4</span> * j + k;</span><br><span class="line"> <span class="comment">// These for-loops are meant to be unrolled. So we can freely</span></span><br><span class="line"> <span class="comment">// special case the first iteration to use the value already</span></span><br><span class="line"> <span class="comment">// loaded in preload.</span></span><br><span class="line"> <span class="type">uint32_t</span> dword = i == <span class="number">0</span> ? 
preload : <span class="built_in">static_cast</span><<span class="type">uint32_t</span>>(data);</span><br><span class="line"> <span class="built_in">assert</span>(dword == LittleEndian::<span class="built_in">Load32</span>(ip + i));</span><br><span class="line"> <span class="type">uint32_t</span> hash = <span class="built_in">HashBytes</span>(dword, mask);</span><br><span class="line"> candidate = base_ip + table[hash];</span><br><span class="line"> <span class="built_in">assert</span>(candidate >= base_ip);</span><br><span class="line"> <span class="built_in">assert</span>(candidate < ip + i);</span><br><span class="line"> table[hash] = delta + i;</span><br><span class="line"> <span class="keyword">if</span> (<span class="built_in">SNAPPY_PREDICT_FALSE</span>(LittleEndian::<span class="built_in">Load32</span>(candidate) == dword)) {</span><br><span class="line"> *op = LITERAL | (i << <span class="number">2</span>);</span><br><span class="line"> <span class="built_in">UnalignedCopy128</span>(next_emit, op + <span class="number">1</span>);</span><br><span class="line"> ip += i;</span><br><span class="line"> op = op + i + <span class="number">2</span>;</span><br><span class="line"> <span class="keyword">goto</span> emit_match;</span><br><span class="line"> }</span><br><span class="line"> data >>= <span class="number">8</span>;</span><br><span class="line"> }</span><br><span class="line"> data = LittleEndian::<span class="built_in">Load64</span>(ip + <span class="number">4</span> * j + <span class="number">4</span>);</span><br><span class="line"> }</span><br><span class="line"> ip += <span class="number">16</span>;</span><br><span class="line"> skip += <span class="number">16</span>;</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">while</span> (<span class="literal">true</span>) {</span><br><span class="line"> <span class="built_in">assert</span>(<span class="built_in">static_cast</span><<span class="type">uint32_t</span>>(data) == 
LittleEndian::<span class="built_in">Load32</span>(ip));</span><br><span class="line"> <span class="type">uint32_t</span> hash = <span class="built_in">HashBytes</span>(data, mask);</span><br><span class="line"> <span class="type">uint32_t</span> bytes_between_hash_lookups = skip >> <span class="number">5</span>;</span><br><span class="line"> skip += bytes_between_hash_lookups;</span><br><span class="line"> <span class="type">const</span> <span class="type">char</span>* next_ip = ip + bytes_between_hash_lookups;</span><br><span class="line"> <span class="keyword">if</span> (<span class="built_in">SNAPPY_PREDICT_FALSE</span>(next_ip > ip_limit)) {</span><br><span class="line"> ip = next_emit;</span><br><span class="line"> <span class="keyword">goto</span> emit_remainder;</span><br><span class="line"> }</span><br><span class="line"> candidate = base_ip + table[hash];</span><br><span class="line"> <span class="built_in">assert</span>(candidate >= base_ip);</span><br><span class="line"> <span class="built_in">assert</span>(candidate < ip);</span><br><span class="line"></span><br><span class="line"> table[hash] = ip - base_ip;</span><br><span class="line"> <span class="keyword">if</span> (<span class="built_in">SNAPPY_PREDICT_FALSE</span>(<span class="built_in">static_cast</span><<span class="type">uint32_t</span>>(data) ==</span><br><span class="line"> LittleEndian::<span class="built_in">Load32</span>(candidate))) {</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> }</span><br><span class="line"> data = LittleEndian::<span class="built_in">Load32</span>(next_ip);</span><br><span class="line"> ip = next_ip;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Step 2: A 4-byte match has been found. We'll later see if more</span></span><br><span class="line"> <span class="comment">// than 4 bytes match. 
But, prior to the match, input</span></span><br><span class="line"> <span class="comment">// bytes [next_emit, ip) are unmatched. Emit them as "literal bytes."</span></span><br><span class="line"> <span class="built_in">assert</span>(next_emit + <span class="number">16</span> <= ip_end);</span><br><span class="line"> op = <span class="built_in">EmitLiteral</span><<span class="comment">/*allow_fast_path=*/</span><span class="literal">true</span>>(op, next_emit, ip - next_emit);</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Step 3: Call EmitCopy, and then see if another EmitCopy could</span></span><br><span class="line"> <span class="comment">// be our next move. Repeat until we find no match for the</span></span><br><span class="line"> <span class="comment">// input immediately after what was consumed by the last EmitCopy call.</span></span><br><span class="line"> <span class="comment">//</span></span><br><span class="line"> <span class="comment">// If we exit this loop normally then we need to call EmitLiteral next,</span></span><br><span class="line"> <span class="comment">// though we don't yet know how big the literal will be. We handle that</span></span><br><span class="line"> <span class="comment">// by proceeding to the next iteration of the main loop. 
We also can exit</span></span><br><span class="line"> <span class="comment">// this loop via goto if we get close to exhausting the input.</span></span><br><span class="line"> emit_match:</span><br><span class="line"> <span class="keyword">do</span> {</span><br><span class="line"> <span class="comment">// We have a 4-byte match at ip, and no need to emit any</span></span><br><span class="line"> <span class="comment">// "literal bytes" prior to ip.</span></span><br><span class="line"> <span class="type">const</span> <span class="type">char</span>* base = ip;</span><br><span class="line"> std::pair<<span class="type">size_t</span>, <span class="type">bool</span>> p =</span><br><span class="line"> <span class="built_in">FindMatchLength</span>(candidate + <span class="number">4</span>, ip + <span class="number">4</span>, ip_end, &data);</span><br><span class="line"> <span class="type">size_t</span> matched = <span class="number">4</span> + p.first;</span><br><span class="line"> ip += matched;</span><br><span class="line"> <span class="type">size_t</span> offset = base - candidate;</span><br><span class="line"> <span class="built_in">assert</span>(<span class="number">0</span> == <span class="built_in">memcmp</span>(base, candidate, matched));</span><br><span class="line"> <span class="keyword">if</span> (p.second) {</span><br><span class="line"> op = <span class="built_in">EmitCopy</span><<span class="comment">/*len_less_than_12=*/</span><span class="literal">true</span>>(op, offset, matched);</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> op = <span class="built_in">EmitCopy</span><<span class="comment">/*len_less_than_12=*/</span><span class="literal">false</span>>(op, offset, matched);</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> (<span class="built_in">SNAPPY_PREDICT_FALSE</span>(ip >= ip_limit)) {</span><br><span class="line"> <span class="keyword">goto</span> 
emit_remainder;</span><br><span class="line"> }</span><br><span class="line"> <span class="comment">// Expect 5 bytes to match</span></span><br><span class="line"> <span class="built_in">assert</span>((data & <span class="number">0xFFFFFFFFFF</span>) ==</span><br><span class="line"> (LittleEndian::<span class="built_in">Load64</span>(ip) & <span class="number">0xFFFFFFFFFF</span>));</span><br><span class="line"> <span class="comment">// We are now looking for a 4-byte match again. We read</span></span><br><span class="line"> <span class="comment">// table[Hash(ip, shift)] for that. To improve compression,</span></span><br><span class="line"> <span class="comment">// we also update table[Hash(ip - 1, mask)] and table[Hash(ip, mask)].</span></span><br><span class="line"> table[<span class="built_in">HashBytes</span>(LittleEndian::<span class="built_in">Load32</span>(ip - <span class="number">1</span>), mask)] = ip - base_ip - <span class="number">1</span>;</span><br><span class="line"> <span class="type">uint32_t</span> hash = <span class="built_in">HashBytes</span>(data, mask);</span><br><span class="line"> candidate = base_ip + table[hash];</span><br><span class="line"> table[hash] = ip - base_ip;</span><br><span class="line"> <span class="comment">// Measurements on the benchmarks have shown the following probabilities</span></span><br><span class="line"> <span class="comment">// for the loop to exit (ie. avg. 
number of iterations is reciprocal).</span></span><br><span class="line"> <span class="comment">// BM_Flat/6 txt1 p = 0.3-0.4</span></span><br><span class="line"> <span class="comment">// BM_Flat/7 txt2 p = 0.35</span></span><br><span class="line"> <span class="comment">// BM_Flat/8 txt3 p = 0.3-0.4</span></span><br><span class="line"> <span class="comment">// BM_Flat/9 txt3 p = 0.34-0.4</span></span><br><span class="line"> <span class="comment">// BM_Flat/10 pb p = 0.4</span></span><br><span class="line"> <span class="comment">// BM_Flat/11 gaviota p = 0.1</span></span><br><span class="line"> <span class="comment">// BM_Flat/12 cp p = 0.5</span></span><br><span class="line"> <span class="comment">// BM_Flat/13 c p = 0.3</span></span><br><span class="line"> } <span class="keyword">while</span> (<span class="built_in">static_cast</span><<span class="type">uint32_t</span>>(data) == LittleEndian::<span class="built_in">Load32</span>(candidate));</span><br><span class="line"> <span class="comment">// Because the least significant 5 bytes matched, we can utilize data</span></span><br><span class="line"> <span class="comment">// for the next iteration.</span></span><br><span class="line"> preload = data >> <span class="number">8</span>;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line">emit_remainder:</span><br><span class="line"> <span class="comment">// Emit the remaining bytes as a literal</span></span><br><span class="line"> <span class="keyword">if</span> (ip < ip_end) {</span><br><span class="line"> op = <span class="built_in">EmitLiteral</span><<span class="comment">/*allow_fast_path=*/</span><span class="literal">false</span>>(op, ip, ip_end - ip);</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">return</span> op;</span><br><span class="line">}</span><br><span class="line">} <span class="comment">// end namespace 
internal</span></span><br></pre></td></tr></table></figure><ul><li><p>The core of the function is the outer loop headed by <code>for (uint32_t preload = LittleEndian::Load32(ip + 1);;)</code>.</p></li><li><p>The two nested <code>for</code> loops over <code>j</code> and <code>k</code> advance the scan position one byte at a time (each increment of <code>k</code> shifts <code>data</code> right by 8 bits). For the 4 bytes <code>dword</code> at the current position, the code hashes them, reads the earlier <code>candidate</code> position stored in that hash-table slot, and then stores the current offset in the slot.</p></li><li><p>If the 4 bytes at <code>candidate</code> equal <code>dword</code>, a match has been found. The code writes a literal tag for the <code>i + 1</code> unmatched bytes starting at <code>next_emit</code>, copies 16 bytes after the tag with <code>UnalignedCopy128</code> (only the first <code>i + 1</code> of them are kept, because <code>op</code> advances by <code>i + 2</code>), and then executes <code>goto emit_match</code>.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> (<span class="built_in">SNAPPY_PREDICT_FALSE</span>(LittleEndian::<span class="built_in">Load32</span>(candidate) == dword)) {</span><br><span class="line"> *op = LITERAL | (i << <span class="number">2</span>);</span><br><span class="line"> <span class="built_in">UnalignedCopy128</span>(next_emit, op + <span class="number">1</span>);</span><br><span class="line"> ip += i;</span><br><span class="line"> op = op + i + <span class="number">2</span>;</span><br><span class="line"> <span class="keyword">goto</span> emit_match;</span><br><span class="line"> }</span><br></pre></td></tr></table></figure></li><li><p>Otherwise, execution falls through to the <code>while</code> loop below.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span 
class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">while</span> (<span class="literal">true</span>) {</span><br><span class="line"> <span class="built_in">assert</span>(<span class="built_in">static_cast</span><<span class="type">uint32_t</span>>(data) == LittleEndian::<span class="built_in">Load32</span>(ip));</span><br><span class="line"> <span class="type">uint32_t</span> hash = <span class="built_in">HashBytes</span>(data, mask);</span><br><span class="line"> <span class="type">uint32_t</span> bytes_between_hash_lookups = skip >> <span class="number">5</span>;</span><br><span class="line"> skip += bytes_between_hash_lookups;</span><br><span class="line"> <span class="type">const</span> <span class="type">char</span>* next_ip = ip + bytes_between_hash_lookups;</span><br><span class="line"> <span class="keyword">if</span> (<span class="built_in">SNAPPY_PREDICT_FALSE</span>(next_ip > ip_limit)) {</span><br><span class="line"> ip = next_emit;</span><br><span class="line"> <span class="keyword">goto</span> emit_remainder;</span><br><span class="line"> }</span><br><span class="line"> candidate = base_ip + table[hash];</span><br><span class="line"> <span class="built_in">assert</span>(candidate >= base_ip);</span><br><span class="line"> <span class="built_in">assert</span>(candidate < ip);</span><br><span class="line"></span><br><span class="line"> table[hash] = ip - base_ip;</span><br><span class="line"> <span class="keyword">if</span> (<span class="built_in">SNAPPY_PREDICT_FALSE</span>(<span class="built_in">static_cast</span><<span class="type">uint32_t</span>>(data) ==</span><br><span class="line"> LittleEndian::<span class="built_in">Load32</span>(candidate))) {</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> }</span><br><span class="line"> 
data = LittleEndian::<span class="built_in">Load32</span>(next_ip);</span><br><span class="line"> ip = next_ip;</span><br><span class="line"> }</span><br></pre></td></tr></table></figure><p>This is the heuristic search mentioned in the comments. The stride between lookups is skip shifted right by 5 bits: while no more than 32 bytes have been scanned without a match, every byte is checked; between 32 and 64 bytes, every second byte is checked; and so on. bytes_between_hash_lookups is therefore the number of bytes between consecutive lookups. The loop ends in one of two ways: either next_ip exceeds ip_limit, in which case the remaining input is emitted as a literal (<code>goto emit_remainder</code>), or data equals the 4 bytes loaded from candidate and <code>break</code> exits the loop.</p></li><li><p>After the while loop ends we have a 4-byte match. The literal bytes that precede the match (from next_emit up to ip) are written to op first.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">assert</span>(next_emit + <span class="number">16</span> <= ip_end);</span><br><span class="line">op = <span class="built_in">EmitLiteral</span><<span class="comment">/*allow_fast_path=*/</span><span class="literal">true</span>>(op, next_emit, ip - next_emit);</span><br></pre></td></tr></table></figure></li><li><p>Execution then enters the section marked by the <code>emit_match</code> label.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span 
class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br></pre></td><td class="code"><pre><span class="line">emit_match:</span><br><span class="line"> <span class="keyword">do</span> {</span><br><span class="line"> <span class="comment">// We have a 4-byte match at ip, and no need to emit any</span></span><br><span class="line"> <span class="comment">// "literal bytes" prior to ip.</span></span><br><span class="line"> <span class="type">const</span> <span class="type">char</span>* base = ip;</span><br><span class="line"> std::pair<<span class="type">size_t</span>, <span class="type">bool</span>> p =</span><br><span class="line"> <span class="built_in">FindMatchLength</span>(candidate + <span class="number">4</span>, ip + <span class="number">4</span>, ip_end, &data);</span><br><span class="line"> <span class="type">size_t</span> matched = <span class="number">4</span> + p.first;</span><br><span class="line"> ip += matched;</span><br><span class="line"> <span class="type">size_t</span> offset = base - candidate;</span><br><span class="line"> <span class="built_in">assert</span>(<span class="number">0</span> == <span class="built_in">memcmp</span>(base, candidate, matched));</span><br><span class="line"> <span class="keyword">if</span> (p.second) {</span><br><span class="line"> op = <span class="built_in">EmitCopy</span><<span class="comment">/*len_less_than_12=*/</span><span class="literal">true</span>>(op, offset, matched);</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> op = <span 
class="built_in">EmitCopy</span><<span class="comment">/*len_less_than_12=*/</span><span class="literal">false</span>>(op, offset, matched);</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> (<span class="built_in">SNAPPY_PREDICT_FALSE</span>(ip >= ip_limit)) {</span><br><span class="line"> <span class="keyword">goto</span> emit_remainder;</span><br><span class="line"> }</span><br><span class="line"> <span class="comment">// Expect 5 bytes to match</span></span><br><span class="line"> <span class="built_in">assert</span>((data & <span class="number">0xFFFFFFFFFF</span>) ==</span><br><span class="line"> (LittleEndian::<span class="built_in">Load64</span>(ip) & <span class="number">0xFFFFFFFFFF</span>));</span><br><span class="line"> <span class="comment">// We are now looking for a 4-byte match again. We read</span></span><br><span class="line"> <span class="comment">// table[Hash(ip, shift)] for that. To improve compression,</span></span><br><span class="line"> <span class="comment">// we also update table[Hash(ip - 1, mask)] and table[Hash(ip, mask)].</span></span><br><span class="line"> table[<span class="built_in">HashBytes</span>(LittleEndian::<span class="built_in">Load32</span>(ip - <span class="number">1</span>), mask)] = ip - base_ip - <span class="number">1</span>;</span><br><span class="line"> <span class="type">uint32_t</span> hash = <span class="built_in">HashBytes</span>(data, mask);</span><br><span class="line"> candidate = base_ip + table[hash];</span><br><span class="line"> table[hash] = ip - base_ip;</span><br><span class="line"> <span class="comment">// Measurements on the benchmarks have shown the following probabilities</span></span><br><span class="line"> <span class="comment">// for the loop to exit (ie. avg. 
number of iterations is reciprocal).</span></span><br><span class="line"> <span class="comment">// BM_Flat/6 txt1 p = 0.3-0.4</span></span><br><span class="line"> <span class="comment">// BM_Flat/7 txt2 p = 0.35</span></span><br><span class="line"> <span class="comment">// BM_Flat/8 txt3 p = 0.3-0.4</span></span><br><span class="line"> <span class="comment">// BM_Flat/9 txt3 p = 0.34-0.4</span></span><br><span class="line"> <span class="comment">// BM_Flat/10 pb p = 0.4</span></span><br><span class="line"> <span class="comment">// BM_Flat/11 gaviota p = 0.1</span></span><br><span class="line"> <span class="comment">// BM_Flat/12 cp p = 0.5</span></span><br><span class="line"> <span class="comment">// BM_Flat/13 c p = 0.3</span></span><br><span class="line"> } <span class="keyword">while</span> (<span class="built_in">static_cast</span><<span class="type">uint32_t</span>>(data) == LittleEndian::<span class="built_in">Load32</span>(candidate));</span><br><span class="line"> <span class="comment">// Because the least significant 5 bytes matched, we can utilize data</span></span><br><span class="line"> <span class="comment">// for the next iteration.</span></span><br><span class="line"> preload = data >> <span class="number">8</span>;</span><br></pre></td></tr></table></figure><p><code>FindMatchLength</code> computes the longest match length, offset and matched are written to op through <code>EmitCopy</code>, and the hash table is then updated. The loop exits once data no longer equals the 4 bytes loaded from candidate.</p></li></ul></li><li><p>After <code>CompressFragment</code> returns, control goes back to <code>Compress</code>, which finally writes the result to the writer with <code>writer->Append(dest, end - dest)</code>.</p></li></ul></li></ul><h3 id="uncompress"><a class="markdownIt-Anchor" href="#uncompress"></a> Uncompress</h3><ul><li><p><code>RawUncompress</code></p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span 
class="type">bool</span> <span class="title">RawUncompress</span><span class="params">(<span class="type">const</span> <span class="type">char</span>* compressed, <span class="type">size_t</span> compressed_length,</span></span></span><br><span class="line"><span class="params"><span class="function"> <span class="type">char</span>* uncompressed)</span> </span>{</span><br><span class="line"> <span class="function">ByteArraySource <span class="title">reader</span><span class="params">(compressed, compressed_length)</span></span>;</span><br><span class="line"> <span class="keyword">return</span> <span class="built_in">RawUncompress</span>(&reader, uncompressed);</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>This constructs a ByteArraySource and passes reader to the overloaded <code>RawUncompress</code>.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="type">bool</span> <span class="title">RawUncompress</span><span class="params">(Source* compressed, <span class="type">char</span>* uncompressed)</span> </span>{</span><br><span class="line"> <span class="function">SnappyArrayWriter <span class="title">output</span><span class="params">(uncompressed)</span></span>;</span><br><span class="line"> <span class="keyword">return</span> <span class="built_in">InternalUncompress</span>(compressed, &output);</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>This constructs a SnappyArrayWriter and passes output to <code>InternalUncompress</code>.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span 
class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">template</span> <<span class="keyword">typename</span> Writer></span><br><span class="line"><span class="function"><span class="type">static</span> <span class="type">bool</span> <span class="title">InternalUncompress</span><span class="params">(Source* r, Writer* writer)</span> </span>{</span><br><span class="line"> <span class="comment">// Read the uncompressed length from the front of the compressed input</span></span><br><span class="line"> <span class="function">SnappyDecompressor <span class="title">decompressor</span><span class="params">(r)</span></span>;</span><br><span class="line"> <span class="type">uint32_t</span> uncompressed_len = <span class="number">0</span>;</span><br><span class="line"> <span class="keyword">if</span> (!decompressor.<span class="built_in">ReadUncompressedLength</span>(&uncompressed_len)) <span class="keyword">return</span> <span class="literal">false</span>;</span><br><span class="line"></span><br><span class="line"> <span class="keyword">return</span> <span class="built_in">InternalUncompressAllTags</span>(&decompressor, writer, r-><span class="built_in">Available</span>(),</span><br><span class="line"> uncompressed_len);</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>A decompressor is constructed from the source, uncompressed_len is read from the front of the input, and <code>InternalUncompressAllTags</code> is called.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td 
class="code"><pre><span class="line"><span class="keyword">template</span> <<span class="keyword">typename</span> Writer></span><br><span class="line"><span class="function"><span class="type">static</span> <span class="type">bool</span> <span class="title">InternalUncompressAllTags</span><span class="params">(SnappyDecompressor* decompressor,</span></span></span><br><span class="line"><span class="params"><span class="function"> Writer* writer, <span class="type">uint32_t</span> compressed_len,</span></span></span><br><span class="line"><span class="params"><span class="function"> <span class="type">uint32_t</span> uncompressed_len)</span> </span>{</span><br><span class="line"> <span class="built_in">Report</span>(<span class="string">"snappy_uncompress"</span>, compressed_len, uncompressed_len);</span><br><span class="line"></span><br><span class="line"> writer-><span class="built_in">SetExpectedLength</span>(uncompressed_len);</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Process the entire input</span></span><br><span class="line"> decompressor-><span class="built_in">DecompressAllTags</span>(writer);</span><br><span class="line"> writer-><span class="built_in">Flush</span>();</span><br><span class="line"> <span class="keyword">return</span> (decompressor-><span class="built_in">eof</span>() && writer-><span class="built_in">CheckLength</span>());</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>uncompressed_len is set on the writer as the expected length, and the decompressor's <code>DecompressAllTags(writer)</code> performs the decompression.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span 
class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span 
class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br><span class="line">103</span><br><span class="line">104</span><br><span class="line">105</span><br><span class="line">106</span><br><span class="line">107</span><br><span class="line">108</span><br><span class="line">109</span><br><span class="line">110</span><br><span class="line">111</span><br><span class="line">112</span><br><span class="line">113</span><br><span class="line">114</span><br><span class="line">115</span><br><span class="line">116</span><br><span class="line">117</span><br><span class="line">118</span><br><span class="line">119</span><br><span class="line">120</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="type">void</span></span></span><br><span class="line"><span class="function"> <span class="title">DecompressAllTags</span><span class="params">(Writer* writer)</span> </span>{</span><br><span class="line"> <span class="type">const</span> <span class="type">char</span>* ip = ip_;</span><br><span class="line"> <span class="built_in">ResetLimit</span>(ip);</span><br><span class="line"> <span 
class="keyword">auto</span> op = writer-><span class="built_in">GetOutputPtr</span>();</span><br><span class="line"> <span class="comment">// We could have put this refill fragment only at the beginning of the loop.</span></span><br><span class="line"> <span class="comment">// However, duplicating it at the end of each branch gives the compiler more</span></span><br><span class="line"> <span class="comment">// scope to optimize the <ip_limit_ - ip> expression based on the local</span></span><br><span class="line"> <span class="comment">// context, which overall increases speed.</span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> MAYBE_REFILL() \</span></span><br><span class="line"><span class="meta"> <span class="keyword">if</span> (SNAPPY_PREDICT_FALSE(ip >= ip_limit_min_maxtaglen_)) { \</span></span><br><span class="line"><span class="meta"> ip_ = ip; \</span></span><br><span class="line"><span class="meta"> <span class="keyword">if</span> (SNAPPY_PREDICT_FALSE(!RefillTag())) goto exit; \</span></span><br><span class="line"><span class="meta"> ip = ip_; \</span></span><br><span class="line"><span class="meta"> ResetLimit(ip); \</span></span><br><span class="line"><span class="meta"> } \</span></span><br><span class="line"><span class="meta"> preload = static_cast<span class="string"><uint8_t></span>(*ip)</span></span><br><span class="line"></span><br><span class="line"> <span class="comment">// At the start of the for loop below the least significant byte of preload</span></span><br><span class="line"> <span class="comment">// contains the tag.</span></span><br><span class="line"> <span class="type">uint32_t</span> preload;</span><br><span class="line"> <span class="built_in">MAYBE_REFILL</span>();</span><br><span class="line"> <span class="keyword">for</span> (;;) {</span><br><span class="line"> {</span><br><span class="line"> <span class="type">ptrdiff_t</span> op_limit_min_slop;</span><br><span class="line"> <span 
class="keyword">auto</span> op_base = writer-><span class="built_in">GetBase</span>(&op_limit_min_slop);</span><br><span class="line"> <span class="keyword">if</span> (op_base) {</span><br><span class="line"> <span class="keyword">auto</span> res =</span><br><span class="line"> <span class="built_in">DecompressBranchless</span>(<span class="built_in">reinterpret_cast</span><<span class="type">const</span> <span class="type">uint8_t</span>*>(ip),</span><br><span class="line"> <span class="built_in">reinterpret_cast</span><<span class="type">const</span> <span class="type">uint8_t</span>*>(ip_limit_),</span><br><span class="line"> op - op_base, op_base, op_limit_min_slop);</span><br><span class="line"> ip = <span class="built_in">reinterpret_cast</span><<span class="type">const</span> <span class="type">char</span>*>(res.first);</span><br><span class="line"> op = op_base + res.second;</span><br><span class="line"> <span class="built_in">MAYBE_REFILL</span>();</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="type">const</span> <span class="type">uint8_t</span> c = <span class="built_in">static_cast</span><<span class="type">uint8_t</span>>(preload);</span><br><span class="line"> ip++;</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Ratio of iterations that have LITERAL vs non-LITERAL for different</span></span><br><span class="line"> <span class="comment">// inputs.</span></span><br><span class="line"> <span class="comment">//</span></span><br><span class="line"> <span class="comment">// input LITERAL NON_LITERAL</span></span><br><span class="line"> <span class="comment">// -----------------------------------</span></span><br><span class="line"> <span class="comment">// html|html4|cp 23% 77%</span></span><br><span class="line"> <span class="comment">// urls 36% 64%</span></span><br><span class="line"> <span class="comment">// jpg 47% 53%</span></span><br><span class="line"> 
<span class="comment">// pdf 19% 81%</span></span><br><span class="line"> <span class="comment">// txt[1-4] 25% 75%</span></span><br><span class="line"> <span class="comment">// pb 24% 76%</span></span><br><span class="line"> <span class="comment">// bin 24% 76%</span></span><br><span class="line"> <span class="keyword">if</span> (<span class="built_in">SNAPPY_PREDICT_FALSE</span>((c & <span class="number">0x3</span>) == LITERAL)) {</span><br><span class="line"> <span class="type">size_t</span> literal_length = (c >> <span class="number">2</span>) + <span class="number">1u</span>;</span><br><span class="line"> <span class="keyword">if</span> (writer-><span class="built_in">TryFastAppend</span>(ip, ip_limit_ - ip, literal_length, &op)) {</span><br><span class="line"> <span class="built_in">assert</span>(literal_length < <span class="number">61</span>);</span><br><span class="line"> ip += literal_length;</span><br><span class="line"> <span class="comment">// <span class="doctag">NOTE:</span> There is no MAYBE_REFILL() here, as TryFastAppend()</span></span><br><span class="line"> <span class="comment">// will not return true unless there's already at least five spare</span></span><br><span class="line"> <span class="comment">// bytes in addition to the literal.</span></span><br><span class="line"> preload = <span class="built_in">static_cast</span><<span class="type">uint8_t</span>>(*ip);</span><br><span class="line"> <span class="keyword">continue</span>;</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> (<span class="built_in">SNAPPY_PREDICT_FALSE</span>(literal_length >= <span class="number">61</span>)) {</span><br><span class="line"> <span class="comment">// Long literal.</span></span><br><span class="line"> <span class="type">const</span> <span class="type">size_t</span> literal_length_length = literal_length - <span class="number">60</span>;</span><br><span class="line"> literal_length =</span><br><span class="line"> 
<span class="built_in">ExtractLowBytes</span>(LittleEndian::<span class="built_in">Load32</span>(ip), literal_length_length) +</span><br><span class="line"> <span class="number">1</span>;</span><br><span class="line"> ip += literal_length_length;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="type">size_t</span> avail = ip_limit_ - ip;</span><br><span class="line"> <span class="keyword">while</span> (avail < literal_length) {</span><br><span class="line"> <span class="keyword">if</span> (!writer-><span class="built_in">Append</span>(ip, avail, &op)) <span class="keyword">goto</span> exit;</span><br><span class="line"> literal_length -= avail;</span><br><span class="line"> reader_-><span class="built_in">Skip</span>(peeked_);</span><br><span class="line"> <span class="type">size_t</span> n;</span><br><span class="line"> ip = reader_-><span class="built_in">Peek</span>(&n);</span><br><span class="line"> avail = n;</span><br><span class="line"> peeked_ = avail;</span><br><span class="line"> <span class="keyword">if</span> (avail == <span class="number">0</span>) <span class="keyword">goto</span> exit;</span><br><span class="line"> ip_limit_ = ip + avail;</span><br><span class="line"> <span class="built_in">ResetLimit</span>(ip);</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> (!writer-><span class="built_in">Append</span>(ip, literal_length, &op)) <span class="keyword">goto</span> exit;</span><br><span class="line"> ip += literal_length;</span><br><span class="line"> <span class="built_in">MAYBE_REFILL</span>();</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> <span class="keyword">if</span> (<span class="built_in">SNAPPY_PREDICT_FALSE</span>((c & <span class="number">3</span>) == COPY_4_BYTE_OFFSET)) {</span><br><span class="line"> <span class="type">const</span> <span class="type">size_t</span> copy_offset = 
LittleEndian::<span class="built_in">Load32</span>(ip);</span><br><span class="line"> <span class="type">const</span> <span class="type">size_t</span> length = (c >> <span class="number">2</span>) + <span class="number">1</span>;</span><br><span class="line"> ip += <span class="number">4</span>;</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (!writer-><span class="built_in">AppendFromSelf</span>(copy_offset, length, &op)) <span class="keyword">goto</span> exit;</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> <span class="type">const</span> <span class="type">ptrdiff_t</span> entry = kLengthMinusOffset[c];</span><br><span class="line"> preload = LittleEndian::<span class="built_in">Load32</span>(ip);</span><br><span class="line"> <span class="type">const</span> <span class="type">uint32_t</span> trailer = <span class="built_in">ExtractLowBytes</span>(preload, c & <span class="number">3</span>);</span><br><span class="line"> <span class="type">const</span> <span class="type">uint32_t</span> length = entry & <span class="number">0xff</span>;</span><br><span class="line"> <span class="built_in">assert</span>(length > <span class="number">0</span>);</span><br><span class="line"></span><br><span class="line"> <span class="comment">// copy_offset/256 is encoded in bits 8..10. 
By just fetching</span></span><br><span class="line"> <span class="comment">// those bits, we get copy_offset (since the bit-field starts at</span></span><br><span class="line"> <span class="comment">// bit 8).</span></span><br><span class="line"> <span class="type">const</span> <span class="type">uint32_t</span> copy_offset = trailer - entry + length;</span><br><span class="line"> <span class="keyword">if</span> (!writer-><span class="built_in">AppendFromSelf</span>(copy_offset, length, &op)) <span class="keyword">goto</span> exit;</span><br><span class="line"></span><br><span class="line"> ip += (c & <span class="number">3</span>);</span><br><span class="line"> <span class="comment">// By using the result of the previous load we reduce the critical</span></span><br><span class="line"> <span class="comment">// dependency chain of ip to 4 cycles.</span></span><br><span class="line"> preload >>= (c & <span class="number">3</span>) * <span class="number">8</span>;</span><br><span class="line"> <span class="keyword">if</span> (ip < ip_limit_min_maxtaglen_) <span class="keyword">continue</span>;</span><br><span class="line"> }</span><br><span class="line"> <span class="built_in">MAYBE_REFILL</span>();</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"><span class="meta">#<span class="keyword">undef</span> MAYBE_REFILL</span></span><br><span class="line"> exit:</span><br><span class="line"> writer-><span class="built_in">SetOutputPtr</span>(op);</span><br><span class="line"> }</span><br></pre></td></tr></table></figure></li></ul><h3 id="pseudocode"><a class="markdownIt-Anchor" href="#pseudocode"></a> Pseudocode</h3><p><img src="https://raw.githubusercontent.com/TongYiheng/MarkdownPictures/main/Embedded/202110202113481.png" alt="" /></p><h3 id="performance"><a class="markdownIt-Anchor" href="#performance"></a> Performance</h3><p>Snappy is intended to be fast. 
On a single core of a Core i7 processor in 64-bit mode, it compresses at about 250 MB/sec or more and decompresses at about 500 MB/sec or more. (These numbers are for the slowest inputs in our benchmark suite; others are much faster.) In our tests, Snappy usually is faster than algorithms in the same class (e.g. LZO, LZF, QuickLZ, etc.) while achieving comparable compression ratios.</p><p>Typical compression ratios (based on the benchmark suite) are about 1.5-1.7x for plain text, about 2-4x for HTML, and of course 1.0x for JPEGs, PNGs and other already-compressed data. Similar numbers for zlib in its fastest mode are 2.6-2.8x, 3-7x and 1.0x, respectively. More sophisticated algorithms are capable of achieving yet higher compression rates, although usually at the expense of speed. Of course, compression ratio will vary significantly with the input.</p><p>Although Snappy should be fairly portable, it is primarily optimized for 64-bit x86-compatible processors, and may run slower in other environments.<br />In particular:</p><ul><li>Snappy uses 64-bit operations in several places to process more data at once than would otherwise be possible.</li><li>Snappy assumes unaligned 32 and 64-bit loads and stores are cheap. On some platforms, these must be emulated with single-byte loads and stores, which is much slower.</li><li>Snappy assumes little-endian throughout, and needs to byte-swap data in several places if running on a big-endian platform.</li></ul><p>Experience has shown that even heavily tuned code can be improved. Performance optimizations, whether for 64-bit x86 or other platforms, are of course most welcome; see “Contact”, below.</p><h2 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h2><ul><li><a href="https://github.com/google/snappy">https://github.com/google/snappy</a></li><li><a href="https://dirtysalt.github.io/html/snappy.html">https://dirtysalt.github.io/html/snappy.html</a></li></ul>]]></content>
<summary type="html"><p>How do you analyze the Snappy source code? This post walks through it and gives pseudocode for Snappy compression and decompression to help you understand the algorithm.</p></summary>
<category term="Embedded" scheme="https://tong1heng.github.io/categories/Embedded/"/>
<category term="cpp" scheme="https://tong1heng.github.io/tags/cpp/"/>
<category term="RocksDB" scheme="https://tong1heng.github.io/tags/RocksDB/"/>
<category term="compression" scheme="https://tong1heng.github.io/tags/compression/"/>
</entry>
<entry>
<title>Installing and Configuring RocksDB and db_bench</title>
<link href="https://tong1heng.github.io/2021/10/14/Embedded/db_bench/"/>
<id>https://tong1heng.github.io/2021/10/14/Embedded/db_bench/</id>
<published>2021-10-14T10:47:59.000Z</published>
<updated>2022-11-14T07:40:48.220Z</updated>
<content type="html"><![CDATA[<p>Start from a fresh Ubuntu install.</p><span id="more"></span><h2 id="introduction"><a class="markdownIt-Anchor" href="#introduction"></a> Introduction</h2><p>After redoing this setup N times, I finally wrote it up as a blog post.</p><p>Let’s start from <strong>a fresh Ubuntu install</strong>.</p><h2 id="steps"><a class="markdownIt-Anchor" href="#steps"></a> Steps</h2><h3 id="step-1"><a class="markdownIt-Anchor" href="#step-1"></a> Step 1</h3><p>First install gcc, g++ and the other build tools.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo apt install build-essential</span><br></pre></td></tr></table></figure><p>Then install the libraries RocksDB needs: the compression libraries, plus gflags, which command-line tools such as db_bench use for flag parsing.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo apt-get install libsnappy-dev zlib1g-dev libbz2-dev liblz4-dev libzstd-dev libgflags-dev</span><br></pre></td></tr></table></figure><h3 id="step-2"><a class="markdownIt-Anchor" href="#step-2"></a> Step 2</h3><p>Download the RocksDB source and unzip it.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">wget -O rocksdb-6.25.1.zip https://github.com/facebook/rocksdb/archive/v6.25.1.zip</span><br><span class="line">unzip rocksdb-6.25.1.zip</span><br></pre></td></tr></table></figure><p>Tips:</p><ul><li>You can pick a different version; the commands below that mention the version number then need to change accordingly, e.g. 
v6.6.4 (2020-01-31)</li><li>The download can take a long time; you can fetch the zip archive some other way and copy it to the Ubuntu system instead. <strong>(Recommended)</strong></li><li>If the archive name differs slightly, adjust the commands accordingly.</li></ul><h3 id="step-3"><a class="markdownIt-Anchor" href="#step-3"></a> Step 3</h3><p>Build the shared library and the static library.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">cd</span> rocksdb-6.25.1</span><br><span class="line">make shared_lib && sudo make install-shared</span><br><span class="line">make static_lib && sudo make install-static</span><br></pre></td></tr></table></figure><p>Tips:</p><ul><li><p>If you build the static library first and the shared library second, the shared-library build fails with an error.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">make static_lib && sudo make install-static</span><br><span class="line">make shared_lib && sudo make install-shared</span><br></pre></td></tr></table></figure><p>To recover, run:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">make clean</span><br><span class="line">make shared_lib</span><br><span class="line">make static_lib</span><br></pre></td></tr></table></figure></li><li><p>This step takes a while (about 10 minutes).</p></li></ul><p>Finally, run <code>sudo make install</code>.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo make install</span><br></pre></td></tr></table></figure><h3 id="step-4"><a class="markdownIt-Anchor" href="#step-4"></a> Step 4</h3><p>Set up the environment so that the dynamic linker can find the installed library.</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span 
class="line">3</span><br></pre></td><td class="code"><pre><span class="line">echo "/usr/local/lib" | sudo tee /etc/ld.so.conf.d/rocksdb-x86_64.conf</span><br><span class="line">make shared_lib && sudo make install-shared</span><br><span class="line">sudo ldconfig -v</span><br></pre></td></tr></table></figure><p>Tips:</p><ul><li><code>echo "/usr/local/lib" | sudo tee /etc/ld.so.conf.d/rocksdb-x86_64.conf</code>: add <code>/usr/local/lib</code> to the dynamic linker’s search path</li><li><code>sudo ldconfig -v</code>: refresh the ldconfig cache</li></ul><h2 id="test"><a class="markdownIt-Anchor" href="#test"></a> Test</h2><p>Create a test program, <code>rocksdbtest.cpp</code>:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span 
class="keyword">include</span> <span class="string"><cstdio></span></span></span><br><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string"><string></span></span></span><br><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string">"rocksdb/db.h"</span></span></span><br><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string">"rocksdb/slice.h"</span></span></span><br><span class="line"><span class="meta">#<span class="keyword">include</span> <span class="string">"rocksdb/options.h"</span></span></span><br><span class="line"></span><br><span class="line"><span class="keyword">using</span> <span class="keyword">namespace</span> std;</span><br><span class="line"><span class="keyword">using</span> <span class="keyword">namespace</span> rocksdb;</span><br><span class="line"></span><br><span class="line"><span class="type">const</span> std::string PATH = <span class="string">"/tmp/rocksdb_tmp"</span>;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">int</span> <span class="title">main</span><span class="params">()</span> </span>{</span><br><span class="line"> DB* db;</span><br><span class="line"> Options options;</span><br><span class="line"> options.create_if_missing = <span class="literal">true</span>;</span><br><span class="line"> Status status = DB::<span class="built_in">Open</span>(options, PATH, &db);</span><br><span class="line"> <span class="built_in">assert</span>(status.<span class="built_in">ok</span>());</span><br><span class="line"> <span class="function">Slice <span class="title">key</span><span class="params">(<span class="string">"foo"</span>)</span></span>;</span><br><span class="line"> <span class="function">Slice <span class="title">value</span><span class="params">(<span class="string">"bar"</span>)</span></span>;</span><br><span class="line"> </span><br><span class="line"> 
std::string get_value;</span><br><span class="line">    status = db-><span class="built_in">Put</span>(<span class="built_in">WriteOptions</span>(), key, value);</span><br><span class="line">    <span class="keyword">if</span>(status.<span class="built_in">ok</span>()) {</span><br><span class="line">        status = db-><span class="built_in">Get</span>(<span class="built_in">ReadOptions</span>(), key, &get_value);</span><br><span class="line">        <span class="keyword">if</span>(status.<span class="built_in">ok</span>()) {</span><br><span class="line">            <span class="built_in">printf</span>(<span class="string">"get %s success!!\n"</span>, get_value.<span class="built_in">c_str</span>());</span><br><span class="line">        }</span><br><span class="line">        <span class="keyword">else</span> {</span><br><span class="line">            <span class="built_in">printf</span>(<span class="string">"get failed\n"</span>);</span><br><span class="line">        }</span><br><span class="line">    }</span><br><span class="line">    <span class="keyword">else</span> {</span><br><span class="line">        <span class="built_in">printf</span>(<span class="string">"put failed\n"</span>);</span><br><span class="line">    }</span><br><span class="line"> </span><br><span class="line">    <span class="keyword">delete</span> db;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>Compile with dynamic linking</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">g++ -std=c++11 -o rocksdbtest rocksdbtest.cpp -lrocksdb -lpthread</span><br></pre></td></tr></table></figure><p>Run</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">./rocksdbtest</span><br></pre></td></tr></table></figure><p>Expected output</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">get bar 
success!!</span><br></pre></td></tr></table></figure><h2 id="db_bench"><a class="markdownIt-Anchor" href="#db_bench"></a> db_bench</h2><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">make clean</span><br><span class="line">make db_bench</span><br><span class="line">./db_bench</span><br></pre></td></tr></table></figure><p>Tips:</p><ul><li>Pass parameters when running db_bench<br />e.g.<figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">./db_bench -benchmarks=<span class="string">"fillrandom,stats"</span> -statistics -key_size=16 -value_size=65536 -db=./test_db1 -wal_dir=./test_db1 -duration=6000 -level0_file_num_compaction_trigger=1 -enable_pipelined_write=<span class="literal">true</span> -compression_type=None -stats_per_interval=1 -stats_interval_seconds=10 -max_write_buffer_number=6</span><br></pre></td></tr></table></figure></li></ul><h2 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h2><ul><li><a href="https://blog.51cto.com/u_15081048/2592774">https://blog.51cto.com/u_15081048/2592774</a></li><li><a href="https://www.jianshu.com/p/575b2e27b028">https://www.jianshu.com/p/575b2e27b028</a></li><li><a href="https://blog.csdn.net/zhangpeterx/article/details/96869454">https://blog.csdn.net/zhangpeterx/article/details/96869454</a></li><li><a href="https://www.cxyzjd.com/article/zhangpeterx/96869454">https://www.cxyzjd.com/article/zhangpeterx/96869454</a></li></ul>]]></content>
<summary type="html"><p>Start from a new Ubuntu OS.</p></summary>
<category term="Embedded" scheme="https://tong1heng.github.io/categories/Embedded/"/>
<category term="RocksDB" scheme="https://tong1heng.github.io/tags/RocksDB/"/>
<category term="db_bench" scheme="https://tong1heng.github.io/tags/db-bench/"/>
</entry>
<entry>
<title>The Use of "(void)val"</title>
<link href="https://tong1heng.github.io/2021/10/11/Tricks/void/"/>
<id>https://tong1heng.github.io/2021/10/11/Tricks/void/</id>
<published>2021-10-11T02:09:14.000Z</published>
<updated>2021-10-19T07:17:39.184Z</updated>
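<content type="html"><![CDATA[<p>Have you ever seen “(void)val” in code?</p><span id="more"></span><p><strong>Why <code>(void)val</code></strong></p><p>It suppresses a compiler warning. A variable that is declared or defined but never used triggers a warning at compile time. If the project enables the <code>-Werror</code> option, warnings are treated as errors and the build fails, so this cast is used to get around an otherwise harmless warning.</p>]]></content>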
<summary type="html"><p>Have you ever seen “(void)val” in code?</p></summary>
<category term="Tricks" scheme="https://tong1heng.github.io/categories/Tricks/"/>
<category term="cpp" scheme="https://tong1heng.github.io/tags/cpp/"/>
</entry>
<entry>
<title>ffmpeg</title>
<link href="https://tong1heng.github.io/2021/09/27/Tools/ffmpeg/"/>
<id>https://tong1heng.github.io/2021/09/27/Tools/ffmpeg/</id>
<published>2021-09-27T10:41:55.000Z</published>
<updated>2021-10-19T04:13:12.110Z</updated>
<content type="html"><![CDATA[<p>Commonly used ffmpeg commands.</p><span id="more"></span><h2 id="常用命令"><a class="markdownIt-Anchor" href="#常用命令"></a> Common commands</h2><ul><li><p>Show detailed information about a media file<br /><code>$ ffmpeg -i video.mp4</code></p></li><li><p>Convert a video from FLV to MP4<br /><code>$ ffmpeg -i input.flv output.mp4</code></p></li><li><p>Remove the video stream from a media file, keeping only the audio<br /><code>$ ffmpeg -i input.mp4 -vn output.mp3</code></p></li><li><p>Remove the audio stream from a video file<br /><code>$ ffmpeg -i input.mp4 -an output.mp4</code></p></li><li><p>Preview or test a video or audio file<br /><code>$ ffplay video.mp4</code><br /><code>$ ffplay audio.mp3</code></p></li><li><p>Speed up video playback (here to 2x)<br /><code>$ ffmpeg -i input.mp4 -vf "setpts=0.5*PTS" output.mp4</code></p></li><li><p>Slow down video playback (here to 1/4 speed)<br /><code>$ ffmpeg -i input.mp4 -vf "setpts=4.0*PTS" output.mp4</code></p></li><li><p>Get help<br /><code>$ man ffmpeg</code></p></li></ul><h2 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h2><ul><li><a href="https://zhuanlan.zhihu.com/p/67878761">https://zhuanlan.zhihu.com/p/67878761</a></li></ul>]]></content>
<summary type="html"><p>Commonly used ffmpeg commands.</p></summary>
<category term="Tools" scheme="https://tong1heng.github.io/categories/Tools/"/>
<category term="ffmpeg" scheme="https://tong1heng.github.io/tags/ffmpeg/"/>
</entry>
<entry>
<title>RocksDB Compaction Source Code Analysis</title>
<link href="https://tong1heng.github.io/2021/09/24/Embedded/rocksdb_compaction/"/>
<id>https://tong1heng.github.io/2021/09/24/Embedded/rocksdb_compaction/</id>
<published>2021-09-24T12:30:00.000Z</published>
<updated>2022-11-14T07:41:02.061Z</updated>
<content type="html"><![CDATA[<p>The RocksDB compaction process can be divided into three parts: prepare keys, process keys, and write keys.</p><span id="more"></span><ul><li>Entry point: <code>BackgroundCompaction()</code> in <code>db/db_impl/db_impl_compaction_flush.cc</code> (<code>db/db_impl_compaction_flush.cc</code> in older releases)</li></ul><h2 id="prepare-keys"><a class="markdownIt-Anchor" href="#prepare-keys"></a> Prepare keys</h2><h3 id="触发条件"><a class="markdownIt-Anchor" href="#触发条件"></a> Trigger conditions</h3><ul><li><p>All RocksDB compactions run in the background and are scheduled through the <code>BGWorkCompaction</code> thread. There are two kinds of compaction:</p><ul><li>Manual compaction by <code>CompactFiles()</code></li><li>Auto compaction by <code>BackgroundCompaction()</code></li></ul></li><li><p><code>MaybeScheduleFlushOrCompaction</code></p></li></ul><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">while</span> (bg_compaction_scheduled_ < bg_job_limits.max_compactions &&</span><br><span class="line">       unscheduled_compactions_ > <span class="number">0</span>) {</span><br><span class="line">  CompactionArg* ca = <span class="keyword">new</span> CompactionArg;</span><br><span class="line">  ca->db = <span class="keyword">this</span>;</span><br><span class="line">  ca->prepicked_compaction = <span class="literal">nullptr</span>;</span><br><span class="line">  bg_compaction_scheduled_++; <span class="comment">// number of compaction threads currently scheduled</span></span><br><span class="line">  unscheduled_compactions_--; <span class="comment">// number of compactions still waiting to be scheduled, i.e. the number of queued cfds</span></span><br><span class="line">  <span class="comment">// schedule a BGWorkCompaction thread</span></span><br><span class="line">  env_-><span 
class="built_in">Schedule</span>(&DBImpl::BGWorkCompaction, ca, Env::Priority::LOW, <span class="keyword">this</span>,</span><br><span class="line">                 &DBImpl::UnscheduleCompactionCallback);</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>As the loop shows, the number of compaction threads is capped at <code>bg_job_limits.max_compactions</code>.</p><ul><li>The queue <code>DBImpl::compaction_queue_</code></li></ul><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">std::deque<ColumnFamilyData*> compaction_queue_;</span><br></pre></td></tr></table></figure><p>This queue is updated in <code>SchedulePendingCompaction</code>, which increments <code>unscheduled_compactions_</code> at the same time. A background compaction thread can therefore be scheduled only after this variable has been set.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="type">void</span> <span class="title">DBImpl::SchedulePendingCompaction</span><span class="params">(ColumnFamilyData* cfd)</span> </span>{</span><br><span class="line">  <span class="keyword">if</span> (!cfd-><span class="built_in">queued_for_compaction</span>() && cfd-><span class="built_in">NeedsCompaction</span>()) {</span><br><span class="line">    <span class="built_in">AddToCompactionQueue</span>(cfd);</span><br><span class="line">    ++unscheduled_compactions_;</span><br><span class="line">  }</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>The key function above is <code>NeedsCompaction</code>, which decides whether any SST files need to be compacted.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span 
class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="type">bool</span> <span class="title">LevelCompactionPicker::NeedsCompaction</span><span class="params">(</span></span></span><br><span class="line"><span class="params"><span class="function">    <span class="type">const</span> VersionStorageInfo* vstorage)</span> <span class="type">const</span> </span>{</span><br><span class="line">  <span class="keyword">if</span> (!vstorage-><span class="built_in">ExpiredTtlFiles</span>().<span class="built_in">empty</span>()) { <span class="comment">// some SST files have exceeded their TTL (ExpiredTtlFiles)</span></span><br><span class="line">    <span class="keyword">return</span> <span class="literal">true</span>;</span><br><span class="line">  }</span><br><span class="line">  <span class="keyword">if</span> (!vstorage-><span class="built_in">FilesMarkedForPeriodicCompaction</span>().<span class="built_in">empty</span>()) {</span><br><span class="line">    <span class="keyword">return</span> <span class="literal">true</span>;</span><br><span class="line">  }</span><br><span class="line">  <span class="keyword">if</span> (!vstorage-><span class="built_in">BottommostFilesMarkedForCompaction</span>().<span class="built_in">empty</span>()) {</span><br><span class="line">    <span class="keyword">return</span> <span class="literal">true</span>;</span><br><span class="line">  }</span><br><span class="line">  <span class="keyword">if</span> (!vstorage-><span 
class="built_in">FilesMarkedForCompaction</span>().<span class="built_in">empty</span>()) {</span><br><span class="line">    <span class="keyword">return</span> <span class="literal">true</span>;</span><br><span class="line">  }</span><br><span class="line">  <span class="keyword">for</span> (<span class="type">int</span> i = <span class="number">0</span>; i <= vstorage-><span class="built_in">MaxInputLevel</span>(); i++) {</span><br><span class="line">    <span class="keyword">if</span> (vstorage-><span class="built_in">CompactionScore</span>(i) >= <span class="number">1</span>) { <span class="comment">// walk through every level and use its score to decide whether compaction is needed</span></span><br><span class="line">      <span class="keyword">return</span> <span class="literal">true</span>;</span><br><span class="line">    }</span><br><span class="line">  }</span><br><span class="line">  <span class="keyword">return</span> <span class="literal">false</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><h3 id="sst文件的选择"><a class="markdownIt-Anchor" href="#sst文件的选择"></a> Choosing SST files</h3><p>The two variables below store the levels and the score of each level; the higher the score, the higher the priority.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">std::vector<<span class="type">double</span>> compaction_score_; <span class="comment">// per-level compaction scores, sorted from highest to lowest</span></span><br><span class="line">std::vector<<span class="type">int</span>> compaction_level_; <span class="comment">// the level that the score at the same index belongs to</span></span><br></pre></td></tr></table></figure><p>Both variables are updated in <code>VersionStorageInfo::ComputeCompactionScore</code>.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span 
class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span 
class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br><span class="line">103</span><br><span class="line">104</span><br><span class="line">105</span><br><span class="line">106</span><br><span class="line">107</span><br><span class="line">108</span><br><span class="line">109</span><br><span class="line">110</span><br><span class="line">111</span><br><span class="line">112</span><br><span class="line">113</span><br><span class="line">114</span><br><span class="line">115</span><br><span class="line">116</span><br><span class="line">117</span><br><span class="line">118</span><br><span class="line">119</span><br><span class="line">120</span><br><span class="line">121</span><br><span class="line">122</span><br><span class="line">123</span><br><span class="line">124</span><br><span class="line">125</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="type">void</span> <span class="title">VersionStorageInfo::ComputeCompactionScore</span><span 
class="params">(</span></span></span><br><span class="line"><span class="params"><span class="function"> <span class="type">const</span> ImmutableOptions& immutable_options,</span></span></span><br><span class="line"><span class="params"><span class="function"> <span class="type">const</span> MutableCFOptions& mutable_cf_options)</span> </span>{</span><br><span class="line"> <span class="keyword">for</span> (<span class="type">int</span> level = <span class="number">0</span>; level <= <span class="built_in">MaxInputLevel</span>(); level++) {</span><br><span class="line"> <span class="type">double</span> score;</span><br><span class="line"> <span class="keyword">if</span> (level == <span class="number">0</span>) {</span><br><span class="line"> <span class="comment">// We treat level-0 specially by bounding the number of files</span></span><br><span class="line"> <span class="comment">// instead of number of bytes for two reasons:</span></span><br><span class="line"> <span class="comment">//</span></span><br><span class="line"> <span class="comment">// (1) With larger write-buffer sizes, it is nice not to do too</span></span><br><span class="line"> <span class="comment">// many level-0 compactions.</span></span><br><span class="line"> <span class="comment">//</span></span><br><span class="line"> <span class="comment">// (2) The files in level-0 are merged on every read and</span></span><br><span class="line"> <span class="comment">// therefore we wish to avoid too many files when the individual</span></span><br><span class="line"> <span class="comment">// file size is small (perhaps because of a small write-buffer</span></span><br><span class="line"> <span class="comment">// setting, or very high compression ratios, or lots of</span></span><br><span class="line"> <span class="comment">// overwrites/deletions).</span></span><br><span class="line"> <span class="type">int</span> num_sorted_runs = <span class="number">0</span>;</span><br><span class="line"> <span 
class="type">uint64_t</span> total_size = <span class="number">0</span>;</span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">auto</span>* f : files_[level]) {</span><br><span class="line"> <span class="keyword">if</span> (!f->being_compacted) {</span><br><span class="line"> total_size += f->compensated_file_size;</span><br><span class="line"> num_sorted_runs++;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> (compaction_style_ == kCompactionStyleUniversal) {</span><br><span class="line"> <span class="comment">// For universal compaction, we use level0 score to indicate</span></span><br><span class="line"> <span class="comment">// compaction score for the whole DB. Adding other levels as if</span></span><br><span class="line"> <span class="comment">// they are L0 files.</span></span><br><span class="line"> <span class="keyword">for</span> (<span class="type">int</span> i = <span class="number">1</span>; i < <span class="built_in">num_levels</span>(); i++) {</span><br><span class="line"> <span class="comment">// Its possible that a subset of the files in a level may be in a</span></span><br><span class="line"> <span class="comment">// compaction, due to delete triggered compaction or trivial move.</span></span><br><span class="line"> <span class="comment">// In that case, the below check may not catch a level being</span></span><br><span class="line"> <span class="comment">// compacted as it only checks the first file. 
The worst that can</span></span><br><span class="line"> <span class="comment">// happen is a scheduled compaction thread will find nothing to do.</span></span><br><span class="line"> <span class="keyword">if</span> (!files_[i].<span class="built_in">empty</span>() && !files_[i][<span class="number">0</span>]->being_compacted) {</span><br><span class="line"> num_sorted_runs++;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (compaction_style_ == kCompactionStyleFIFO) {</span><br><span class="line"> score = <span class="built_in">static_cast</span><<span class="type">double</span>>(total_size) /</span><br><span class="line"> mutable_cf_options.compaction_options_fifo.max_table_files_size;</span><br><span class="line"> <span class="keyword">if</span> (mutable_cf_options.compaction_options_fifo.allow_compaction ||</span><br><span class="line"> mutable_cf_options.compaction_options_fifo.age_for_warm > <span class="number">0</span>) {</span><br><span class="line"> <span class="comment">// Warm tier move can happen at any time. It's too expensive to</span></span><br><span class="line"> <span class="comment">// check very file's timestamp now. 
For now, just trigger it</span></span><br><span class="line"> <span class="comment">// slightly more frequently than FIFO compaction so that this</span></span><br><span class="line"> <span class="comment">// happens first.</span></span><br><span class="line"> score = std::<span class="built_in">max</span>(</span><br><span class="line"> <span class="built_in">static_cast</span><<span class="type">double</span>>(num_sorted_runs) /</span><br><span class="line"> mutable_cf_options.level0_file_num_compaction_trigger,</span><br><span class="line"> score);</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> (mutable_cf_options.ttl > <span class="number">0</span>) {</span><br><span class="line"> score = std::<span class="built_in">max</span>(</span><br><span class="line"> <span class="built_in">static_cast</span><<span class="type">double</span>>(<span class="built_in">GetExpiredTtlFilesCount</span>(</span><br><span class="line"> immutable_options, mutable_cf_options, files_[level])),</span><br><span class="line"> score);</span><br><span class="line"> }</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> score = <span class="built_in">static_cast</span><<span class="type">double</span>>(num_sorted_runs) /</span><br><span class="line"> mutable_cf_options.level0_file_num_compaction_trigger;</span><br><span class="line"> <span class="keyword">if</span> (compaction_style_ == kCompactionStyleLevel && <span class="built_in">num_levels</span>() > <span class="number">1</span>) {</span><br><span class="line"> <span class="comment">// Level-based involves L0->L0 compactions that can lead to oversized</span></span><br><span class="line"> <span class="comment">// L0 files. 
Take into account size as well to avoid later giant</span></span><br><span class="line"> <span class="comment">// compactions to the base level.</span></span><br><span class="line"> <span class="type">uint64_t</span> l0_target_size = mutable_cf_options.max_bytes_for_level_base;</span><br><span class="line"> <span class="keyword">if</span> (immutable_options.level_compaction_dynamic_level_bytes &&</span><br><span class="line"> level_multiplier_ != <span class="number">0.0</span>) {</span><br><span class="line"> <span class="comment">// Prevent L0 to Lbase fanout from growing larger than</span></span><br><span class="line"> <span class="comment">// `level_multiplier_`. This prevents us from getting stuck picking</span></span><br><span class="line"> <span class="comment">// L0 forever even when it is hurting write-amp. That could happen</span></span><br><span class="line"> <span class="comment">// in dynamic level compaction's write-burst mode where the base</span></span><br><span class="line"> <span class="comment">// level's target size can grow to be enormous.</span></span><br><span class="line"> l0_target_size =</span><br><span class="line"> std::<span class="built_in">max</span>(l0_target_size,</span><br><span class="line"> <span class="built_in">static_cast</span><<span class="type">uint64_t</span>>(level_max_bytes_[base_level_] /</span><br><span class="line"> level_multiplier_));</span><br><span class="line"> }</span><br><span class="line"> score =</span><br><span class="line"> std::<span class="built_in">max</span>(score, <span class="built_in">static_cast</span><<span class="type">double</span>>(total_size) / l0_target_size);</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> <span class="comment">// Compute the ratio of current size to size limit.</span></span><br><span class="line"> <span class="type">uint64_t</span> level_bytes_no_compacting = 
<span class="number">0</span>;</span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">auto</span> f : files_[level]) {</span><br><span class="line"> <span class="keyword">if</span> (!f->being_compacted) {</span><br><span class="line"> level_bytes_no_compacting += f->compensated_file_size;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> score = <span class="built_in">static_cast</span><<span class="type">double</span>>(level_bytes_no_compacting) /</span><br><span class="line"> <span class="built_in">MaxBytesForLevel</span>(level);</span><br><span class="line"> }</span><br><span class="line"> compaction_level_[level] = level;</span><br><span class="line"> compaction_score_[level] = score;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="comment">// sort all the levels based on their score. Higher scores get listed</span></span><br><span class="line"> <span class="comment">// first. 
Use bubble sort because the number of entries are small.</span></span><br><span class="line"> <span class="keyword">for</span> (<span class="type">int</span> i = <span class="number">0</span>; i < <span class="built_in">num_levels</span>() - <span class="number">2</span>; i++) {</span><br><span class="line"> <span class="keyword">for</span> (<span class="type">int</span> j = i + <span class="number">1</span>; j < <span class="built_in">num_levels</span>() - <span class="number">1</span>; j++) {</span><br><span class="line"> <span class="keyword">if</span> (compaction_score_[i] < compaction_score_[j]) {</span><br><span class="line"> <span class="type">double</span> score = compaction_score_[i];</span><br><span class="line"> <span class="type">int</span> level = compaction_level_[i];</span><br><span class="line"> compaction_score_[i] = compaction_score_[j];</span><br><span class="line"> compaction_level_[i] = compaction_level_[j];</span><br><span class="line"> compaction_score_[j] = score;</span><br><span class="line"> compaction_level_[j] = level;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="built_in">ComputeFilesMarkedForCompaction</span>();</span><br><span class="line"> <span class="built_in">ComputeBottommostFilesMarkedForCompaction</span>();</span><br><span class="line"> <span class="keyword">if</span> (mutable_cf_options.ttl > <span class="number">0</span>) {</span><br><span class="line"> <span class="built_in">ComputeExpiredTtlFiles</span>(immutable_options, mutable_cf_options.ttl);</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> (mutable_cf_options.periodic_compaction_seconds > <span class="number">0</span>) {</span><br><span class="line"> <span class="built_in">ComputeFilesMarkedForPeriodicCompaction</span>(</span><br><span class="line"> immutable_options, mutable_cf_options.periodic_compaction_seconds);</span><br><span 
class="line"> }</span><br><span class="line"> <span class="built_in">EstimateCompactionBytesNeeded</span>(mutable_cf_options);</span><br><span class="line">}</span><br></pre></td></tr></table></figure><h3 id="compaction每一层level大小的确定"><a class="markdownIt-Anchor" href="#compaction每一层level大小的确定"></a> Determining the size of each level for compaction</h3><figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="type">void</span> <span class="title">VersionStorageInfo::CalculateBaseBytes</span><span class="params">(<span class="type">const</span> ImmutableOptions& ioptions,</span></span></span><br><span class="line"><span class="params"><span class="function"> <span class="type">const</span> MutableCFOptions& options)</span> </span>{</span><br><span class="line"> <span class="comment">// Special logic to set number of sorted runs.</span></span><br><span class="line"> <span class="comment">// It is to match the previous behavior when all files are in L0.</span></span><br><span class="line"> <span class="type">int</span> num_l0_count = <span class="built_in">static_cast</span><<span class="type">int</span>>(files_[<span class="number">0</span>].<span 
class="built_in">size</span>());</span><br><span class="line"> <span class="keyword">if</span> (compaction_style_ == kCompactionStyleUniversal) {</span><br><span class="line"> <span class="comment">// For universal compaction, we use level0 score to indicate</span></span><br><span class="line"> <span class="comment">// compaction score for the whole DB. Adding other levels as if</span></span><br><span class="line"> <span class="comment">// they are L0 files.</span></span><br><span class="line"> <span class="keyword">for</span> (<span class="type">int</span> i = <span class="number">1</span>; i < <span class="built_in">num_levels</span>(); i++) {</span><br><span class="line"> <span class="keyword">if</span> (!files_[i].<span class="built_in">empty</span>()) {</span><br><span class="line"> num_l0_count++;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="built_in">set_l0_delay_trigger_count</span>(num_l0_count);</span><br><span class="line"></span><br><span class="line"> level_max_bytes_.<span class="built_in">resize</span>(ioptions.num_levels);</span><br><span class="line"> <span class="keyword">if</span> (!ioptions.level_compaction_dynamic_level_bytes) {</span><br><span class="line"> base_level_ = (ioptions.compaction_style == kCompactionStyleLevel) ? 
<span class="number">1</span> : <span class="number">-1</span>;</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Calculate for static bytes base case</span></span><br><span class="line"> <span class="keyword">for</span> (<span class="type">int</span> i = <span class="number">0</span>; i < ioptions.num_levels; ++i) {</span><br><span class="line"> <span class="keyword">if</span> (i == <span class="number">0</span> && ioptions.compaction_style == kCompactionStyleUniversal) {</span><br><span class="line"> level_max_bytes_[i] = options.max_bytes_for_level_base;</span><br><span class="line"> } <span class="keyword">else</span> <span class="keyword">if</span> (i > <span class="number">1</span>) {</span><br><span class="line"> level_max_bytes_[i] = <span class="built_in">MultiplyCheckOverflow</span>(</span><br><span class="line"> <span class="built_in">MultiplyCheckOverflow</span>(level_max_bytes_[i - <span class="number">1</span>],</span><br><span class="line"> options.max_bytes_for_level_multiplier),</span><br><span class="line"> options.<span class="built_in">MaxBytesMultiplerAdditional</span>(i - <span class="number">1</span>));</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> level_max_bytes_[i] = options.max_bytes_for_level_base;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> <span class="type">uint64_t</span> max_level_size = <span class="number">0</span>;</span><br><span class="line"></span><br><span class="line"> <span class="type">int</span> first_non_empty_level = <span class="number">-1</span>;</span><br><span class="line"> <span class="comment">// Find size of non-L0 level of most data.</span></span><br><span class="line"> <span class="comment">// Cannot use the size of the last level because it can be empty or less</span></span><br><span class="line"> <span 
class="comment">// than previous levels after compaction.</span></span><br><span class="line"> <span class="keyword">for</span> (<span class="type">int</span> i = <span class="number">1</span>; i < num_levels_; i++) {</span><br><span class="line"> <span class="type">uint64_t</span> total_size = <span class="number">0</span>;</span><br><span class="line"> <span class="keyword">for</span> (<span class="type">const</span> <span class="keyword">auto</span>& f : files_[i]) {</span><br><span class="line"> total_size += f->fd.<span class="built_in">GetFileSize</span>();</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> (total_size > <span class="number">0</span> && first_non_empty_level == <span class="number">-1</span>) {</span><br><span class="line"> first_non_empty_level = i;</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> (total_size > max_level_size) {</span><br><span class="line"> max_level_size = total_size;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Prefill every level's max bytes to disallow compaction from there.</span></span><br><span class="line"> <span class="keyword">for</span> (<span class="type">int</span> i = <span class="number">0</span>; i < num_levels_; i++) {</span><br><span class="line"> level_max_bytes_[i] = std::numeric_limits<<span class="type">uint64_t</span>>::<span class="built_in">max</span>();</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (max_level_size == <span class="number">0</span>) {</span><br><span class="line"> <span class="comment">// No data for L1 and up. 
L0 compacts to last level directly.</span></span><br><span class="line"> <span class="comment">// No compaction from L1+ needs to be scheduled.</span></span><br><span class="line"> base_level_ = num_levels_ - <span class="number">1</span>;</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> <span class="type">uint64_t</span> l0_size = <span class="number">0</span>;</span><br><span class="line"> <span class="keyword">for</span> (<span class="type">const</span> <span class="keyword">auto</span>& f : files_[<span class="number">0</span>]) {</span><br><span class="line"> l0_size += f->fd.<span class="built_in">GetFileSize</span>();</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="type">uint64_t</span> base_bytes_max =</span><br><span class="line"> std::<span class="built_in">max</span>(options.max_bytes_for_level_base, l0_size);</span><br><span class="line"> <span class="type">uint64_t</span> base_bytes_min = <span class="built_in">static_cast</span><<span class="type">uint64_t</span>>(</span><br><span class="line"> base_bytes_max / options.max_bytes_for_level_multiplier);</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Try whether we can make last level's target size to be max_level_size</span></span><br><span class="line"> <span class="type">uint64_t</span> cur_level_size = max_level_size;</span><br><span class="line"> <span class="keyword">for</span> (<span class="type">int</span> i = num_levels_ - <span class="number">2</span>; i >= first_non_empty_level; i--) { <span class="comment">// walk from the second-to-last level up to first_non_empty_level</span></span><br><span class="line"> <span class="comment">// Round up after dividing</span></span><br><span class="line"> cur_level_size = <span class="built_in">static_cast</span><<span class="type">uint64_t</span>>(</span><br><span class="line"> cur_level_size / 
options.max_bytes_for_level_multiplier);</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Calculate base level and its size.</span></span><br><span class="line"> <span class="type">uint64_t</span> base_level_size;</span><br><span class="line"> <span class="keyword">if</span> (cur_level_size <= base_bytes_min) {</span><br><span class="line"> <span class="comment">// Case 1. If we make target size of last level to be max_level_size,</span></span><br><span class="line"> <span class="comment">// target size of the first non-empty level would be smaller than</span></span><br><span class="line"> <span class="comment">// base_bytes_min. We set it be base_bytes_min.</span></span><br><span class="line"> base_level_size = base_bytes_min + <span class="number">1U</span>;</span><br><span class="line"> base_level_ = first_non_empty_level;</span><br><span class="line"> <span class="built_in">ROCKS_LOG_INFO</span>(ioptions.logger,</span><br><span class="line"> <span class="string">"More existing levels in DB than needed. 
"</span></span><br><span class="line"> <span class="string">"max_bytes_for_level_multiplier may not be guaranteed."</span>);</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> <span class="comment">// Find base level (where L0 data is compacted to).</span></span><br><span class="line"> base_level_ = first_non_empty_level;</span><br><span class="line"> <span class="keyword">while</span> (base_level_ > <span class="number">1</span> && cur_level_size > base_bytes_max) {</span><br><span class="line"> --base_level_;</span><br><span class="line"> cur_level_size = <span class="built_in">static_cast</span><<span class="type">uint64_t</span>>(</span><br><span class="line"> cur_level_size / options.max_bytes_for_level_multiplier);</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> (cur_level_size > base_bytes_max) {</span><br><span class="line"> <span class="comment">// Even L1 will be too large</span></span><br><span class="line"> <span class="built_in">assert</span>(base_level_ == <span class="number">1</span>);</span><br><span class="line"> base_level_size = base_bytes_max;</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> base_level_size = cur_level_size;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> level_multiplier_ = options.max_bytes_for_level_multiplier;</span><br><span class="line"> <span class="built_in">assert</span>(base_level_size > <span class="number">0</span>);</span><br><span class="line"> <span class="keyword">if</span> (l0_size > base_level_size &&</span><br><span class="line"> (l0_size > options.max_bytes_for_level_base ||</span><br><span class="line"> <span class="built_in">static_cast</span><<span class="type">int</span>>(files_[<span class="number">0</span>].<span class="built_in">size</span>() / <span class="number">2</span>) 
>=</span><br><span class="line"> options.level0_file_num_compaction_trigger)) {</span><br><span class="line"> <span class="comment">// We adjust the base level according to actual L0 size, and adjust</span></span><br><span class="line"> <span class="comment">// the level multiplier accordingly, when:</span></span><br><span class="line"> <span class="comment">// 1. the L0 size is larger than level size base, or</span></span><br><span class="line"> <span class="comment">// 2. number of L0 files reaches twice the L0->L1 compaction trigger</span></span><br><span class="line"> <span class="comment">// We don't do this otherwise to keep the LSM-tree structure stable</span></span><br><span class="line"> <span class="comment">// unless the L0 compaction is backlogged.</span></span><br><span class="line"> base_level_size = l0_size;</span><br><span class="line"> <span class="keyword">if</span> (base_level_ == num_levels_ - <span class="number">1</span>) {</span><br><span class="line"> level_multiplier_ = <span class="number">1.0</span>;</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> level_multiplier_ = std::<span class="built_in">pow</span>(</span><br><span class="line"> <span class="built_in">static_cast</span><<span class="type">double</span>>(max_level_size) /</span><br><span class="line"> <span class="built_in">static_cast</span><<span class="type">double</span>>(base_level_size),</span><br><span class="line"> <span class="number">1.0</span> / <span class="built_in">static_cast</span><<span class="type">double</span>>(num_levels_ - base_level_ - <span class="number">1</span>));</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="type">uint64_t</span> level_size = base_level_size;</span><br><span class="line"> <span class="keyword">for</span> (<span class="type">int</span> i = base_level_; i < num_levels_; i++) {</span><br><span 
class="line"> <span class="keyword">if</span> (i > base_level_) {</span><br><span class="line"> level_size = <span class="built_in">MultiplyCheckOverflow</span>(level_size, level_multiplier_);</span><br><span class="line"> }</span><br><span class="line"> <span class="comment">// Don't set any level below base_bytes_max. Otherwise, the LSM can</span></span><br><span class="line"> <span class="comment">// assume an hourglass shape where L1+ sizes are smaller than L0. This</span></span><br><span class="line"> <span class="comment">// causes compaction scoring, which depends on level sizes, to favor L1+</span></span><br><span class="line"> <span class="comment">// at the expense of L0, which may fill up and stall.</span></span><br><span class="line"> level_max_bytes_[i] = std::<span class="built_in">max</span>(level_size, base_bytes_max);</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure><ol><li><p>static: the target size of each level is fixed by the configuration (<code>max_bytes_for_level_base</code> scaled by <code>max_bytes_for_level_multiplier</code>).</p></li><li><p>dynamic: the target sizes are computed dynamically from the actual level sizes, working backward from the largest non-L0 level.</p></li></ol><ul><li>The dynamic mode introduces the concept of a base level. Space efficiency is usually measured by space amplification; ignoring the effect of data compression, space amplification = size_on_file_system / size_of_user_data.</li></ul><h3 id="挑选参与compaction的文件"><a class="markdownIt-Anchor" href="#挑选参与compaction的文件"></a> Picking the files that participate in compaction</h3><figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="function">Compaction* <span class="title">LevelCompactionBuilder::PickCompaction</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="comment">// Pick up the first file to start compaction. It may have been extended</span></span><br><span class="line"> <span class="comment">// to a clean cut.</span></span><br><span class="line"> <span class="built_in">SetupInitialFiles</span>();</span><br><span class="line"> <span class="keyword">if</span> (start_level_inputs_.<span class="built_in">empty</span>()) {</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">nullptr</span>;</span><br><span class="line"> }</span><br><span class="line"> <span class="built_in">assert</span>(start_level_ >= <span class="number">0</span> && output_level_ >= <span class="number">0</span>);</span><br><span class="line"></span><br><span class="line"> <span class="comment">// If it is a L0 -> base level compaction, we need to set up other L0</span></span><br><span class="line"> <span class="comment">// files if needed.</span></span><br><span class="line"> <span class="keyword">if</span> (!<span class="built_in">SetupOtherL0FilesIfNeeded</span>()) {</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">nullptr</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Pick files in the output level and expand more files in the start level</span></span><br><span class="line"> <span class="comment">// if needed.</span></span><br><span class="line"> <span class="keyword">if</span> (!<span 
class="built_in">SetupOtherInputsIfNeeded</span>()) {</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">nullptr</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Form a compaction object containing the files we picked.</span></span><br><span class="line"> Compaction* c = <span class="built_in">GetCompaction</span>();</span><br><span class="line"></span><br><span class="line"> <span class="built_in">TEST_SYNC_POINT_CALLBACK</span>(<span class="string">"LevelCompactionPicker::PickCompaction:Return"</span>, c);</span><br><span class="line"></span><br><span class="line"> <span class="keyword">return</span> c;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>Here, PickCompaction calls three main functions in turn.</p><ul><li><code>SetupInitialFiles</code> picks the initial files to compact.</li><li><code>SetupOtherL0FilesIfNeeded</code> adds other L0 files to the compaction if needed (for an L0 to base level compaction).</li><li><code>SetupOtherInputsIfNeeded</code> picks the overlapping files in the output level and, if needed, expands the start-level inputs.</li></ul><p>We start by analyzing <code>SetupInitialFiles</code>.</p><figure class="highlight c++"><table><tr><td class="code"><pre><span 
class="line"><span class="function"><span class="type">void</span> <span class="title">LevelCompactionBuilder::SetupInitialFiles</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="comment">// Find the compactions by size on all levels.</span></span><br><span class="line"> <span class="type">bool</span> skipped_l0_to_base = <span class="literal">false</span>;</span><br><span class="line"> <span class="keyword">for</span> (<span class="type">int</span> i = <span class="number">0</span>; i < compaction_picker_-><span class="built_in">NumberLevels</span>() - <span class="number">1</span>; i++) {</span><br><span class="line"> start_level_score_ = vstorage_-><span class="built_in">CompactionScore</span>(i);</span><br><span class="line"> start_level_ = vstorage_-><span class="built_in">CompactionScoreLevel</span>(i);</span><br><span class="line"> <span class="built_in">assert</span>(i == <span class="number">0</span> || start_level_score_ <= vstorage_-><span class="built_in">CompactionScore</span>(i - <span class="number">1</span>));</span><br><span class="line"> <span class="keyword">if</span> (start_level_score_ >= <span class="number">1</span>) {</span><br><span class="line"> <span class="keyword">if</span> (skipped_l0_to_base && start_level_ == vstorage_-><span class="built_in">base_level</span>()) {</span><br><span class="line"> <span class="comment">// If L0->base_level compaction is pending, don't schedule further</span></span><br><span class="line"> <span class="comment">// compaction from base level. Otherwise L0->base_level compaction</span></span><br><span class="line"> <span class="comment">// may starve.</span></span><br><span class="line"> <span class="keyword">continue</span>;</span><br><span class="line"> }</span><br><span class="line"> output_level_ =</span><br><span class="line"> (start_level_ == <span class="number">0</span>) ? 
vstorage_-><span class="built_in">base_level</span>() : start_level_ + <span class="number">1</span>;</span><br><span class="line"> <span class="keyword">if</span> (<span class="built_in">PickFileToCompact</span>()) {</span><br><span class="line"> <span class="comment">// found the compaction!</span></span><br><span class="line"> <span class="keyword">if</span> (start_level_ == <span class="number">0</span>) {</span><br><span class="line"> <span class="comment">// L0 score = `num L0 files` / `level0_file_num_compaction_trigger`</span></span><br><span class="line"> compaction_reason_ = CompactionReason::kLevelL0FilesNum;</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> <span class="comment">// L1+ score = `Level files size` / `MaxBytesForLevel`</span></span><br><span class="line"> compaction_reason_ = CompactionReason::kLevelMaxLevelSize;</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> <span class="comment">// didn't find the compaction, clear the inputs</span></span><br><span class="line"> start_level_inputs_.<span class="built_in">clear</span>();</span><br><span class="line"> <span class="keyword">if</span> (start_level_ == <span class="number">0</span>) {</span><br><span class="line"> skipped_l0_to_base = <span class="literal">true</span>;</span><br><span class="line"> <span class="comment">// L0->base_level may be blocked due to ongoing L0->base_level</span></span><br><span class="line"> <span class="comment">// compactions. 
It may also be blocked by an ongoing compaction from</span></span><br><span class="line"> <span class="comment">// base_level downwards.</span></span><br><span class="line"> <span class="comment">//</span></span><br><span class="line"> <span class="comment">// In these cases, to reduce L0 file count and thus reduce likelihood</span></span><br><span class="line"> <span class="comment">// of write stalls, we can attempt compacting a span of files within</span></span><br><span class="line"> <span class="comment">// L0.</span></span><br><span class="line"> <span class="keyword">if</span> (<span class="built_in">PickIntraL0Compaction</span>()) {</span><br><span class="line"> output_level_ = <span class="number">0</span>;</span><br><span class="line"> compaction_reason_ = CompactionReason::kLevelL0FilesNum;</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> <span class="comment">// Compaction scores are sorted in descending order, no further scores</span></span><br><span class="line"> <span class="comment">// will be >= 1.</span></span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> (!start_level_inputs_.<span class="built_in">empty</span>()) {</span><br><span class="line"> <span class="keyword">return</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="comment">// if we didn't find a compaction, check if there are any files marked for</span></span><br><span class="line"> <span class="comment">// compaction</span></span><br><span class="line"> parent_index_ = base_index_ = <span class="number">-1</span>;</span><br><span class="line"></span><br><span class="line"> 
compaction_picker_-><span class="built_in">PickFilesMarkedForCompaction</span>(</span><br><span class="line"> cf_name_, vstorage_, &start_level_, &output_level_, &start_level_inputs_);</span><br><span class="line"> <span class="keyword">if</span> (!start_level_inputs_.<span class="built_in">empty</span>()) {</span><br><span class="line"> compaction_reason_ = CompactionReason::kFilesMarkedForCompaction;</span><br><span class="line"> <span class="keyword">return</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Bottommost Files Compaction on deleting tombstones</span></span><br><span class="line"> <span class="built_in">PickFileToCompact</span>(vstorage_-><span class="built_in">BottommostFilesMarkedForCompaction</span>(), <span class="literal">false</span>);</span><br><span class="line"> <span class="keyword">if</span> (!start_level_inputs_.<span class="built_in">empty</span>()) {</span><br><span class="line"> compaction_reason_ = CompactionReason::kBottommostFiles;</span><br><span class="line"> <span class="keyword">return</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="comment">// TTL Compaction</span></span><br><span class="line"> <span class="built_in">PickFileToCompact</span>(vstorage_-><span class="built_in">ExpiredTtlFiles</span>(), <span class="literal">true</span>);</span><br><span class="line"> <span class="keyword">if</span> (!start_level_inputs_.<span class="built_in">empty</span>()) {</span><br><span class="line"> compaction_reason_ = CompactionReason::kTtl;</span><br><span class="line"> <span class="keyword">return</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Periodic Compaction</span></span><br><span class="line"> <span class="built_in">PickFileToCompact</span>(vstorage_-><span class="built_in">FilesMarkedForPeriodicCompaction</span>(), 
<span class="literal">false</span>);</span><br><span class="line"> <span class="keyword">if</span> (!start_level_inputs_.<span class="built_in">empty</span>()) {</span><br><span class="line"> compaction_reason_ = CompactionReason::kPeriodicCompaction;</span><br><span class="line"> <span class="keyword">return</span>;</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure><ul><li><p>First, the loop walks the levels in descending order of their precomputed compaction scores and reads the score for each one; a level is considered for compaction only when its score is >= 1.</p></li><li><p><code>PickFileToCompact</code> then selects the input and output files.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span 
class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="type">bool</span> <span class="title">LevelCompactionBuilder::PickFileToCompact</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="comment">// level 0 files are overlapping. So we cannot pick more</span></span><br><span class="line"> <span class="comment">// than one concurrent compactions at this level. 
This</span></span><br><span class="line"> <span class="comment">// could be made better by looking at key-ranges that are</span></span><br><span class="line"> <span class="comment">// being compacted at level 0.</span></span><br><span class="line"> <span class="keyword">if</span> (start_level_ == <span class="number">0</span> &&</span><br><span class="line"> !compaction_picker_-><span class="built_in">level0_compactions_in_progress</span>()-><span class="built_in">empty</span>()) {</span><br><span class="line"> <span class="built_in">TEST_SYNC_POINT</span>(<span class="string">"LevelCompactionPicker::PickCompactionBySize:0"</span>);</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">false</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> start_level_inputs_.<span class="built_in">clear</span>();</span><br><span class="line"></span><br><span class="line"> <span class="built_in">assert</span>(start_level_ >= <span class="number">0</span>);</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Pick the largest file in this level that is not already</span></span><br><span class="line"> <span class="comment">// being compacted</span></span><br><span class="line"> <span class="type">const</span> std::vector<<span class="type">int</span>>& file_size =</span><br><span class="line"> vstorage_-><span class="built_in">FilesByCompactionPri</span>(start_level_);</span><br><span class="line"> <span class="type">const</span> std::vector<FileMetaData*>& level_files =</span><br><span class="line"> vstorage_-><span class="built_in">LevelFiles</span>(start_level_);</span><br><span class="line"></span><br><span class="line"> <span class="type">unsigned</span> <span class="type">int</span> cmp_idx;</span><br><span class="line"> <span class="keyword">for</span> (cmp_idx = vstorage_-><span class="built_in">NextCompactionIndex</span>(start_level_);</span><br><span 
class="line"> cmp_idx < file_size.<span class="built_in">size</span>(); cmp_idx++) {</span><br><span class="line"> <span class="type">int</span> index = file_size[cmp_idx];</span><br><span class="line"> <span class="keyword">auto</span>* f = level_files[index];</span><br><span class="line"></span><br><span class="line"> <span class="comment">// do not pick a file to compact if it is being compacted</span></span><br><span class="line"> <span class="comment">// from n-1 level.</span></span><br><span class="line"> <span class="keyword">if</span> (f->being_compacted) {</span><br><span class="line"> <span class="keyword">continue</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> start_level_inputs_.files.<span class="built_in">push_back</span>(f);</span><br><span class="line"> start_level_inputs_.level = start_level_;</span><br><span class="line"> <span class="keyword">if</span> (!compaction_picker_-><span class="built_in">ExpandInputsToCleanCut</span>(cf_name_, vstorage_,</span><br><span class="line"> &start_level_inputs_) ||</span><br><span class="line"> compaction_picker_-><span class="built_in">FilesRangeOverlapWithCompaction</span>(</span><br><span class="line"> {start_level_inputs_}, output_level_)) {</span><br><span class="line"> <span class="comment">// A locked (pending compaction) input-level file was pulled in due to</span></span><br><span class="line"> <span class="comment">// user-key overlap.</span></span><br><span class="line"> start_level_inputs_.<span class="built_in">clear</span>();</span><br><span class="line"> <span class="keyword">continue</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Now that input level is fully expanded, we check whether any output files</span></span><br><span class="line"> <span class="comment">// are locked due to pending compaction.</span></span><br><span class="line"> <span 
class="comment">//</span></span><br><span class="line"> <span class="comment">// Note we rely on ExpandInputsToCleanCut() to tell us whether any output-</span></span><br><span class="line"> <span class="comment">// level files are locked, not just the extra ones pulled in for user-key</span></span><br><span class="line"> <span class="comment">// overlap.</span></span><br><span class="line"> InternalKey smallest, largest;</span><br><span class="line"> compaction_picker_-><span class="built_in">GetRange</span>(start_level_inputs_, &smallest, &largest);</span><br><span class="line"> CompactionInputFiles output_level_inputs;</span><br><span class="line"> output_level_inputs.level = output_level_;</span><br><span class="line"> vstorage_-><span class="built_in">GetOverlappingInputs</span>(output_level_, &smallest, &largest,</span><br><span class="line"> &output_level_inputs.files);</span><br><span class="line"> <span class="keyword">if</span> (!output_level_inputs.<span class="built_in">empty</span>() &&</span><br><span class="line"> !compaction_picker_-><span class="built_in">ExpandInputsToCleanCut</span>(cf_name_, vstorage_,</span><br><span class="line"> &output_level_inputs)) {</span><br><span class="line"> start_level_inputs_.<span class="built_in">clear</span>();</span><br><span class="line"> <span class="keyword">continue</span>;</span><br><span class="line"> }</span><br><span class="line"> base_index_ = index;</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="comment">// store where to start the iteration in the next call to PickCompaction</span></span><br><span class="line"> vstorage_-><span class="built_in">SetNextCompactionIndex</span>(start_level_, cmp_idx);</span><br><span class="line"></span><br><span class="line"> <span class="keyword">return</span> start_level_inputs_.<span class="built_in">size</span>() > <span 
class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><ul><li><p>First, pick the file in the current level (<code>start_level_</code>) that comes first in the <code>FilesByCompactionPri</code> ordering (the largest file, according to the code comment) and is not already being compacted.</p></li><li><p><code>cmp_idx</code> indexes into that ordering to locate the corresponding file.</p></li><li><p><code>ExpandInputsToCleanCut</code> expands the key range of the selected file until it satisfies the "<a href="https://github.com/facebook/rocksdb/wiki/Choose-Level-Compaction-Files">clean cut</a>" requirement.</p></li><li><p><code>FilesRangeOverlapWithCompaction</code> checks whether the key range of the selected files overlaps a running compaction that writes to the output level; if it does, the candidate is skipped (clear <code>start_level_inputs_</code>, then <code>continue</code>).</p></li><li><p>Finally, the files in the output level whose keys overlap the files already selected in the start level are collected, and <code>ExpandInputsToCleanCut</code> is used to determine whether any of those output-level files are locked; if so, the candidate is skipped (clear <code>start_level_inputs_</code>, then <code>continue</code>).</p></li></ul></li></ul><p>Continuing with <code>PickCompaction</code>: level-0 is special in RocksDB because it is the only level whose SST files are not sorted relative to one another, so their key ranges may overlap. Level-0 therefore needs special handling, which is done by <code>SetupOtherL0FilesIfNeeded</code>.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="type">bool</span> <span class="title">LevelCompactionBuilder::SetupOtherL0FilesIfNeeded</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="keyword">if</span> (start_level_ == <span class="number">0</span> && output_level_ != <span class="number">0</span>) {</span><br><span class="line"> <span class="keyword">return</span> compaction_picker_-><span class="built_in">GetOverlappingL0Files</span>(</span><br><span class="line"> vstorage_, &start_level_inputs_, output_level_, &parent_index_);</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">true</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>If start_level_ 
== 0 and output_level_ != 0, <code>GetOverlappingL0Files</code> is called.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="type">bool</span> <span class="title">CompactionPicker::GetOverlappingL0Files</span><span class="params">(</span></span></span><br><span class="line"><span class="params"><span class="function"> VersionStorageInfo* vstorage, CompactionInputFiles* start_level_inputs,</span></span></span><br><span class="line"><span class="params"><span class="function"> <span class="type">int</span> output_level, <span class="type">int</span>* parent_index)</span> </span>{</span><br><span class="line"> <span class="comment">// Two level 0 compaction won't run at the same time, so don't need to worry</span></span><br><span class="line"> <span class="comment">// about files on level 0 being compacted.</span></span><br><span class="line"> <span class="built_in">assert</span>(<span class="built_in">level0_compactions_in_progress</span>()-><span class="built_in">empty</span>());</span><br><span class="line"> InternalKey smallest, 
largest;</span><br><span class="line"> <span class="built_in">GetRange</span>(*start_level_inputs, &smallest, &largest);</span><br><span class="line"> <span class="comment">// Note that the next call will discard the file we placed in</span></span><br><span class="line"> <span class="comment">// c->inputs_[0] earlier and replace it with an overlapping set</span></span><br><span class="line"> <span class="comment">// which will include the picked file.</span></span><br><span class="line"> start_level_inputs->files.<span class="built_in">clear</span>();</span><br><span class="line"> vstorage-><span class="built_in">GetOverlappingInputs</span>(<span class="number">0</span>, &smallest, &largest,</span><br><span class="line"> &(start_level_inputs->files));</span><br><span class="line"></span><br><span class="line"> <span class="comment">// If we include more L0 files in the same compaction run it can</span></span><br><span class="line"> <span class="comment">// cause the 'smallest' and 'largest' key to get extended to a</span></span><br><span class="line"> <span class="comment">// larger range. 
So, re-invoke GetRange to get the new key range</span></span><br><span class="line"> <span class="built_in">GetRange</span>(*start_level_inputs, &smallest, &largest);</span><br><span class="line"> <span class="keyword">if</span> (<span class="built_in">IsRangeInCompaction</span>(vstorage, &smallest, &largest, output_level,</span><br><span class="line"> parent_index)) {</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">false</span>;</span><br><span class="line"> }</span><br><span class="line"> <span class="built_in">assert</span>(!start_level_inputs->files.<span class="built_in">empty</span>());</span><br><span class="line"></span><br><span class="line"> <span class="keyword">return</span> <span class="literal">true</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><ul><li>All level-0 files whose keys overlap the selected range are gathered into <code>start_level_inputs</code>, replacing the single file picked earlier.</li></ul><p>Finally, <code>SetupOtherInputsIfNeeded()</code> is called.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span 
class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="type">bool</span> <span class="title">LevelCompactionBuilder::SetupOtherInputsIfNeeded</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="comment">// Setup input files from output level. For output to L0, we only compact</span></span><br><span class="line"> <span class="comment">// spans of files that do not interact with any pending compactions, so don't</span></span><br><span class="line"> <span class="comment">// need to consider other levels.</span></span><br><span class="line"> <span class="keyword">if</span> (output_level_ != <span class="number">0</span>) {</span><br><span class="line"> output_level_inputs_.level = output_level_;</span><br><span class="line"> <span class="keyword">if</span> (!compaction_picker_-><span class="built_in">SetupOtherInputs</span>(</span><br><span class="line"> cf_name_, mutable_cf_options_, vstorage_, &start_level_inputs_,</span><br><span class="line"> &output_level_inputs_, &parent_index_, base_index_)) {</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">false</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> compaction_inputs_.<span class="built_in">push_back</span>(start_level_inputs_);</span><br><span class="line"> <span class="keyword">if</span> (!output_level_inputs_.<span class="built_in">empty</span>()) {</span><br><span class="line"> compaction_inputs_.<span class="built_in">push_back</span>(output_level_inputs_);</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="comment">// In some edge cases we could pick a compaction that will be compacting</span></span><br><span 
class="line"> <span class="comment">// a key range that overlap with another running compaction, and both</span></span><br><span class="line"> <span class="comment">// of them have the same output level. This could happen if</span></span><br><span class="line"> <span class="comment">// (1) we are running a non-exclusive manual compaction</span></span><br><span class="line"> <span class="comment">// (2) AddFile ingest a new file into the LSM tree</span></span><br><span class="line"> <span class="comment">// We need to disallow this from happening.</span></span><br><span class="line"> <span class="keyword">if</span> (compaction_picker_-><span class="built_in">FilesRangeOverlapWithCompaction</span>(compaction_inputs_,</span><br><span class="line"> output_level_)) {</span><br><span class="line"> <span class="comment">// This compaction output could potentially conflict with the output</span></span><br><span class="line"> <span class="comment">// of a currently running compaction, we cannot run it.</span></span><br><span class="line"> <span class="keyword">return</span> <span class="literal">false</span>;</span><br><span class="line"> }</span><br><span class="line"> compaction_picker_-><span class="built_in">GetGrandparents</span>(vstorage_, start_level_inputs_,</span><br><span class="line"> output_level_inputs_, &grandparents_);</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> compaction_inputs_.<span class="built_in">push_back</span>(start_level_inputs_);</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">true</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><ul><li><p>This calls <code>SetupOtherInputs</code>, which fills in the output-level files that correspond to <code>start_level_inputs_</code>.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span 
class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span 
class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br><span class="line">103</span><br><span class="line">104</span><br><span class="line">105</span><br><span class="line">106</span><br><span class="line">107</span><br><span class="line">108</span><br><span class="line">109</span><br><span class="line">110</span><br><span class="line">111</span><br><span class="line">112</span><br><span class="line">113</span><br><span class="line">114</span><br><span class="line">115</span><br><span class="line">116</span><br><span class="line">117</span><br><span class="line">118</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Populates the set of inputs of all other levels that overlap with the</span></span><br><span class="line"><span class="comment">// start level.</span></span><br><span class="line"><span class="comment">// Now we 
assume all levels except start level and output level are empty.</span></span><br><span class="line"><span class="comment">// Will also attempt to expand "start level" if that doesn't expand</span></span><br><span class="line"><span class="comment">// "output level" or cause "level" to include a file for compaction that has an</span></span><br><span class="line"><span class="comment">// overlapping user-key with another file.</span></span><br><span class="line"><span class="comment">// REQUIRES: input_level and output_level are different</span></span><br><span class="line"><span class="comment">// REQUIRES: inputs->empty() == false</span></span><br><span class="line"><span class="comment">// Returns false if files on parent level are currently in compaction, which</span></span><br><span class="line"><span class="comment">// means that we can't compact them</span></span><br><span class="line"><span class="function"><span class="type">bool</span> <span class="title">CompactionPicker::SetupOtherInputs</span><span class="params">(</span></span></span><br><span class="line"><span class="params"><span class="function"> <span class="type">const</span> std::string& cf_name, <span class="type">const</span> MutableCFOptions& mutable_cf_options,</span></span></span><br><span class="line"><span class="params"><span class="function"> VersionStorageInfo* vstorage, CompactionInputFiles* inputs,</span></span></span><br><span class="line"><span class="params"><span class="function"> CompactionInputFiles* output_level_inputs, <span class="type">int</span>* parent_index,</span></span></span><br><span class="line"><span class="params"><span class="function"> <span class="type">int</span> base_index)</span> </span>{</span><br><span class="line"> <span class="built_in">assert</span>(!inputs-><span class="built_in">empty</span>());</span><br><span class="line"> <span class="built_in">assert</span>(output_level_inputs-><span class="built_in">empty</span>());</span><br><span class="line"> 
<span class="type">const</span> <span class="type">int</span> input_level = inputs->level;</span><br><span class="line"> <span class="type">const</span> <span class="type">int</span> output_level = output_level_inputs->level;</span><br><span class="line"> <span class="keyword">if</span> (input_level == output_level) {</span><br><span class="line"> <span class="comment">// no possibility of conflict</span></span><br><span class="line"> <span class="keyword">return</span> <span class="literal">true</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="comment">// For now, we only support merging two levels, start level and output level.</span></span><br><span class="line"> <span class="comment">// We need to assert other levels are empty.</span></span><br><span class="line"> <span class="keyword">for</span> (<span class="type">int</span> l = input_level + <span class="number">1</span>; l < output_level; l++) {</span><br><span class="line"> <span class="built_in">assert</span>(vstorage-><span class="built_in">NumLevelFiles</span>(l) == <span class="number">0</span>);</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> InternalKey smallest, largest;</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Get the range one last time.</span></span><br><span class="line"> <span class="built_in">GetRange</span>(*inputs, &smallest, &largest);</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Populate the set of next-level files (inputs_GetOutputLevelInputs()) to</span></span><br><span class="line"> <span class="comment">// include in compaction</span></span><br><span class="line"> vstorage-><span class="built_in">GetOverlappingInputs</span>(output_level, &smallest, &largest,</span><br><span class="line"> &output_level_inputs->files, *parent_index,</span><br><span class="line"> parent_index);</span><br><span 
class="line"> <span class="keyword">if</span> (<span class="built_in">AreFilesInCompaction</span>(output_level_inputs->files)) {</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">false</span>;</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> (!output_level_inputs-><span class="built_in">empty</span>()) {</span><br><span class="line"> <span class="keyword">if</span> (!<span class="built_in">ExpandInputsToCleanCut</span>(cf_name, vstorage, output_level_inputs)) {</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">false</span>;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="comment">// See if we can further grow the number of inputs in "level" without</span></span><br><span class="line"> <span class="comment">// changing the number of "level+1" files we pick up. We also choose NOT</span></span><br><span class="line"> <span class="comment">// to expand if this would cause "level" to include some entries for some</span></span><br><span class="line"> <span class="comment">// user key, while excluding other entries for the same user key. 
This</span></span><br><span class="line"> <span class="comment">// can happen when one user key spans multiple files.</span></span><br><span class="line"> <span class="keyword">if</span> (!output_level_inputs-><span class="built_in">empty</span>()) {</span><br><span class="line"> <span class="type">const</span> <span class="type">uint64_t</span> limit = mutable_cf_options.max_compaction_bytes;</span><br><span class="line"> <span class="type">const</span> <span class="type">uint64_t</span> output_level_inputs_size =</span><br><span class="line"> <span class="built_in">TotalCompensatedFileSize</span>(output_level_inputs->files);</span><br><span class="line"> <span class="type">const</span> <span class="type">uint64_t</span> inputs_size = <span class="built_in">TotalCompensatedFileSize</span>(inputs->files);</span><br><span class="line"> <span class="type">bool</span> expand_inputs = <span class="literal">false</span>;</span><br><span class="line"></span><br><span class="line"> CompactionInputFiles expanded_inputs;</span><br><span class="line"> expanded_inputs.level = input_level;</span><br><span class="line"> <span class="comment">// Get closed interval of output level</span></span><br><span class="line"> InternalKey all_start, all_limit;</span><br><span class="line"> <span class="built_in">GetRange</span>(*inputs, *output_level_inputs, &all_start, &all_limit);</span><br><span class="line"> <span class="type">bool</span> try_overlapping_inputs = <span class="literal">true</span>;</span><br><span class="line"> vstorage-><span class="built_in">GetOverlappingInputs</span>(input_level, &all_start, &all_limit,</span><br><span class="line"> &expanded_inputs.files, base_index, <span class="literal">nullptr</span>);</span><br><span class="line"> <span class="type">uint64_t</span> expanded_inputs_size =</span><br><span class="line"> <span class="built_in">TotalCompensatedFileSize</span>(expanded_inputs.files);</span><br><span class="line"> <span class="keyword">if</span> 
(!<span class="built_in">ExpandInputsToCleanCut</span>(cf_name, vstorage, &expanded_inputs)) {</span><br><span class="line"> try_overlapping_inputs = <span class="literal">false</span>;</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> (try_overlapping_inputs && expanded_inputs.<span class="built_in">size</span>() > inputs-><span class="built_in">size</span>() &&</span><br><span class="line"> output_level_inputs_size + expanded_inputs_size < limit &&</span><br><span class="line"> !<span class="built_in">AreFilesInCompaction</span>(expanded_inputs.files)) {</span><br><span class="line"> InternalKey new_start, new_limit;</span><br><span class="line"> <span class="built_in">GetRange</span>(expanded_inputs, &new_start, &new_limit);</span><br><span class="line"> CompactionInputFiles expanded_output_level_inputs;</span><br><span class="line"> expanded_output_level_inputs.level = output_level;</span><br><span class="line"> vstorage-><span class="built_in">GetOverlappingInputs</span>(output_level, &new_start, &new_limit,</span><br><span class="line"> &expanded_output_level_inputs.files,</span><br><span class="line"> *parent_index, parent_index);</span><br><span class="line"> <span class="built_in">assert</span>(!expanded_output_level_inputs.<span class="built_in">empty</span>());</span><br><span class="line"> <span class="keyword">if</span> (!<span class="built_in">AreFilesInCompaction</span>(expanded_output_level_inputs.files) &&</span><br><span class="line"> <span class="built_in">ExpandInputsToCleanCut</span>(cf_name, vstorage,</span><br><span class="line"> &expanded_output_level_inputs) &&</span><br><span class="line"> expanded_output_level_inputs.<span class="built_in">size</span>() == output_level_inputs-><span class="built_in">size</span>()) {</span><br><span class="line"> expand_inputs = <span class="literal">true</span>;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span 
class="keyword">if</span> (!expand_inputs) {</span><br><span class="line"> vstorage-><span class="built_in">GetCleanInputsWithinInterval</span>(input_level, &all_start,</span><br><span class="line"> &all_limit, &expanded_inputs.files,</span><br><span class="line"> base_index, <span class="literal">nullptr</span>);</span><br><span class="line"> expanded_inputs_size = <span class="built_in">TotalCompensatedFileSize</span>(expanded_inputs.files);</span><br><span class="line"> <span class="keyword">if</span> (expanded_inputs.<span class="built_in">size</span>() > inputs-><span class="built_in">size</span>() &&</span><br><span class="line"> output_level_inputs_size + expanded_inputs_size < limit &&</span><br><span class="line"> !<span class="built_in">AreFilesInCompaction</span>(expanded_inputs.files)) {</span><br><span class="line"> expand_inputs = <span class="literal">true</span>;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> (expand_inputs) {</span><br><span class="line"> <span class="built_in">ROCKS_LOG_INFO</span>(ioptions_.logger,</span><br><span class="line"> <span class="string">"[%s] Expanding@%d %"</span> ROCKSDB_PRIszt <span class="string">"+%"</span> ROCKSDB_PRIszt</span><br><span class="line"> <span class="string">"(%"</span> PRIu64 <span class="string">"+%"</span> PRIu64 <span class="string">" bytes) to %"</span> ROCKSDB_PRIszt</span><br><span class="line"> <span class="string">"+%"</span> ROCKSDB_PRIszt <span class="string">" (%"</span> PRIu64 <span class="string">"+%"</span> PRIu64 <span class="string">" bytes)\n"</span>,</span><br><span class="line"> cf_name.<span class="built_in">c_str</span>(), input_level, inputs-><span class="built_in">size</span>(),</span><br><span class="line"> output_level_inputs-><span class="built_in">size</span>(), inputs_size,</span><br><span class="line"> output_level_inputs_size, expanded_inputs.<span 
class="built_in">size</span>(),</span><br><span class="line"> output_level_inputs-><span class="built_in">size</span>(), expanded_inputs_size,</span><br><span class="line"> output_level_inputs_size);</span><br><span class="line"> inputs->files = expanded_inputs.files;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">true</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure></li><li><p>Add start_level_inputs and output_level_inputs to compaction_inputs.</p></li><li><p>Run a few checks to guard against conflicts that might arise.</p></li></ul><p>Back in <code>PickCompaction</code>, the last step is to construct a compaction object and return it.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Form a compaction object containing the files we picked.</span></span><br><span class="line">Compaction* c = <span class="built_in">GetCompaction</span>();</span><br><span class="line"><span class="built_in">TEST_SYNC_POINT_CALLBACK</span>(<span class="string">"LevelCompactionPicker::PickCompaction:Return"</span>, c);</span><br><span class="line"><span class="keyword">return</span> c;</span><br></pre></td></tr></table></figure><h3 id="compaction-job根据获取到数据分配compaction线程"><a class="markdownIt-Anchor" href="#compaction-job根据获取到数据分配compaction线程"></a> Compaction job: assigning compaction threads based on the picked data</h3><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span 
class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">TEST_SYNC_POINT_CALLBACK</span>(<span class="string">"DBImpl::BackgroundCompaction:BeforeCompaction"</span>,</span><br><span class="line"> c-><span class="built_in">column_family_data</span>());</span><br><span class="line"> <span class="type">int</span> output_level __attribute__((__unused__));</span><br><span class="line"> output_level = c-><span class="built_in">output_level</span>();</span><br><span class="line"> <span class="built_in">TEST_SYNC_POINT_CALLBACK</span>(<span class="string">"DBImpl::BackgroundCompaction:NonTrivial"</span>,</span><br><span class="line"> &output_level);</span><br><span class="line"> std::vector<SequenceNumber> snapshot_seqs;</span><br><span 
class="line"> SequenceNumber earliest_write_conflict_snapshot;</span><br><span class="line"> SnapshotChecker* snapshot_checker;</span><br><span class="line"> <span class="built_in">GetSnapshotContext</span>(job_context, &snapshot_seqs,</span><br><span class="line"> &earliest_write_conflict_snapshot, &snapshot_checker);</span><br><span class="line"> <span class="built_in">assert</span>(is_snapshot_supported_ || snapshots_.<span class="built_in">empty</span>());</span><br><span class="line"> <span class="function">CompactionJob <span class="title">compaction_job</span><span class="params">(</span></span></span><br><span class="line"><span class="params"><span class="function"> job_context->job_id, c.get(), immutable_db_options_,</span></span></span><br><span class="line"><span class="params"><span class="function"> mutable_db_options_, file_options_for_compaction_, versions_.get(),</span></span></span><br><span class="line"><span class="params"><span class="function"> &shutting_down_, preserve_deletes_seqnum_.load(), log_buffer,</span></span></span><br><span class="line"><span class="params"><span class="function"> directories_.GetDbDir(),</span></span></span><br><span class="line"><span class="params"><span class="function"> GetDataDir(c->column_family_data(), c->output_path_id()),</span></span></span><br><span class="line"><span class="params"><span class="function"> GetDataDir(c->column_family_data(), <span class="number">0</span>), stats_, &mutex_,</span></span></span><br><span class="line"><span class="params"><span class="function"> &error_handler_, snapshot_seqs, earliest_write_conflict_snapshot,</span></span></span><br><span class="line"><span class="params"><span class="function"> snapshot_checker, table_cache_, &event_logger_,</span></span></span><br><span class="line"><span class="params"><span class="function"> c->mutable_cf_options()->paranoid_file_checks,</span></span></span><br><span class="line"><span class="params"><span class="function"> 
c->mutable_cf_options()->report_bg_io_stats, dbname_,</span></span></span><br><span class="line"><span class="params"><span class="function"> &compaction_job_stats, thread_pri, io_tracer_,</span></span></span><br><span class="line"><span class="params"><span class="function"> is_manual ? &manual_compaction_paused_ : <span class="literal">nullptr</span>,</span></span></span><br><span class="line"><span class="params"><span class="function"> is_manual ? manual_compaction->canceled : <span class="literal">nullptr</span>, db_id_,</span></span></span><br><span class="line"><span class="params"><span class="function"> db_session_id_, c->column_family_data()->GetFullHistoryTsLow(),</span></span></span><br><span class="line"><span class="params"><span class="function"> &blob_callback_)</span></span>;</span><br><span class="line"> compaction_job.<span class="built_in">Prepare</span>();</span><br><span class="line"></span><br><span class="line"> <span class="built_in">NotifyOnCompactionBegin</span>(c-><span class="built_in">column_family_data</span>(), c.<span class="built_in">get</span>(), status,</span><br><span class="line"> compaction_job_stats, job_context->job_id);</span><br><span class="line"> mutex_.<span class="built_in">Unlock</span>();</span><br><span class="line"> <span class="built_in">TEST_SYNC_POINT_CALLBACK</span>(</span><br><span class="line"> <span class="string">"DBImpl::BackgroundCompaction:NonTrivial:BeforeRun"</span>, <span class="literal">nullptr</span>);</span><br><span class="line"> <span class="comment">// Should handle error?</span></span><br><span class="line"> compaction_job.<span class="built_in">Run</span>().<span class="built_in">PermitUncheckedError</span>();</span><br><span class="line"> <span class="built_in">TEST_SYNC_POINT</span>(<span class="string">"DBImpl::BackgroundCompaction:NonTrivial:AfterRun"</span>);</span><br><span class="line"> mutex_.<span class="built_in">Lock</span>();</span><br><span class="line"></span><br><span 
class="line"> status = compaction_job.<span class="built_in">Install</span>(*c-><span class="built_in">mutable_cf_options</span>());</span><br><span class="line"> io_s = compaction_job.<span class="built_in">io_status</span>();</span><br><span class="line"> <span class="keyword">if</span> (status.<span class="built_in">ok</span>()) {</span><br><span class="line"> <span class="built_in">InstallSuperVersionAndScheduleWork</span>(c-><span class="built_in">column_family_data</span>(),</span><br><span class="line"> &job_context->superversion_contexts[<span class="number">0</span>],</span><br><span class="line"> *c-><span class="built_in">mutable_cf_options</span>());</span><br><span class="line"> }</span><br><span class="line"> *made_progress = <span class="literal">true</span>;</span><br><span class="line"> <span class="built_in">TEST_SYNC_POINT_CALLBACK</span>(<span class="string">"DBImpl::BackgroundCompaction:AfterCompaction"</span>,</span><br><span class="line"> c-><span class="built_in">column_family_data</span>());</span><br></pre></td></tr></table></figure><ul><li><p><code>Prepare</code></p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span 
class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="type">void</span> <span class="title">CompactionJob::Prepare</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="function">AutoThreadOperationStageUpdater <span class="title">stage_updater</span><span class="params">(</span></span></span><br><span class="line"><span class="params"><span class="function"> ThreadStatus::STAGE_COMPACTION_PREPARE)</span></span>;</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Generate file_levels_ for compaction before making Iterator</span></span><br><span class="line"> <span class="keyword">auto</span>* c = compact_->compaction;</span><br><span class="line"> <span class="built_in">assert</span>(c-><span class="built_in">column_family_data</span>() != <span class="literal">nullptr</span>);</span><br><span class="line"> <span class="built_in">assert</span>(c-><span class="built_in">column_family_data</span>()-><span class="built_in">current</span>()-><span class="built_in">storage_info</span>()-><span class="built_in">NumLevelFiles</span>(</span><br><span class="line"> compact_->compaction-><span class="built_in">level</span>()) > <span class="number">0</span>);</span><br><span class="line"></span><br><span class="line"> write_hint_ =</span><br><span class="line"> c-><span class="built_in">column_family_data</span>()-><span class="built_in">CalculateSSTWriteHint</span>(c-><span class="built_in">output_level</span>());</span><br><span class="line"> bottommost_level_ = 
c-><span class="built_in">bottommost_level</span>();</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (c-><span class="built_in">ShouldFormSubcompactions</span>()) {</span><br><span class="line"> {</span><br><span class="line"> <span class="function">StopWatch <span class="title">sw</span><span class="params">(db_options_.clock, stats_, SUBCOMPACTION_SETUP_TIME)</span></span>;</span><br><span class="line"> <span class="built_in">GenSubcompactionBoundaries</span>();</span><br><span class="line"> }</span><br><span class="line"> <span class="built_in">assert</span>(sizes_.<span class="built_in">size</span>() == boundaries_.<span class="built_in">size</span>() + <span class="number">1</span>);</span><br><span class="line"></span><br><span class="line"> <span class="keyword">for</span> (<span class="type">size_t</span> i = <span class="number">0</span>; i <= boundaries_.<span class="built_in">size</span>(); i++) {</span><br><span class="line"> Slice* start = i == <span class="number">0</span> ? <span class="literal">nullptr</span> : &boundaries_[i - <span class="number">1</span>];</span><br><span class="line"> Slice* end = i == boundaries_.<span class="built_in">size</span>() ? 
<span class="literal">nullptr</span> : &boundaries_[i];</span><br><span class="line"> compact_->sub_compact_states.<span class="built_in">emplace_back</span>(c, start, end, sizes_[i],</span><br><span class="line"> <span class="built_in">static_cast</span><<span class="type">uint32_t</span>>(i));</span><br><span class="line"> }</span><br><span class="line"> <span class="built_in">RecordInHistogram</span>(stats_, NUM_SUBCOMPACTIONS_SCHEDULED,</span><br><span class="line"> compact_->sub_compact_states.<span class="built_in">size</span>());</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> <span class="keyword">constexpr</span> Slice* start = <span class="literal">nullptr</span>;</span><br><span class="line"> <span class="keyword">constexpr</span> Slice* end = <span class="literal">nullptr</span>;</span><br><span class="line"> <span class="keyword">constexpr</span> <span class="type">uint64_t</span> size = <span class="number">0</span>;</span><br><span class="line"></span><br><span class="line"> compact_->sub_compact_states.<span class="built_in">emplace_back</span>(c, start, end, size,</span><br><span class="line"> <span class="comment">/*sub_job_id*/</span> <span class="number">0</span>);</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure><ul><li><p>Call <code>GenSubcompactionBoundaries</code> to compute the key boundaries that split the job into subcompactions.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span 
class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span 
class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br><span class="line">103</span><br><span class="line">104</span><br><span class="line">105</span><br><span class="line">106</span><br><span class="line">107</span><br><span class="line">108</span><br><span class="line">109</span><br><span class="line">110</span><br><span class="line">111</span><br><span class="line">112</span><br><span class="line">113</span><br><span class="line">114</span><br><span class="line">115</span><br><span class="line">116</span><br><span class="line">117</span><br><span class="line">118</span><br><span class="line">119</span><br><span class="line">120</span><br><span class="line">121</span><br><span class="line">122</span><br><span class="line">123</span><br><span class="line">124</span><br><span class="line">125</span><br><span class="line">126</span><br><span class="line">127</span><br><span class="line">128</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="type">void</span> <span class="title">CompactionJob::GenSubcompactionBoundaries</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="keyword">auto</span>* c = compact_->compaction;</span><br><span class="line"> <span 
class="keyword">auto</span>* cfd = c-><span class="built_in">column_family_data</span>();</span><br><span class="line"> <span class="type">const</span> Comparator* cfd_comparator = cfd-><span class="built_in">user_comparator</span>();</span><br><span class="line"> std::vector<Slice> bounds;</span><br><span class="line"> <span class="type">int</span> start_lvl = c-><span class="built_in">start_level</span>();</span><br><span class="line"> <span class="type">int</span> out_lvl = c-><span class="built_in">output_level</span>();</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Add the starting and/or ending key of certain input files as a potential</span></span><br><span class="line"> <span class="comment">// boundary</span></span><br><span class="line"> <span class="keyword">for</span> (<span class="type">size_t</span> lvl_idx = <span class="number">0</span>; lvl_idx < c-><span class="built_in">num_input_levels</span>(); lvl_idx++) {</span><br><span class="line"> <span class="type">int</span> lvl = c-><span class="built_in">level</span>(lvl_idx);</span><br><span class="line"> <span class="keyword">if</span> (lvl >= start_lvl && lvl <= out_lvl) {</span><br><span class="line"> <span class="type">const</span> LevelFilesBrief* flevel = c-><span class="built_in">input_levels</span>(lvl_idx);</span><br><span class="line"> <span class="type">size_t</span> num_files = flevel->num_files;</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (num_files == <span class="number">0</span>) {</span><br><span class="line"> <span class="keyword">continue</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (lvl == <span class="number">0</span>) {</span><br><span class="line"> <span class="comment">// For level 0 add the starting and ending key of each file since the</span></span><br><span class="line"> <span class="comment">// files 
may have greatly differing key ranges (not range-partitioned)</span></span><br><span class="line"> <span class="keyword">for</span> (<span class="type">size_t</span> i = <span class="number">0</span>; i < num_files; i++) {</span><br><span class="line"> bounds.<span class="built_in">emplace_back</span>(flevel->files[i].smallest_key);</span><br><span class="line"> bounds.<span class="built_in">emplace_back</span>(flevel->files[i].largest_key);</span><br><span class="line"> }</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> <span class="comment">// For all other levels add the smallest/largest key in the level to</span></span><br><span class="line"> <span class="comment">// encompass the range covered by that level</span></span><br><span class="line"> bounds.<span class="built_in">emplace_back</span>(flevel->files[<span class="number">0</span>].smallest_key);</span><br><span class="line"> bounds.<span class="built_in">emplace_back</span>(flevel->files[num_files - <span class="number">1</span>].largest_key);</span><br><span class="line"> <span class="keyword">if</span> (lvl == out_lvl) {</span><br><span class="line"> <span class="comment">// For the last level include the starting keys of all files since</span></span><br><span class="line"> <span class="comment">// the last level is the largest and probably has the widest key</span></span><br><span class="line"> <span class="comment">// range. 
Since it's range partitioned, the ending key of one file</span></span><br><span class="line"> <span class="comment">// and the starting key of the next are very close (or identical).</span></span><br><span class="line"> <span class="keyword">for</span> (<span class="type">size_t</span> i = <span class="number">1</span>; i < num_files; i++) {</span><br><span class="line"> bounds.<span class="built_in">emplace_back</span>(flevel->files[i].smallest_key);</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> std::<span class="built_in">sort</span>(bounds.<span class="built_in">begin</span>(), bounds.<span class="built_in">end</span>(),</span><br><span class="line"> [cfd_comparator](<span class="type">const</span> Slice& a, <span class="type">const</span> Slice& b) -> <span class="type">bool</span> {</span><br><span class="line"> <span class="keyword">return</span> cfd_comparator-><span class="built_in">Compare</span>(<span class="built_in">ExtractUserKey</span>(a),</span><br><span class="line"> <span class="built_in">ExtractUserKey</span>(b)) < <span class="number">0</span>;</span><br><span class="line"> });</span><br><span class="line"> <span class="comment">// Remove duplicated entries from bounds</span></span><br><span class="line"> bounds.<span class="built_in">erase</span>(</span><br><span class="line"> std::<span class="built_in">unique</span>(bounds.<span class="built_in">begin</span>(), bounds.<span class="built_in">end</span>(),</span><br><span class="line"> [cfd_comparator](<span class="type">const</span> Slice& a, <span class="type">const</span> Slice& b) -> <span class="type">bool</span> {</span><br><span class="line"> <span class="keyword">return</span> cfd_comparator-><span class="built_in">Compare</span>(<span class="built_in">ExtractUserKey</span>(a),</span><br><span class="line"> <span 
class="built_in">ExtractUserKey</span>(b)) == <span class="number">0</span>;</span><br><span class="line"> }),</span><br><span class="line"> bounds.<span class="built_in">end</span>());</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Combine consecutive pairs of boundaries into ranges with an approximate</span></span><br><span class="line"> <span class="comment">// size of data covered by keys in that range</span></span><br><span class="line"> <span class="type">uint64_t</span> sum = <span class="number">0</span>;</span><br><span class="line"> std::vector<RangeWithSize> ranges;</span><br><span class="line"> <span class="comment">// Get input version from CompactionState since it's already referenced</span></span><br><span class="line"> <span class="comment">// earlier in Compaction::SetInputVersion and will not change</span></span><br><span class="line"> <span class="comment">// when db_mutex_ is released below</span></span><br><span class="line"> <span class="keyword">auto</span>* v = compact_->compaction-><span class="built_in">input_version</span>();</span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">auto</span> it = bounds.<span class="built_in">begin</span>();;) {</span><br><span class="line"> <span class="type">const</span> Slice a = *it;</span><br><span class="line"> ++it;</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (it == bounds.<span class="built_in">end</span>()) {</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="type">const</span> Slice b = *it;</span><br><span class="line"></span><br><span class="line"> <span class="comment">// ApproximateSize could potentially create table reader iterator to seek</span></span><br><span class="line"> <span class="comment">// to the index block and may incur I/O cost in 
the process. Unlock db</span></span><br><span class="line"> <span class="comment">// mutex to reduce contention</span></span><br><span class="line"> db_mutex_-><span class="built_in">Unlock</span>();</span><br><span class="line"> <span class="type">uint64_t</span> size = versions_-><span class="built_in">ApproximateSize</span>(<span class="built_in">SizeApproximationOptions</span>(), v, a,</span><br><span class="line"> b, start_lvl, out_lvl + <span class="number">1</span>,</span><br><span class="line"> TableReaderCaller::kCompaction);</span><br><span class="line"> db_mutex_-><span class="built_in">Lock</span>();</span><br><span class="line"> ranges.<span class="built_in">emplace_back</span>(a, b, size);</span><br><span class="line"> sum += size;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Group the ranges into subcompactions</span></span><br><span class="line"> <span class="type">const</span> <span class="type">double</span> min_file_fill_percent = <span class="number">4.0</span> / <span class="number">5</span>;</span><br><span class="line"> <span class="type">int</span> base_level = v-><span class="built_in">storage_info</span>()-><span class="built_in">base_level</span>();</span><br><span class="line"> <span class="type">uint64_t</span> max_output_files = <span class="built_in">static_cast</span><<span class="type">uint64_t</span>>(std::<span class="built_in">ceil</span>(</span><br><span class="line"> sum / min_file_fill_percent /</span><br><span class="line"> <span class="built_in">MaxFileSizeForLevel</span>(</span><br><span class="line"> *(c-><span class="built_in">mutable_cf_options</span>()), out_lvl,</span><br><span class="line"> c-><span class="built_in">immutable_options</span>()->compaction_style, base_level,</span><br><span class="line"> c-><span class="built_in">immutable_options</span>()->level_compaction_dynamic_level_bytes)));</span><br><span class="line"> <span 
class="type">uint64_t</span> subcompactions =</span><br><span class="line"> std::<span class="built_in">min</span>({<span class="built_in">static_cast</span><<span class="type">uint64_t</span>>(ranges.<span class="built_in">size</span>()),</span><br><span class="line"> <span class="built_in">static_cast</span><<span class="type">uint64_t</span>>(c-><span class="built_in">max_subcompactions</span>()),</span><br><span class="line"> max_output_files});</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (subcompactions > <span class="number">1</span>) {</span><br><span class="line"> <span class="type">double</span> mean = sum * <span class="number">1.0</span> / subcompactions;</span><br><span class="line"> <span class="comment">// Greedily add ranges to the subcompaction until the sum of the ranges'</span></span><br><span class="line"> <span class="comment">// sizes becomes >= the expected mean size of a subcompaction</span></span><br><span class="line"> sum = <span class="number">0</span>;</span><br><span class="line"> <span class="keyword">for</span> (<span class="type">size_t</span> i = <span class="number">0</span>; i + <span class="number">1</span> < ranges.<span class="built_in">size</span>(); i++) {</span><br><span class="line"> sum += ranges[i].size;</span><br><span class="line"> <span class="keyword">if</span> (subcompactions == <span class="number">1</span>) {</span><br><span class="line"> <span class="comment">// If there's only one left to schedule then it goes to the end so no</span></span><br><span class="line"> <span class="comment">// need to put an end boundary</span></span><br><span class="line"> <span class="keyword">continue</span>;</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> (sum >= mean) {</span><br><span class="line"> boundaries_.<span class="built_in">emplace_back</span>(<span class="built_in">ExtractUserKey</span>(ranges[i].range.limit));</span><br><span 
class="line"> sizes_.<span class="built_in">emplace_back</span>(sum);</span><br><span class="line"> subcompactions--;</span><br><span class="line"> sum = <span class="number">0</span>;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> sizes_.<span class="built_in">emplace_back</span>(sum + ranges.<span class="built_in">back</span>().size);</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> <span class="comment">// Only one range so its size is the total sum of sizes computed above</span></span><br><span class="line"> sizes_.<span class="built_in">emplace_back</span>(sum);</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure><ul><li>Iterate over all levels that need to be compacted and add each level's boundaries (its largest and smallest keys) to the <code>bounds</code> array.</li><li>Sort the collected bounds and remove duplicates.</li><li>Compute the ideal number of subcompactions and the number of output files. The subcompaction count is the minimum of the number of ranges, <code>max_subcompactions</code>, and the estimated maximum number of output files.</li><li>Finally, update <code>boundaries_</code>. Ranges are added greedily to the current subcompaction until their accumulated size reaches the mean size per subcompaction; at that point a boundary is recorded in <code>boundaries_</code> and the accumulated size in <code>sizes_</code>. The ranges are thus split into several parts of roughly equal size.</li></ul></li></ul></li><li><p><code>Run</code></p><figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="function">Status <span class="title">CompactionJob::Run</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="function">AutoThreadOperationStageUpdater <span class="title">stage_updater</span><span class="params">(</span></span></span><br><span class="line"><span class="params"><span class="function"> ThreadStatus::STAGE_COMPACTION_RUN)</span></span>;</span><br><span class="line"> <span class="built_in">TEST_SYNC_POINT</span>(<span class="string">"CompactionJob::Run():Start"</span>);</span><br><span class="line"> log_buffer_-><span class="built_in">FlushBufferToLog</span>();</span><br><span class="line"> <span class="built_in">LogCompaction</span>();</span><br><span 
class="line"></span><br><span class="line"> <span class="type">const</span> <span class="type">size_t</span> num_threads = compact_->sub_compact_states.<span class="built_in">size</span>();</span><br><span class="line"> <span class="built_in">assert</span>(num_threads > <span class="number">0</span>);</span><br><span class="line"> <span class="type">const</span> <span class="type">uint64_t</span> start_micros = db_options_.clock-><span class="built_in">NowMicros</span>();</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Launch a thread for each of subcompactions 1...num_threads-1</span></span><br><span class="line"> std::vector<port::Thread> thread_pool;</span><br><span class="line"> thread_pool.<span class="built_in">reserve</span>(num_threads - <span class="number">1</span>);</span><br><span class="line"> <span class="keyword">for</span> (<span class="type">size_t</span> i = <span class="number">1</span>; i < compact_->sub_compact_states.<span class="built_in">size</span>(); i++) {</span><br><span class="line"> thread_pool.<span class="built_in">emplace_back</span>(&CompactionJob::ProcessKeyValueCompaction, <span class="keyword">this</span>,</span><br><span class="line"> &compact_->sub_compact_states[i]);</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Always schedule the first subcompaction (whether or not there are also</span></span><br><span class="line"> <span class="comment">// others) in the current thread to be efficient with resources</span></span><br><span class="line"> <span class="built_in">ProcessKeyValueCompaction</span>(&compact_->sub_compact_states[<span class="number">0</span>]);</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Wait for all other threads (if there are any) to finish execution</span></span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">auto</span>& thread : 
thread_pool) {</span><br><span class="line"> thread.<span class="built_in">join</span>();</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> compaction_stats_.micros = db_options_.clock-><span class="built_in">NowMicros</span>() - start_micros;</span><br><span class="line"> compaction_stats_.cpu_micros = <span class="number">0</span>;</span><br><span class="line"> <span class="keyword">for</span> (<span class="type">size_t</span> i = <span class="number">0</span>; i < compact_->sub_compact_states.<span class="built_in">size</span>(); i++) {</span><br><span class="line"> compaction_stats_.cpu_micros +=</span><br><span class="line"> compact_->sub_compact_states[i].compaction_job_stats.cpu_micros;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="built_in">RecordTimeToHistogram</span>(stats_, COMPACTION_TIME, compaction_stats_.micros);</span><br><span class="line"> <span class="built_in">RecordTimeToHistogram</span>(stats_, COMPACTION_CPU_TIME,</span><br><span class="line"> compaction_stats_.cpu_micros);</span><br><span class="line"></span><br><span class="line"> <span class="built_in">TEST_SYNC_POINT</span>(<span class="string">"CompactionJob::Run:BeforeVerify"</span>);</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Check if any thread encountered an error during execution</span></span><br><span class="line"> Status status;</span><br><span class="line"> IOStatus io_s;</span><br><span class="line"> <span class="type">bool</span> wrote_new_blob_files = <span class="literal">false</span>;</span><br><span class="line"></span><br><span class="line"> <span class="keyword">for</span> (<span class="type">const</span> <span class="keyword">auto</span>& state : compact_->sub_compact_states) {</span><br><span class="line"> <span class="keyword">if</span> (!state.status.<span class="built_in">ok</span>()) {</span><br><span class="line"> status = 
state.status;</span><br><span class="line"> io_s = state.io_status;</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (!state.blob_file_additions.<span class="built_in">empty</span>()) {</span><br><span class="line"> wrote_new_blob_files = <span class="literal">true</span>;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (io_status_.<span class="built_in">ok</span>()) {</span><br><span class="line"> io_status_ = io_s;</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> (status.<span class="built_in">ok</span>()) {</span><br><span class="line"> <span class="keyword">constexpr</span> IODebugContext* dbg = <span class="literal">nullptr</span>;</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (output_directory_) {</span><br><span class="line"> io_s = output_directory_-><span class="built_in">Fsync</span>(<span class="built_in">IOOptions</span>(), dbg);</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (io_s.<span class="built_in">ok</span>() && wrote_new_blob_files && blob_output_directory_ &&</span><br><span class="line"> blob_output_directory_ != output_directory_) {</span><br><span class="line"> io_s = blob_output_directory_-><span class="built_in">Fsync</span>(<span class="built_in">IOOptions</span>(), dbg);</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> (io_status_.<span class="built_in">ok</span>()) {</span><br><span class="line"> io_status_ = io_s;</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> (status.<span class="built_in">ok</span>()) 
{</span><br><span class="line"> status = io_s;</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> (status.<span class="built_in">ok</span>()) {</span><br><span class="line"> thread_pool.<span class="built_in">clear</span>();</span><br><span class="line"> std::vector<<span class="type">const</span> CompactionJob::SubcompactionState::Output*> files_output;</span><br><span class="line"> <span class="keyword">for</span> (<span class="type">const</span> <span class="keyword">auto</span>& state : compact_->sub_compact_states) {</span><br><span class="line"> <span class="keyword">for</span> (<span class="type">const</span> <span class="keyword">auto</span>& output : state.outputs) {</span><br><span class="line"> files_output.<span class="built_in">emplace_back</span>(&output);</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> ColumnFamilyData* cfd = compact_->compaction-><span class="built_in">column_family_data</span>();</span><br><span class="line"> <span class="keyword">auto</span> prefix_extractor =</span><br><span class="line"> compact_->compaction-><span class="built_in">mutable_cf_options</span>()->prefix_extractor.<span class="built_in">get</span>();</span><br><span class="line"> <span class="function">std::atomic<<span class="type">size_t</span>> <span class="title">next_file_idx</span><span class="params">(<span class="number">0</span>)</span></span>;</span><br><span class="line"> <span class="keyword">auto</span> verify_table = [&](Status& output_status) {</span><br><span class="line"> <span class="keyword">while</span> (<span class="literal">true</span>) {</span><br><span class="line"> <span class="type">size_t</span> file_idx = next_file_idx.<span class="built_in">fetch_add</span>(<span class="number">1</span>);</span><br><span class="line"> <span class="keyword">if</span> (file_idx >= files_output.<span class="built_in">size</span>()) {</span><br><span class="line"> <span 
class="keyword">break</span>;</span><br><span class="line"> }</span><br><span class="line"> <span class="comment">// Verify that the table is usable</span></span><br><span class="line"> <span class="comment">// We set for_compaction to false and don't OptimizeForCompactionTableRead</span></span><br><span class="line"> <span class="comment">// here because this is a special case after we finish the table building</span></span><br><span class="line"> <span class="comment">// No matter whether use_direct_io_for_flush_and_compaction is true,</span></span><br><span class="line"> <span class="comment">// we will regard this verification as user reads since the goal is</span></span><br><span class="line"> <span class="comment">// to cache it here for further user reads</span></span><br><span class="line"> ReadOptions read_options;</span><br><span class="line"> InternalIterator* iter = cfd-><span class="built_in">table_cache</span>()-><span class="built_in">NewIterator</span>(</span><br><span class="line"> read_options, file_options_, cfd-><span class="built_in">internal_comparator</span>(),</span><br><span class="line"> files_output[file_idx]->meta, <span class="comment">/*range_del_agg=*/</span><span class="literal">nullptr</span>,</span><br><span class="line"> prefix_extractor,</span><br><span class="line"> <span class="comment">/*table_reader_ptr=*/</span><span class="literal">nullptr</span>,</span><br><span class="line"> cfd-><span class="built_in">internal_stats</span>()-><span class="built_in">GetFileReadHist</span>(</span><br><span class="line"> compact_->compaction-><span class="built_in">output_level</span>()),</span><br><span class="line"> TableReaderCaller::kCompactionRefill, <span class="comment">/*arena=*/</span><span class="literal">nullptr</span>,</span><br><span class="line"> <span class="comment">/*skip_filters=*/</span><span class="literal">false</span>, compact_->compaction-><span class="built_in">output_level</span>(),</span><br><span class="line"> 
<span class="built_in">MaxFileSizeForL0MetaPin</span>(</span><br><span class="line"> *compact_->compaction-><span class="built_in">mutable_cf_options</span>()),</span><br><span class="line"> <span class="comment">/*smallest_compaction_key=*/</span><span class="literal">nullptr</span>,</span><br><span class="line"> <span class="comment">/*largest_compaction_key=*/</span><span class="literal">nullptr</span>,</span><br><span class="line"> <span class="comment">/*allow_unprepared_value=*/</span><span class="literal">false</span>);</span><br><span class="line"> <span class="keyword">auto</span> s = iter-><span class="built_in">status</span>();</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (s.<span class="built_in">ok</span>() && paranoid_file_checks_) {</span><br><span class="line"> <span class="function">OutputValidator <span class="title">validator</span><span class="params">(cfd->internal_comparator(),</span></span></span><br><span class="line"><span class="params"><span class="function"> <span class="comment">/*_enable_order_check=*/</span><span class="literal">true</span>,</span></span></span><br><span class="line"><span class="params"><span class="function"> <span class="comment">/*_enable_hash=*/</span><span class="literal">true</span>)</span></span>;</span><br><span class="line"> <span class="keyword">for</span> (iter-><span class="built_in">SeekToFirst</span>(); iter-><span class="built_in">Valid</span>(); iter-><span class="built_in">Next</span>()) {</span><br><span class="line"> s = validator.<span class="built_in">Add</span>(iter-><span class="built_in">key</span>(), iter-><span class="built_in">value</span>());</span><br><span class="line"> <span class="keyword">if</span> (!s.<span class="built_in">ok</span>()) {</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> (s.<span 
class="built_in">ok</span>()) {</span><br><span class="line"> s = iter-><span class="built_in">status</span>();</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> (s.<span class="built_in">ok</span>() &&</span><br><span class="line"> !validator.<span class="built_in">CompareValidator</span>(files_output[file_idx]->validator)) {</span><br><span class="line"> s = Status::<span class="built_in">Corruption</span>(<span class="string">"Paranoid checksums do not match"</span>);</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">delete</span> iter;</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (!s.<span class="built_in">ok</span>()) {</span><br><span class="line"> output_status = s;</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> };</span><br><span class="line"> <span class="keyword">for</span> (<span class="type">size_t</span> i = <span class="number">1</span>; i < compact_->sub_compact_states.<span class="built_in">size</span>(); i++) {</span><br><span class="line"> thread_pool.<span class="built_in">emplace_back</span>(verify_table,</span><br><span class="line"> std::<span class="built_in">ref</span>(compact_->sub_compact_states[i].status));</span><br><span class="line"> }</span><br><span class="line"> <span class="built_in">verify_table</span>(compact_->sub_compact_states[<span class="number">0</span>].status);</span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">auto</span>& thread : thread_pool) {</span><br><span class="line"> thread.<span class="built_in">join</span>();</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">for</span> (<span class="type">const</span> <span class="keyword">auto</span>& state : 
compact_->sub_compact_states) {</span><br><span class="line"> <span class="keyword">if</span> (!state.status.<span class="built_in">ok</span>()) {</span><br><span class="line"> status = state.status;</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> TablePropertiesCollection tp;</span><br><span class="line"> <span class="keyword">for</span> (<span class="type">const</span> <span class="keyword">auto</span>& state : compact_->sub_compact_states) {</span><br><span class="line"> <span class="keyword">for</span> (<span class="type">const</span> <span class="keyword">auto</span>& output : state.outputs) {</span><br><span class="line"> <span class="keyword">auto</span> fn =</span><br><span class="line"> <span class="built_in">TableFileName</span>(state.compaction-><span class="built_in">immutable_options</span>()->cf_paths,</span><br><span class="line"> output.meta.fd.<span class="built_in">GetNumber</span>(), output.meta.fd.<span class="built_in">GetPathId</span>());</span><br><span class="line"> tp[fn] = output.table_properties;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> compact_->compaction-><span class="built_in">SetOutputTableProperties</span>(std::<span class="built_in">move</span>(tp));</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Finish up all book-keeping to unify the subcompaction results</span></span><br><span class="line"> <span class="built_in">AggregateStatistics</span>();</span><br><span class="line"> <span class="built_in">UpdateCompactionStats</span>();</span><br><span class="line"></span><br><span class="line"> <span class="built_in">RecordCompactionIOStats</span>();</span><br><span class="line"> <span class="built_in">LogFlush</span>(db_options_.info_log);</span><br><span class="line"> <span 
class="built_in">TEST_SYNC_POINT</span>(<span class="string">"CompactionJob::Run():End"</span>);</span><br><span class="line"></span><br><span class="line"> compact_->status = status;</span><br><span class="line"> <span class="keyword">return</span> status;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><ul><li>Iterate over all sub-compactions and launch a thread for each one except the first, which runs in the current thread. <code>Run</code> then joins all threads before verifying the output files and returning.</li><li><code>ProcessKeyValueCompaction</code> receives an entry of sub_compact_states and performs the actual compaction, processing the real key-value data.</li></ul></li></ul><h2 id="process-keys"><a class="markdownIt-Anchor" href="#process-keys"></a> Process keys</h2><h3 id="构造能够访问所有key的迭代器"><a class="markdownIt-Anchor" href="#构造能够访问所有key的迭代器"></a> Building an iterator that can access all keys</h3><p>We start in <code>ProcessKeyValueCompaction</code>, which uses the sub_compact data populated in the previous steps to fetch the corresponding key-value data and build an InternalIterator. This part mainly handles the ordering between keys and the merging of internal keys.</p><figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="function">std::unique_ptr<InternalIterator> <span class="title">raw_input</span><span class="params">(</span></span></span><br><span class="line"><span class="params"><span class="function"> versions_->MakeInputIterator(read_options, sub_compact->compaction,</span></span></span><br><span class="line"><span class="params"><span class="function"> &range_del_agg, file_options_for_read_))</span></span>;</span><br><span class="line">InternalIterator* input = raw_input.<span class="built_in">get</span>();</span><br></pre></td></tr></table></figure><ul><li><p>The iterator is built by <code>MakeInputIterator</code>. Stepping into that function, its construction logic likewise distinguishes level 0 from the other levels: each level-0 file gets its own table iterator, every other level gets a single concatenating <code>LevelIterator</code>, and all of them are combined by a merging iterator.</p><figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="function">InternalIterator* <span class="title">VersionSet::MakeInputIterator</span><span class="params">(</span></span></span><br><span class="line"><span class="params"><span class="function"> <span class="type">const</span> ReadOptions& read_options, <span class="type">const</span> Compaction* c,</span></span></span><br><span class="line"><span class="params"><span class="function"> RangeDelAggregator* 
range_del_agg,</span></span></span><br><span class="line"><span class="params"><span class="function"> <span class="type">const</span> FileOptions& file_options_compactions)</span> </span>{</span><br><span class="line"> <span class="keyword">auto</span> cfd = c-><span class="built_in">column_family_data</span>();</span><br><span class="line"> <span class="comment">// Level-0 files have to be merged together. For other levels,</span></span><br><span class="line"> <span class="comment">// we will make a concatenating iterator per level.</span></span><br><span class="line"> <span class="comment">// TODO(opt): use concatenating iterator for level-0 if there is no overlap</span></span><br><span class="line"> <span class="type">const</span> <span class="type">size_t</span> space = (c-><span class="built_in">level</span>() == <span class="number">0</span> ? c-><span class="built_in">input_levels</span>(<span class="number">0</span>)->num_files +</span><br><span class="line"> c-><span class="built_in">num_input_levels</span>() - <span class="number">1</span></span><br><span class="line"> : c-><span class="built_in">num_input_levels</span>());</span><br><span class="line"> InternalIterator** list = <span class="keyword">new</span> InternalIterator* [space];</span><br><span class="line"> <span class="type">size_t</span> num = <span class="number">0</span>;</span><br><span class="line"> <span class="keyword">for</span> (<span class="type">size_t</span> which = <span class="number">0</span>; which < c-><span class="built_in">num_input_levels</span>(); which++) {</span><br><span class="line"> <span class="keyword">if</span> (c-><span class="built_in">input_levels</span>(which)->num_files != <span class="number">0</span>) {</span><br><span class="line"> <span class="keyword">if</span> (c-><span class="built_in">level</span>(which) == <span class="number">0</span>) {</span><br><span class="line"> <span class="type">const</span> LevelFilesBrief* flevel = c-><span 
class="built_in">input_levels</span>(which);</span><br><span class="line"> <span class="keyword">for</span> (<span class="type">size_t</span> i = <span class="number">0</span>; i < flevel->num_files; i++) {</span><br><span class="line"> list[num++] = cfd-><span class="built_in">table_cache</span>()-><span class="built_in">NewIterator</span>(</span><br><span class="line"> read_options, file_options_compactions,</span><br><span class="line"> cfd-><span class="built_in">internal_comparator</span>(), *flevel->files[i].file_metadata,</span><br><span class="line"> range_del_agg, c-><span class="built_in">mutable_cf_options</span>()->prefix_extractor.<span class="built_in">get</span>(),</span><br><span class="line"> <span class="comment">/*table_reader_ptr=*/</span><span class="literal">nullptr</span>,</span><br><span class="line"> <span class="comment">/*file_read_hist=*/</span><span class="literal">nullptr</span>, TableReaderCaller::kCompaction,</span><br><span class="line"> <span class="comment">/*arena=*/</span><span class="literal">nullptr</span>,</span><br><span class="line"> <span class="comment">/*skip_filters=*/</span><span class="literal">false</span>,</span><br><span class="line"> <span class="comment">/*level=*/</span><span class="built_in">static_cast</span><<span class="type">int</span>>(c-><span class="built_in">level</span>(which)),</span><br><span class="line"> <span class="built_in">MaxFileSizeForL0MetaPin</span>(*c-><span class="built_in">mutable_cf_options</span>()),</span><br><span class="line"> <span class="comment">/*smallest_compaction_key=*/</span><span class="literal">nullptr</span>,</span><br><span class="line"> <span class="comment">/*largest_compaction_key=*/</span><span class="literal">nullptr</span>,</span><br><span class="line"> <span class="comment">/*allow_unprepared_value=*/</span><span class="literal">false</span>);</span><br><span class="line"> }</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span 
class="line"> <span class="comment">// Create concatenating iterator for the files from this level</span></span><br><span class="line"> list[num++] = <span class="keyword">new</span> <span class="built_in">LevelIterator</span>(</span><br><span class="line"> cfd-><span class="built_in">table_cache</span>(), read_options, file_options_compactions,</span><br><span class="line"> cfd-><span class="built_in">internal_comparator</span>(), c-><span class="built_in">input_levels</span>(which),</span><br><span class="line"> c-><span class="built_in">mutable_cf_options</span>()->prefix_extractor.<span class="built_in">get</span>(),</span><br><span class="line"> <span class="comment">/*should_sample=*/</span><span class="literal">false</span>,</span><br><span class="line"> <span class="comment">/*no per level latency histogram=*/</span><span class="literal">nullptr</span>,</span><br><span class="line"> TableReaderCaller::kCompaction, <span class="comment">/*skip_filters=*/</span><span class="literal">false</span>,</span><br><span class="line"> <span class="comment">/*level=*/</span><span class="built_in">static_cast</span><<span class="type">int</span>>(c-><span class="built_in">level</span>(which)), range_del_agg,</span><br><span class="line"> c-><span class="built_in">boundaries</span>(which));</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="built_in">assert</span>(num <= space);</span><br><span class="line"> InternalIterator* result =</span><br><span class="line"> <span class="built_in">NewMergingIterator</span>(&c-><span class="built_in">column_family_data</span>()-><span class="built_in">internal_comparator</span>(), list,</span><br><span class="line"> <span class="built_in">static_cast</span><<span class="type">int</span>>(num));</span><br><span class="line"> <span class="keyword">delete</span>[] list;</span><br><span class="line"> <span class="keyword">return</span> 
result;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><ul><li><p>First, get the cfd that the current sub_compact belongs to.</p></li><li><p>For level-0, build a table_cache iterator for every SST file in it and put each one into list.</p></li><li><p>For every non-level-0 level, create one concatenating iterator for the whole level and put it into list. Starting from its first position, this iterator can walk sequentially through to the last key of the last SST file in that level.</p></li><li><p>Once the iterators of all levels are collected in the iterator array list, <code>NewMergingIterator</code> wraps them in an iterator that maintains an underlying sorted heap and so orders the key-values across all levels.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"><span class="function">InternalIterator* <span class="title">NewMergingIterator</span><span class="params">(<span class="type">const</span> InternalKeyComparator* cmp,</span></span></span><br><span class="line"><span class="params"><span class="function"> InternalIterator** list, <span class="type">int</span> n,</span></span></span><br><span class="line"><span class="params"><span class="function"> Arena* arena, <span class="type">bool</span> prefix_seek_mode)</span> </span>{</span><br><span class="line"> <span class="built_in">assert</span>(n >= <span class="number">0</span>);</span><br><span class="line"> <span class="keyword">if</span> (n == <span class="number">0</span>) {</span><br><span class="line"> <span class="keyword">return</span> <span class="built_in">NewEmptyInternalIterator</span><Slice>(arena);</span><br><span class="line"> } <span class="keyword">else</span> <span class="keyword">if</span> (n == <span
class="number">1</span>) {</span><br><span class="line"> <span class="keyword">return</span> list[<span class="number">0</span>];</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> <span class="keyword">if</span> (arena == <span class="literal">nullptr</span>) {</span><br><span class="line"> <span class="keyword">return</span> <span class="keyword">new</span> <span class="built_in">MergingIterator</span>(cmp, list, n, <span class="literal">false</span>, prefix_seek_mode);</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> <span class="keyword">auto</span> mem = arena-><span class="built_in">AllocateAligned</span>(<span class="built_in">sizeof</span>(MergingIterator));</span><br><span class="line"> <span class="keyword">return</span> <span class="built_in">new</span> (mem) <span class="built_in">MergingIterator</span>(cmp, list, n, <span class="literal">true</span>, prefix_seek_mode);</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure><ul><li><p>If list is empty, return an empty iterator directly.</p></li><li><p>If list holds exactly one iterator, that iterator is already sorted on its own, so no heap-based merging iterator is needed (a level-0 SST is sorted internally, and each level-0 SST was given its own list element earlier; a non-level-0 level is sorted as a whole).</p></li><li><p>If list holds more than one iterator, create the heap-based merging iterator through <code>MergingIterator</code>.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span
class="line">19</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">MergingIterator</span>(<span class="type">const</span> InternalKeyComparator* comparator,</span><br><span class="line"> InternalIterator** children, <span class="type">int</span> n, <span class="type">bool</span> is_arena_mode,</span><br><span class="line"> <span class="type">bool</span> prefix_seek_mode)</span><br><span class="line"> : <span class="built_in">is_arena_mode_</span>(is_arena_mode),</span><br><span class="line"> <span class="built_in">comparator_</span>(comparator),</span><br><span class="line"> <span class="built_in">current_</span>(<span class="literal">nullptr</span>),</span><br><span class="line"> <span class="built_in">direction_</span>(kForward),</span><br><span class="line"> <span class="built_in">minHeap_</span>(comparator_),</span><br><span class="line"> <span class="built_in">prefix_seek_mode_</span>(prefix_seek_mode),</span><br><span class="line"> <span class="built_in">pinned_iters_mgr_</span>(<span class="literal">nullptr</span>) {</span><br><span class="line"> children_.<span class="built_in">resize</span>(n);</span><br><span class="line"> <span class="keyword">for</span> (<span class="type">int</span> i = <span class="number">0</span>; i < n; i++) {</span><br><span class="line"> children_[i].<span class="built_in">Set</span>(children[i]);</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">auto</span>& child : children_) {</span><br><span class="line"> <span class="built_in">AddToMinHeapOrCheckStatus</span>(&child);</span><br><span class="line"> }</span><br><span class="line"> current_ = <span class="built_in">CurrentForward</span>();</span><br><span class="line"> }</span><br></pre></td></tr></table></figure><p>The constructor copies every element of the passed-in list (children in this function) into a vector, then walks each child iterator and uses
<code>AddToMinHeapOrCheckStatus</code> to build the underlying heap. The order of elements in the heap is determined by the user option <code>option.comparator</code>; the default is BytewiseComparator, which gives lexicographical (dictionary) order.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="type">void</span> <span class="title">MergingIterator::AddToMinHeapOrCheckStatus</span><span class="params">(IteratorWrapper* child)</span> </span>{</span><br><span class="line"> <span class="keyword">if</span> (child-><span class="built_in">Valid</span>()) {</span><br><span class="line"> <span class="built_in">assert</span>(child-><span class="built_in">status</span>().<span class="built_in">ok</span>());</span><br><span class="line"> minHeap_.<span class="built_in">push</span>(child);</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> <span class="built_in">considerStatus</span>(child-><span class="built_in">status</span>());</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure></li></ul></li></ul></li></ul><h3 id="通过seektofirst和next指针处理元素"><a class="markdownIt-Anchor" href="#通过seektofirst和next指针处理元素"></a> Processing elements with SeekToFirst and Next</h3><p>Back in <code>ProcessKeyValueCompaction</code>, the InternalIterator built above is used to construct a CompactionIterator that carries all the compaction state; plain initialization is enough. Once it is constructed, the CompactionIterator's internal position must be moved to the very beginning of the iterator with SeekToFirst, and Next then fetches the following key-value. Each time the iterator advances, besides re-adjusting the ordering in the underlying heap, it also has to handle the different key types: kTypeValue, kTypeDeletion, kTypeSingleDeletion, kTypeRangeDeletion, kTypeMerge, and so on.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span
class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line">c_iter-><span class="built_in">SeekToFirst</span>();</span><br><span class="line">......</span><br><span class="line"><span class="keyword">while</span> (status.<span class="built_in">ok</span>() && !cfd-><span class="built_in">IsDropped</span>() && c_iter-><span class="built_in">Valid</span>()) {</span><br><span class="line"> <span class="comment">// Invariant: c_iter.status() is guaranteed to be OK if c_iter->Valid()</span></span><br><span class="line"> <span class="comment">// returns true.</span></span><br><span class="line"> <span class="type">const</span> Slice& key = c_iter-><span class="built_in">key</span>();</span><br><span class="line"> <span class="type">const</span> Slice& value = c_iter-><span class="built_in">value</span>();</span><br><span class="line"> ......</span><br><span class="line"> c_iter-><span class="built_in">Next</span>();</span><br><span class="line"> ...</span><br><span class="line">}</span><br></pre></td></tr></table></figure><h2 id="write-keys"><a class="markdownIt-Anchor" href="#write-keys"></a> Write keys</h2><p>This step also happens in <code>ProcessKeyValueCompaction</code>: it writes the key-values into the SST file.</p><ul><li><p>Determine the key's valueType; if it belongs to a data_block or index_block, feed it into the builder state machine.</p></li><li><p>Create the filter_builder and index_builder first. The index builder uses a layered format (two index levels: the first level has multiple restart points and indexes the actual data blocks; the second level indexes the first-level index blocks), which makes it easy to load into memory for binary search, saves memory, and speeds up lookups. The data_block_builder is written after that.</p></li><li><p>If the key's valueType is range_deletion, add it to the range_delete_block_builder.</p></li><li><p>First, write the data_block builder to the underlying file through the writer bound to the output file.</p></li><li><p>Add filter_block / index_builder / compress_builder / range_del_builder / properties_builder to the meta_data_builder in their respective formats, then use the
writer bound to the output file to write them to the underlying storage.</p></li><li><p>Wrap meta_data_handle and index_handle into the footer and write it to the underlying storage.</p></li></ul><h3 id="将builder与输出文件的writer绑定"><a class="markdownIt-Anchor" href="#将builder与输出文件的writer绑定"></a> Binding the builder to the output file's writer</h3><p>The default block-based table SST file contains many different kinds of blocks. Except for the data blocks, every block is first written into a temporary builder data structure, and the builder then writes it to disk through the writer bound to its output file, forming the on-disk SST file layout.</p><p>The logic here binds the builder to the output file's writer and creates the table builder.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Open output file if necessary</span></span><br><span class="line"><span class="keyword">if</span> (sub_compact->builder == <span class="literal">nullptr</span>) {</span><br><span class="line"> status = <span class="built_in">OpenCompactionOutputFile</span>(sub_compact);</span><br><span class="line"> <span class="keyword">if</span> (!status.<span class="built_in">ok</span>()) {</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure><h3 id="通过table_builder的状态机添加block数据"><a class="markdownIt-Anchor" href="#通过table_builder的状态机添加block数据"></a> Adding block data through the table_builder state machine</h3><p>Next, <code>builder->Add</code> is called to fill in the corresponding builder structures. The process is driven by a state machine with three states, which creates the builders for the different blocks; the state machine is set up when the table builder is constructed.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">status = sub_compact-><span class="built_in">AddToBuilder</span>(key, value);</span><br><span class="line"><span class="keyword">if</span> (!status.<span
class="built_in">ok</span>()) {</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="function">Status <span class="title">AddToBuilder</span><span class="params">(<span class="type">const</span> Slice& key, <span class="type">const</span> Slice& value)</span> </span>{</span><br><span class="line"> <span class="keyword">auto</span> curr = <span class="built_in">current_output</span>();</span><br><span class="line"> <span class="built_in">assert</span>(builder != <span class="literal">nullptr</span>);</span><br><span class="line"> <span class="built_in">assert</span>(curr != <span class="literal">nullptr</span>);</span><br><span class="line"> Status s = curr->validator.<span class="built_in">Add</span>(key, value);</span><br><span class="line"> <span class="keyword">if</span> (!s.<span class="built_in">ok</span>()) {</span><br><span class="line"> <span class="keyword">return</span> s;</span><br><span class="line"> }</span><br><span class="line"> builder-><span class="built_in">Add</span>(key, value);</span><br><span class="line"> <span class="keyword">return</span> Status::<span class="built_in">OK</span>();</span><br><span class="line"> }</span><br></pre></td></tr></table></figure><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span 
class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span 
class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br><span class="line">103</span><br><span class="line">104</span><br><span class="line">105</span><br><span class="line">106</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="type">void</span> <span class="title">BlockBasedTableBuilder::Add</span><span class="params">(<span class="type">const</span> Slice& key, <span class="type">const</span> Slice& value)</span> </span>{</span><br><span class="line"> Rep* r = rep_;</span><br><span class="line"> <span class="built_in">assert</span>(rep_->state != Rep::State::kClosed);</span><br><span class="line"> <span class="keyword">if</span> (!<span class="built_in">ok</span>()) <span class="keyword">return</span>;</span><br><span class="line"> ValueType value_type = <span class="built_in">ExtractValueType</span>(key);</span><br><span class="line"> <span class="keyword">if</span> (<span 
class="built_in">IsValueType</span>(value_type)) {</span><br><span class="line"><span class="meta">#<span class="keyword">ifndef</span> NDEBUG</span></span><br><span class="line"> <span class="keyword">if</span> (r->props.num_entries > r->props.num_range_deletions) {</span><br><span class="line"> <span class="built_in">assert</span>(r->internal_comparator.<span class="built_in">Compare</span>(key, <span class="built_in">Slice</span>(r->last_key)) > <span class="number">0</span>);</span><br><span class="line"> }</span><br><span class="line"><span class="meta">#<span class="keyword">endif</span> <span class="comment">// !NDEBUG</span></span></span><br><span class="line"></span><br><span class="line"> <span class="keyword">auto</span> should_flush = r->flush_block_policy-><span class="built_in">Update</span>(key, value);</span><br><span class="line"> <span class="keyword">if</span> (should_flush) {</span><br><span class="line"> <span class="built_in">assert</span>(!r->data_block.<span class="built_in">empty</span>());</span><br><span class="line"> r->first_key_in_next_block = &key;</span><br><span class="line"> <span class="built_in">Flush</span>();</span><br><span class="line"> <span class="keyword">if</span> (r->state == Rep::State::kBuffered) {</span><br><span class="line"> <span class="type">bool</span> exceeds_buffer_limit =</span><br><span class="line"> (r->buffer_limit != <span class="number">0</span> && r->data_begin_offset > r->buffer_limit);</span><br><span class="line"> <span class="type">bool</span> is_cache_full = <span class="literal">false</span>;</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Increase cache reservation for the last buffered data block</span></span><br><span class="line"> <span class="comment">// only if the block is not going to be unbuffered immediately</span></span><br><span class="line"> <span class="comment">// and there exists a cache reservation manager</span></span><br><span class="line"> 
<span class="keyword">if</span> (!exceeds_buffer_limit && r->cache_rev_mng != <span class="literal">nullptr</span>) {</span><br><span class="line"> Status s = r->cache_rev_mng-><span class="built_in">UpdateCacheReservation</span><</span><br><span class="line"> CacheEntryRole::kCompressionDictionaryBuildingBuffer>(</span><br><span class="line"> r->data_begin_offset);</span><br><span class="line"> is_cache_full = s.<span class="built_in">IsIncomplete</span>();</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (exceeds_buffer_limit || is_cache_full) {</span><br><span class="line"> <span class="built_in">EnterUnbuffered</span>();</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Add item to index block.</span></span><br><span class="line"> <span class="comment">// We do not emit the index entry for a block until we have seen the</span></span><br><span class="line"> <span class="comment">// first key for the next data block. This allows us to use shorter</span></span><br><span class="line"> <span class="comment">// keys in the index block. For example, consider a block boundary</span></span><br><span class="line"> <span class="comment">// between the keys "the quick brown fox" and "the who". 
We can use</span></span><br><span class="line"> <span class="comment">// "the r" as the key for the index block entry since it is >= all</span></span><br><span class="line"> <span class="comment">// entries in the first block and < all entries in subsequent</span></span><br><span class="line"> <span class="comment">// blocks.</span></span><br><span class="line"> <span class="keyword">if</span> (<span class="built_in">ok</span>() && r->state == Rep::State::kUnbuffered) {</span><br><span class="line"> <span class="keyword">if</span> (r-><span class="built_in">IsParallelCompressionEnabled</span>()) {</span><br><span class="line"> r->pc_rep->curr_block_keys-><span class="built_in">Clear</span>();</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> r->index_builder-><span class="built_in">AddIndexEntry</span>(&r->last_key, &key,</span><br><span class="line"> r->pending_handle);</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Note: PartitionedFilterBlockBuilder requires key being added to filter</span></span><br><span class="line"> <span class="comment">// builder after being added to index builder.</span></span><br><span class="line"> <span class="keyword">if</span> (r->state == Rep::State::kUnbuffered) {</span><br><span class="line"> <span class="keyword">if</span> (r-><span class="built_in">IsParallelCompressionEnabled</span>()) {</span><br><span class="line"> r->pc_rep->curr_block_keys-><span class="built_in">PushBack</span>(key);</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> <span class="keyword">if</span> (r->filter_builder != <span class="literal">nullptr</span>) {</span><br><span class="line"> <span class="type">size_t</span> ts_sz =</span><br><span class="line"> r->internal_comparator.<span 
class="built_in">user_comparator</span>()-><span class="built_in">timestamp_size</span>();</span><br><span class="line"> r->filter_builder-><span class="built_in">Add</span>(<span class="built_in">ExtractUserKeyAndStripTimestamp</span>(key, ts_sz));</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> r->last_key.<span class="built_in">assign</span>(key.<span class="built_in">data</span>(), key.<span class="built_in">size</span>());</span><br><span class="line"> r->data_block.<span class="built_in">Add</span>(key, value);</span><br><span class="line"> <span class="keyword">if</span> (r->state == Rep::State::kBuffered) {</span><br><span class="line"> <span class="comment">// Buffered keys will be replayed from data_block_buffers during</span></span><br><span class="line"> <span class="comment">// `Finish()` once compression dictionary has been finalized.</span></span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> <span class="keyword">if</span> (!r-><span class="built_in">IsParallelCompressionEnabled</span>()) {</span><br><span class="line"> r->index_builder-><span class="built_in">OnKeyAdded</span>(key);</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="comment">// TODO offset passed in is not accurate for parallel compression case</span></span><br><span class="line"> <span class="built_in">NotifyCollectTableCollectorsOnAdd</span>(key, value, r-><span class="built_in">get_offset</span>(),</span><br><span class="line"> r->table_properties_collectors,</span><br><span class="line"> r->ioptions.logger);</span><br><span class="line"></span><br><span class="line"> } <span class="keyword">else</span> <span class="keyword">if</span> (value_type == kTypeRangeDeletion) {</span><br><span class="line"> r->range_del_block.<span class="built_in">Add</span>(key, 
value);</span><br><span class="line"> <span class="comment">// TODO offset passed in is not accurate for parallel compression case</span></span><br><span class="line"> <span class="built_in">NotifyCollectTableCollectorsOnAdd</span>(key, value, r-><span class="built_in">get_offset</span>(),</span><br><span class="line"> r->table_properties_collectors,</span><br><span class="line"> r->ioptions.logger);</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> <span class="built_in">assert</span>(<span class="literal">false</span>);</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> r->props.num_entries++;</span><br><span class="line"> r->props.raw_key_size += key.<span class="built_in">size</span>();</span><br><span class="line"> r->props.raw_value_size += value.<span class="built_in">size</span>();</span><br><span class="line"> <span class="keyword">if</span> (value_type == kTypeDeletion || value_type == kTypeSingleDeletion) {</span><br><span class="line"> r->props.num_deletions++;</span><br><span class="line"> } <span class="keyword">else</span> <span class="keyword">if</span> (value_type == kTypeRangeDeletion) {</span><br><span class="line"> r->props.num_deletions++;</span><br><span class="line"> r->props.num_range_deletions++;</span><br><span class="line"> } <span class="keyword">else</span> <span class="keyword">if</span> (value_type == kTypeMerge) {</span><br><span class="line"> r->props.num_merge_operands++;</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure><ul><li><strong>kBuffered</strong> is the initial state of the state machine. In this state, a fair number of uncompressed data blocks are buffered in memory. While in this state, the EnterUnbuffered function builds the compression block and, from it, the corresponding index block and filter block. It finally moves the state machine to the next state, kUnbuffered.</li><li>In <strong>kUnbuffered</strong>, the compression block has already been built from the data previously held in the buffer. In this state, either Finish writes out all the blocks or Abandon discards the current write.</li><li><strong>kClosed</strong> means that Finish or Abandon has already been called on the table
builder, so the next step is to destroy the current table builder.</li></ul><p>The first state enters the logic below. If the data block meets the flush condition, its data is flushed directly into the data-block storage structure of the current builder.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">auto</span> should_flush = r->flush_block_policy-><span class="built_in">Update</span>(key, value);</span><br><span class="line"> <span class="keyword">if</span> (should_flush) {</span><br><span class="line"> <span class="built_in">assert</span>(!r->data_block.<span class="built_in">empty</span>());</span><br><span class="line"> r->first_key_in_next_block =
&key;</span><br><span class="line"> <span class="built_in">Flush</span>();</span><br><span class="line"> <span class="keyword">if</span> (r->state == Rep::State::kBuffered) {</span><br><span class="line"> <span class="type">bool</span> exceeds_buffer_limit =</span><br><span class="line"> (r->buffer_limit != <span class="number">0</span> && r->data_begin_offset > r->buffer_limit);</span><br><span class="line"> <span class="type">bool</span> is_cache_full = <span class="literal">false</span>;</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Increase cache reservation for the last buffered data block</span></span><br><span class="line"> <span class="comment">// only if the block is not going to be unbuffered immediately</span></span><br><span class="line"> <span class="comment">// and there exists a cache reservation manager</span></span><br><span class="line"> <span class="keyword">if</span> (!exceeds_buffer_limit && r->cache_rev_mng != <span class="literal">nullptr</span>) {</span><br><span class="line"> Status s = r->cache_rev_mng-><span class="built_in">UpdateCacheReservation</span><</span><br><span class="line"> CacheEntryRole::kCompressionDictionaryBuildingBuffer>(</span><br><span class="line"> r->data_begin_offset);</span><br><span class="line"> is_cache_full = s.<span class="built_in">IsIncomplete</span>();</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (exceeds_buffer_limit || is_cache_full) {</span><br><span class="line"> <span class="built_in">EnterUnbuffered</span>();</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Add item to index block.</span></span><br><span class="line"> <span class="comment">// We do not emit the index entry for a block until we have seen the</span></span><br><span class="line"> <span class="comment">// first key for the 
next data block. This allows us to use shorter</span></span><br><span class="line"> <span class="comment">// keys in the index block. For example, consider a block boundary</span></span><br><span class="line"> <span class="comment">// between the keys "the quick brown fox" and "the who". We can use</span></span><br><span class="line"> <span class="comment">// "the r" as the key for the index block entry since it is >= all</span></span><br><span class="line"> <span class="comment">// entries in the first block and < all entries in subsequent</span></span><br><span class="line"> <span class="comment">// blocks.</span></span><br><span class="line"> <span class="keyword">if</span> (<span class="built_in">ok</span>() && r->state == Rep::State::kUnbuffered) {</span><br><span class="line"> <span class="keyword">if</span> (r-><span class="built_in">IsParallelCompressionEnabled</span>()) {</span><br><span class="line"> r->pc_rep->curr_block_keys-><span class="built_in">Clear</span>();</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> r->index_builder-><span class="built_in">AddIndexEntry</span>(&r->last_key, &key,</span><br><span class="line"> r->pending_handle);</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> }</span><br></pre></td></tr></table></figure><p>The main job of <code>EnterUnbuffered</code> is to build the compressed blocks, which happens only when the compression option is enabled.</p><p>It also builds the index block and filter block from the data that earlier flushes added to the data blocks; these are used to index the data block contents. They are built here because a flush means one complete data block has been written, and an index entry is only worth creating once a complete data block exists.</p><p>The <code>data_block_buffers</code> array holds the uncompressed data block contents.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span 
class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">for</span> (<span class="type">size_t</span> i = <span class="number">0</span>; <span class="built_in">ok</span>() && i < r->data_block_buffers.<span class="built_in">size</span>(); ++i) {</span><br><span class="line"> <span class="keyword">if</span> (iter == <span class="literal">nullptr</span>) {</span><br><span class="line"> iter = <span 
class="built_in">get_iterator_for_block</span>(i);</span><br><span class="line"> <span class="built_in">assert</span>(iter != <span class="literal">nullptr</span>);</span><br><span class="line"> };</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (i + <span class="number">1</span> < r->data_block_buffers.<span class="built_in">size</span>()) {</span><br><span class="line"> next_block_iter = <span class="built_in">get_iterator_for_block</span>(i + <span class="number">1</span>);</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">auto</span>& data_block = r->data_block_buffers[i];</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (r-><span class="built_in">IsParallelCompressionEnabled</span>()) {</span><br><span class="line"> Slice first_key_in_next_block;</span><br><span class="line"> <span class="type">const</span> Slice* first_key_in_next_block_ptr = &first_key_in_next_block;</span><br><span class="line"> <span class="keyword">if</span> (i + <span class="number">1</span> < r->data_block_buffers.<span class="built_in">size</span>()) {</span><br><span class="line"> <span class="built_in">assert</span>(next_block_iter != <span class="literal">nullptr</span>);</span><br><span class="line"> first_key_in_next_block = next_block_iter-><span class="built_in">key</span>();</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> first_key_in_next_block_ptr = r->first_key_in_next_block;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> std::vector<std::string> keys;</span><br><span class="line"> <span class="keyword">for</span> (; iter-><span class="built_in">Valid</span>(); iter-><span class="built_in">Next</span>()) {</span><br><span class="line"> keys.<span class="built_in">emplace_back</span>(iter-><span class="built_in">key</span>().<span 
class="built_in">ToString</span>());</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> ParallelCompressionRep::BlockRep* block_rep = r->pc_rep-><span class="built_in">PrepareBlock</span>(</span><br><span class="line"> r->compression_type, first_key_in_next_block_ptr, &data_block, &keys);</span><br><span class="line"></span><br><span class="line"> <span class="built_in">assert</span>(block_rep != <span class="literal">nullptr</span>);</span><br><span class="line"> r->pc_rep->file_size_estimator.<span class="built_in">EmitBlock</span>(block_rep->data-><span class="built_in">size</span>(),</span><br><span class="line"> r-><span class="built_in">get_offset</span>());</span><br><span class="line"> r->pc_rep-><span class="built_in">EmitBlock</span>(block_rep);</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> <span class="keyword">for</span> (; iter-><span class="built_in">Valid</span>(); iter-><span class="built_in">Next</span>()) {</span><br><span class="line"> Slice key = iter-><span class="built_in">key</span>();</span><br><span class="line"> <span class="keyword">if</span> (r->filter_builder != <span class="literal">nullptr</span>) {</span><br><span class="line"> <span class="type">size_t</span> ts_sz =</span><br><span class="line"> r->internal_comparator.<span class="built_in">user_comparator</span>()-><span class="built_in">timestamp_size</span>();</span><br><span class="line"> r->filter_builder-><span class="built_in">Add</span>(<span class="built_in">ExtractUserKeyAndStripTimestamp</span>(key, ts_sz));</span><br><span class="line"> }</span><br><span class="line"> r->index_builder-><span class="built_in">OnKeyAdded</span>(key);</span><br><span class="line"> }</span><br><span class="line"> <span class="built_in">WriteBlock</span>(<span class="built_in">Slice</span>(data_block), &r->pending_handle, BlockType::kData);</span><br><span class="line"> <span 
class="keyword">if</span> (<span class="built_in">ok</span>() && i + <span class="number">1</span> < r->data_block_buffers.<span class="built_in">size</span>()) {</span><br><span class="line"> <span class="built_in">assert</span>(next_block_iter != <span class="literal">nullptr</span>);</span><br><span class="line"> Slice first_key_in_next_block = next_block_iter-><span class="built_in">key</span>();</span><br><span class="line"></span><br><span class="line"> Slice* first_key_in_next_block_ptr = &first_key_in_next_block;</span><br><span class="line"></span><br><span class="line"> iter-><span class="built_in">SeekToLast</span>();</span><br><span class="line"> std::string last_key = iter-><span class="built_in">key</span>().<span class="built_in">ToString</span>();</span><br><span class="line"> r->index_builder-><span class="built_in">AddIndexEntry</span>(&last_key, first_key_in_next_block_ptr,</span><br><span class="line"> r->pending_handle);</span><br><span class="line"> }</span><br><span class="line"> }</span><br></pre></td></tr></table></figure><p>The index builder that <code>EnterUnbuffered</code> uses to build the index block is created as follows.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> (table_options.index_type ==</span><br><span class="line"> BlockBasedTableOptions::kTwoLevelIndexSearch) {</span><br><span class="line"> p_index_builder_ = PartitionedIndexBuilder::<span class="built_in">CreateIndexBuilder</span>(</span><br><span class="line"> &internal_comparator, use_delta_encoding_for_index_values,</span><br><span class="line"> 
table_options);</span><br><span class="line"> index_builder.<span class="built_in">reset</span>(p_index_builder_);</span><br><span class="line">} <span class="keyword">else</span> {</span><br><span class="line"> index_builder.<span class="built_in">reset</span>(IndexBuilder::<span class="built_in">CreateIndexBuilder</span>(</span><br><span class="line"> table_options.index_type, &internal_comparator,</span><br><span class="line"> &<span class="keyword">this</span>->internal_prefix_transform, use_delta_encoding_for_index_values,</span><br><span class="line"> table_options));</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>Back in the while loop of <code>ProcessKeyValueCompaction</code>, the keys from the iterator are traversed one by one. Each key is added to the corresponding data block, and the index block, filter block, and compressed blocks are filled in along the way.</p><h3 id="通过构建的meta_index_builder和footer完成数据的固化"><a class="markdownIt-Anchor" href="#通过构建的meta_index_builder和footer完成数据的固化"></a> Persisting the Data Through the Built meta_index_builder and Footer</h3><p>Next, <code>FinishCompactionOutputFile</code> consolidates the data previously added to the builder, handles the range-deletion blocks, and updates the boundaries of the current compaction.<br />This function is triggered once the block data accumulated in the builder reaches max_output_file_size, the size limit of a single output SST file.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span 
class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Close output file if it is big enough. Two possibilities determine it's</span></span><br><span class="line"> <span class="comment">// time to close it: (1) the current key should be this file's last key, (2)</span></span><br><span class="line"> <span class="comment">// the next key should not be in this file.</span></span><br><span class="line"> <span class="comment">//</span></span><br><span class="line"> <span class="comment">// TODO(aekmekji): determine if file should be closed earlier than this</span></span><br><span class="line"> <span class="comment">// during subcompactions (i.e. 
if output size, estimated by input size, is</span></span><br><span class="line"> <span class="comment">// going to be 1.2MB and max_output_file_size = 1MB, prefer to have 0.6MB</span></span><br><span class="line"> <span class="comment">// and 0.6MB instead of 1MB and 0.2MB)</span></span><br><span class="line"> <span class="type">bool</span> output_file_ended = <span class="literal">false</span>;</span><br><span class="line"> <span class="keyword">if</span> (sub_compact->compaction-><span class="built_in">output_level</span>() != <span class="number">0</span> &&</span><br><span class="line"> sub_compact->current_output_file_size >=</span><br><span class="line"> sub_compact->compaction-><span class="built_in">max_output_file_size</span>()) {</span><br><span class="line"> <span class="comment">// (1) this key terminates the file. For historical reasons, the iterator</span></span><br><span class="line"> <span class="comment">// status before advancing will be given to FinishCompactionOutputFile().</span></span><br><span class="line"> output_file_ended = <span class="literal">true</span>;</span><br><span class="line"> }</span><br><span class="line"> <span class="built_in">TEST_SYNC_POINT_CALLBACK</span>(</span><br><span class="line"> <span class="string">"CompactionJob::Run():PausingManualCompaction:2"</span>,</span><br><span class="line"> <span class="built_in">reinterpret_cast</span><<span class="type">void</span>*>(</span><br><span class="line"> <span class="keyword">const_cast</span><std::atomic<<span class="type">int</span>>*>(manual_compaction_paused_)));</span><br><span class="line"> <span class="keyword">if</span> (partitioner.<span class="built_in">get</span>()) {</span><br><span class="line"> last_key_for_partitioner.<span class="built_in">assign</span>(c_iter-><span class="built_in">user_key</span>().data_,</span><br><span class="line"> c_iter-><span class="built_in">user_key</span>().size_);</span><br><span class="line"> }</span><br><span class="line"> 
c_iter-><span class="built_in">Next</span>();</span><br><span class="line"> <span class="keyword">if</span> (c_iter-><span class="built_in">status</span>().<span class="built_in">IsManualCompactionPaused</span>()) {</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> (!output_file_ended && c_iter-><span class="built_in">Valid</span>()) {</span><br><span class="line"> <span class="keyword">if</span> (((partitioner.<span class="built_in">get</span>() &&</span><br><span class="line"> partitioner-><span class="built_in">ShouldPartition</span>(<span class="built_in">PartitionerRequest</span>(</span><br><span class="line"> last_key_for_partitioner, c_iter-><span class="built_in">user_key</span>(),</span><br><span class="line"> sub_compact->current_output_file_size)) == kRequired) ||</span><br><span class="line"> (sub_compact->compaction-><span class="built_in">output_level</span>() != <span class="number">0</span> &&</span><br><span class="line"> sub_compact-><span class="built_in">ShouldStopBefore</span>(</span><br><span class="line"> c_iter-><span class="built_in">key</span>(), sub_compact->current_output_file_size))) &&</span><br><span class="line"> sub_compact->builder != <span class="literal">nullptr</span>) {</span><br><span class="line"> <span class="comment">// (2) this key belongs to the next file. 
For historical reasons, the</span></span><br><span class="line"> <span class="comment">// iterator status after advancing will be given to</span></span><br><span class="line"> <span class="comment">// FinishCompactionOutputFile().</span></span><br><span class="line"> output_file_ended = <span class="literal">true</span>;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> (output_file_ended) {</span><br><span class="line"> <span class="type">const</span> Slice* next_key = <span class="literal">nullptr</span>;</span><br><span class="line"> <span class="keyword">if</span> (c_iter-><span class="built_in">Valid</span>()) {</span><br><span class="line"> next_key = &c_iter-><span class="built_in">key</span>();</span><br><span class="line"> }</span><br><span class="line"> CompactionIterationStats range_del_out_stats;</span><br><span class="line"> status = <span class="built_in">FinishCompactionOutputFile</span>(input-><span class="built_in">status</span>(), sub_compact,</span><br><span class="line"> &range_del_agg, &range_del_out_stats,</span><br><span class="line"> next_key);</span><br><span class="line"> <span class="built_in">RecordDroppedKeys</span>(range_del_out_stats,</span><br><span class="line"> &sub_compact->compaction_job_stats);</span><br><span class="line"> }</span><br></pre></td></tr></table></figure><p>Internally, <code>FinishCompactionOutputFile</code> eventually calls s=sub_compact->builder->Finish() to persist all the data to the output file.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span 
class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br></pre></td><td class="code"><pre><span class="line"><span class="function">Status <span class="title">BlockBasedTableBuilder::Finish</span><span class="params">()</span> </span>{</span><br><span class="line"> Rep* r = rep_;</span><br><span class="line"> <span class="built_in">assert</span>(r->state != Rep::State::kClosed);</span><br><span class="line"> <span class="type">bool</span> empty_data_block = r->data_block.<span class="built_in">empty</span>();</span><br><span class="line"> r->first_key_in_next_block = <span class="literal">nullptr</span>;</span><br><span class="line"> <span class="built_in">Flush</span>();</span><br><span class="line"> <span class="keyword">if</span> 
(r->state == Rep::State::kBuffered) {</span><br><span class="line"> <span class="built_in">EnterUnbuffered</span>();</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> (r-><span class="built_in">IsParallelCompressionEnabled</span>()) {</span><br><span class="line"> <span class="built_in">StopParallelCompression</span>();</span><br><span class="line"><span class="meta">#<span class="keyword">ifndef</span> NDEBUG</span></span><br><span class="line"> <span class="keyword">for</span> (<span class="type">const</span> <span class="keyword">auto</span>& br : r->pc_rep->block_rep_buf) {</span><br><span class="line"> <span class="built_in">assert</span>(br.status.<span class="built_in">ok</span>());</span><br><span class="line"> }</span><br><span class="line"><span class="meta">#<span class="keyword">endif</span> <span class="comment">// !NDEBUG</span></span></span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> <span class="comment">// To make sure properties block is able to keep the accurate size of index</span></span><br><span class="line"> <span class="comment">// block, we will finish writing all index entries first.</span></span><br><span class="line"> <span class="keyword">if</span> (<span class="built_in">ok</span>() && !empty_data_block) {</span><br><span class="line"> r->index_builder-><span class="built_in">AddIndexEntry</span>(</span><br><span class="line"> &r->last_key, <span class="literal">nullptr</span> <span class="comment">/* no next data block */</span>, r->pending_handle);</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Write meta blocks, metaindex block and footer in the following order.</span></span><br><span class="line"> <span class="comment">// 1. [meta block: filter]</span></span><br><span class="line"> <span class="comment">// 2. 
[meta block: index]</span></span><br><span class="line"> <span class="comment">// 3. [meta block: compression dictionary]</span></span><br><span class="line"> <span class="comment">// 4. [meta block: range deletion tombstone]</span></span><br><span class="line"> <span class="comment">// 5. [meta block: properties]</span></span><br><span class="line"> <span class="comment">// 6. [metaindex block]</span></span><br><span class="line"> <span class="comment">// 7. Footer</span></span><br><span class="line"> BlockHandle metaindex_block_handle, index_block_handle;</span><br><span class="line"> MetaIndexBuilder meta_index_builder;</span><br><span class="line"> <span class="built_in">WriteFilterBlock</span>(&meta_index_builder);</span><br><span class="line"> <span class="built_in">WriteIndexBlock</span>(&meta_index_builder, &index_block_handle);</span><br><span class="line"> <span class="built_in">WriteCompressionDictBlock</span>(&meta_index_builder);</span><br><span class="line"> <span class="built_in">WriteRangeDelBlock</span>(&meta_index_builder);</span><br><span class="line"> <span class="built_in">WritePropertiesBlock</span>(&meta_index_builder);</span><br><span class="line"> <span class="keyword">if</span> (<span class="built_in">ok</span>()) {</span><br><span class="line"> <span class="comment">// flush the meta index block</span></span><br><span class="line"> <span class="built_in">WriteRawBlock</span>(meta_index_builder.<span class="built_in">Finish</span>(), kNoCompression,</span><br><span class="line"> &metaindex_block_handle, BlockType::kMetaIndex);</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span> (<span class="built_in">ok</span>()) {</span><br><span class="line"> <span class="built_in">WriteFooter</span>(metaindex_block_handle, index_block_handle);</span><br><span class="line"> }</span><br><span class="line"> r->state = Rep::State::kClosed;</span><br><span class="line"> r-><span 
class="built_in">SetStatus</span>(r-><span class="built_in">CopyIOStatus</span>());</span><br><span class="line"> Status ret_status = r-><span class="built_in">CopyStatus</span>();</span><br><span class="line"> <span class="built_in">assert</span>(!ret_status.<span class="built_in">ok</span>() || <span class="built_in">io_status</span>().<span class="built_in">ok</span>());</span><br><span class="line"> <span class="keyword">return</span> ret_status;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><h2 id="compaction参数设置"><a class="markdownIt-Anchor" href="#compaction参数设置"></a> Compaction Parameter Settings</h2><table><thead><tr><th>Parameter</th><th>Description</th><th>Default</th></tr></thead><tbody><tr><td><code>write_buffer_size</code></td><td>Size limit of a memtable</td><td>64MB</td></tr><tr><td><code>level0_file_num_compaction_trigger</code></td><td>Limit on the number of files in Level 0</td><td>4</td></tr><tr><td><code>target_file_size_base</code></td><td>Base target size of a single file in each level</td><td>64MB</td></tr><tr><td><code>target_file_size_multiplier</code></td><td>Per-level multiplier for the single-file target size</td><td>1</td></tr><tr><td><code>max_bytes_for_level_base</code></td><td>Base limit on the total size of all files in each level</td><td>256MB</td></tr><tr><td><code>max_bytes_for_level_multiplier</code></td><td>Per-level multiplier for the total size limit</td><td>10</td></tr><tr><td><code>level_compaction_dynamic_level_bytes</code></td><td>Whether the level size targets are applied from the bottom level upward</td><td>False</td></tr><tr><td><code>num_levels</code></td><td>Number of levels in the LSM tree</td><td>7</td></tr></tbody></table><ul><li><p>The parameters <code>target_file_size_base</code> and <code>target_file_size_multiplier</code> limit the size of a single file in each level after compaction. <code>target_file_size_base</code> is the size of each file in Level 1; the file size for Level L can be computed as <code>target_file_size_base * target_file_size_multiplier ^ (L - 1)</code>. <code>target_file_size_base</code> defaults to 64MB and <code>target_file_size_multiplier</code> defaults to 1.</p></li><li><p>The parameters <code>max_bytes_for_level_base</code> and <code>max_bytes_for_level_multiplier</code> limit the total size of all files in each level. <code>max_bytes_for_level_base</code> is the total size limit for all files in Level 1. The total size limit for Level L can be computed as
<code>(max_bytes_for_level_base) * (max_bytes_for_level_multiplier ^ (L-1))</code>. <code>max_bytes_for_level_base</code> defaults to 256MB and <code>max_bytes_for_level_multiplier</code> defaults to 10.</p></li><li><p>The parameter <code>level_compaction_dynamic_level_bytes</code> switches the sizing policy so that the level targets are applied from the bottom level upward, using <code>Target_Size(Ln-1) = Target_Size(Ln) / max_bytes_for_level_multiplier</code>. For example, suppose <code>max_bytes_for_level_base</code> is 1GB, <code>num_levels</code> is 6, and the actual size of the bottommost level is 276GB. With the default multiplier of 10, the target sizes of L1 to L6 are then 0, 0, 0.276GB, 2.76GB, 27.6GB, and 276GB.</p></li><li><p>MutableDBOptions</p></li></ul><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">struct</span> <span class="title class_">MutableDBOptions</span> {</span><br><span class="line"> <span class="function"><span class="type">static</span> <span class="type">const</span> <span class="type">char</span>* <span class="title">kName</span><span class="params">()</span> </span>{ <span class="keyword">return</span> <span class="string">"MutableDBOptions"</span>; }</span><br><span class="line"> <span 
class="built_in">MutableDBOptions</span>();</span><br><span class="line"> <span class="function"><span class="keyword">explicit</span> <span class="title">MutableDBOptions</span><span class="params">(<span class="type">const</span> MutableDBOptions& options)</span> </span>= <span class="keyword">default</span>;</span><br><span class="line"> <span class="function"><span class="keyword">explicit</span> <span class="title">MutableDBOptions</span><span class="params">(<span class="type">const</span> DBOptions& options)</span></span>;</span><br><span class="line"></span><br><span class="line"> <span class="function"><span class="type">void</span> <span class="title">Dump</span><span class="params">(Logger* log)</span> <span class="type">const</span></span>;</span><br><span class="line"></span><br><span class="line"> <span class="type">int</span> max_background_jobs;</span><br><span class="line"> <span class="type">int</span> base_background_compactions;</span><br><span class="line"> <span class="type">int</span> max_background_compactions;</span><br><span class="line"> <span class="type">uint32_t</span> max_subcompactions;</span><br><span class="line"> <span class="type">bool</span> avoid_flush_during_shutdown;</span><br><span class="line"> <span class="type">size_t</span> writable_file_max_buffer_size;</span><br><span class="line"> <span class="type">uint64_t</span> delayed_write_rate;</span><br><span class="line"> <span class="type">uint64_t</span> max_total_wal_size;</span><br><span class="line"> <span class="type">uint64_t</span> delete_obsolete_files_period_micros;</span><br><span class="line"> <span class="type">unsigned</span> <span class="type">int</span> stats_dump_period_sec;</span><br><span class="line"> <span class="type">unsigned</span> <span class="type">int</span> stats_persist_period_sec;</span><br><span class="line"> <span class="type">size_t</span> stats_history_buffer_size;</span><br><span class="line"> <span class="type">int</span> 
max_open_files;</span><br><span class="line"> <span class="type">uint64_t</span> bytes_per_sync;</span><br><span class="line"> <span class="type">uint64_t</span> wal_bytes_per_sync;</span><br><span class="line"> <span class="type">bool</span> strict_bytes_per_sync;</span><br><span class="line"> <span class="type">size_t</span> compaction_readahead_size;</span><br><span class="line"> <span class="type">int</span> max_background_flushes;</span><br><span class="line">};</span><br></pre></td></tr></table></figure><ul><li>mutable_cf_options_</li></ul><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span 
class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">explicit</span> <span class="title">MutableCFOptions</span><span class="params">(<span class="type">const</span> ColumnFamilyOptions& options)</span></span></span><br><span class="line"><span class="function"> : write_buffer_size(options.write_buffer_size),</span></span><br><span class="line"><span class="function"> max_write_buffer_number(options.max_write_buffer_number),</span></span><br><span class="line"><span class="function"> arena_block_size(options.arena_block_size),</span></span><br><span class="line"><span class="function"> memtable_prefix_bloom_size_ratio(</span></span><br><span class="line"><span class="function"> options.memtable_prefix_bloom_size_ratio),</span></span><br><span class="line"><span class="function"> memtable_whole_key_filtering(options.memtable_whole_key_filtering),</span></span><br><span class="line"><span class="function"> memtable_huge_page_size(options.memtable_huge_page_size),</span></span><br><span class="line"><span class="function"> max_successive_merges(options.max_successive_merges),</span></span><br><span class="line"><span class="function"> inplace_update_num_locks(options.inplace_update_num_locks),</span></span><br><span class="line"><span class="function"> prefix_extractor(options.prefix_extractor),</span></span><br><span class="line"><span class="function"> disable_auto_compactions(options.disable_auto_compactions),</span></span><br><span class="line"><span class="function"> soft_pending_compaction_bytes_limit(</span></span><br><span class="line"><span class="function"> 
options.soft_pending_compaction_bytes_limit),</span></span><br><span class="line"><span class="function"> hard_pending_compaction_bytes_limit(</span></span><br><span class="line"><span class="function"> options.hard_pending_compaction_bytes_limit),</span></span><br><span class="line"><span class="function"> level0_file_num_compaction_trigger(</span></span><br><span class="line"><span class="function"> options.level0_file_num_compaction_trigger),</span></span><br><span class="line"><span class="function"> level0_slowdown_writes_trigger(options.level0_slowdown_writes_trigger),</span></span><br><span class="line"><span class="function"> level0_stop_writes_trigger(options.level0_stop_writes_trigger),</span></span><br><span class="line"><span class="function"> max_compaction_bytes(options.max_compaction_bytes),</span></span><br><span class="line"><span class="function"> target_file_size_base(options.target_file_size_base),</span></span><br><span class="line"><span class="function"> target_file_size_multiplier(options.target_file_size_multiplier),</span></span><br><span class="line"><span class="function"> max_bytes_for_level_base(options.max_bytes_for_level_base),</span></span><br><span class="line"><span class="function"> max_bytes_for_level_multiplier(options.max_bytes_for_level_multiplier),</span></span><br><span class="line"><span class="function"> ttl(options.ttl),</span></span><br><span class="line"><span class="function"> periodic_compaction_seconds(options.periodic_compaction_seconds),</span></span><br><span class="line"><span class="function"> max_bytes_for_level_multiplier_additional(</span></span><br><span class="line"><span class="function"> options.max_bytes_for_level_multiplier_additional),</span></span><br><span class="line"><span class="function"> compaction_options_fifo(options.compaction_options_fifo),</span></span><br><span class="line"><span class="function"> compaction_options_universal(options.compaction_options_universal),</span></span><br><span 
class="line"><span class="function"> enable_blob_files(options.enable_blob_files),</span></span><br><span class="line"><span class="function"> min_blob_size(options.min_blob_size),</span></span><br><span class="line"><span class="function"> blob_file_size(options.blob_file_size),</span></span><br><span class="line"><span class="function"> blob_compression_type(options.blob_compression_type),</span></span><br><span class="line"><span class="function"> enable_blob_garbage_collection(options.enable_blob_garbage_collection),</span></span><br><span class="line"><span class="function"> blob_garbage_collection_age_cutoff(</span></span><br><span class="line"><span class="function"> options.blob_garbage_collection_age_cutoff),</span></span><br><span class="line"><span class="function"> max_sequential_skip_in_iterations(</span></span><br><span class="line"><span class="function"> options.max_sequential_skip_in_iterations),</span></span><br><span class="line"><span class="function"> check_flush_compaction_key_order(</span></span><br><span class="line"><span class="function"> options.check_flush_compaction_key_order),</span></span><br><span class="line"><span class="function"> paranoid_file_checks(options.paranoid_file_checks),</span></span><br><span class="line"><span class="function"> report_bg_io_stats(options.report_bg_io_stats),</span></span><br><span class="line"><span class="function"> compression(options.compression),</span></span><br><span class="line"><span class="function"> bottommost_compression(options.bottommost_compression),</span></span><br><span class="line"><span class="function"> compression_opts(options.compression_opts),</span></span><br><span class="line"><span class="function"> bottommost_compression_opts(options.bottommost_compression_opts),</span></span><br><span class="line"><span class="function"> bottommost_temperature(options.bottommost_temperature),</span></span><br><span class="line"><span class="function"> 
sample_for_compression(</span></span><br><span class="line"><span class="function"> options.sample_for_compression) {</span> <span class="comment">// <span class="doctag">TODO:</span> is 0 fine here?</span></span><br><span class="line"> <span class="built_in">RefreshDerivedOptions</span>(options.num_levels, options.compaction_style);</span><br><span class="line"> }</span><br></pre></td></tr></table></figure><h2 id="some-concepts"><a class="markdownIt-Anchor" href="#some-concepts"></a> Some Concepts</h2><ul><li><strong>Slice</strong> is a simple structure containing a pointer into some external storage and a size.</li><li><strong>parents</strong> and <strong>grandparents</strong>: parent = level+1, grandparent = level+2</li><li><strong>column family</strong>(cfd)</li><li><strong>compaction filter</strong></li><li><strong>compression</strong></li><li><strong>SST file manager</strong>(sfm)</li><li><strong>background</strong>(bg)</li></ul><h2 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h2><ul><li><a href="https://github.com/facebook/rocksdb/wiki/Compaction">RocksDB Compaction Wiki</a></li><li><a href="https://blog.csdn.net/Z_Stand/article/details/106959058">Rocksdb Compaction 源码详解(一):SST文件详细格式源码解析</a></li><li><a href="https://vigourtyy-zhg.blog.csdn.net/article/details/107592966?utm_medium=distribute.pc_relevant.none-task-blog-2%7Edefault%7EOPENSEARCH%7Edefault-6.no_search_link&depth_1-utm_source=distribute.pc_relevant.none-task-blog-2%7Edefault%7EOPENSEARCH%7Edefault-6.no_search_link?%7C">Rocksdb Compaction源码详解(二):Compaction 完整实现过程 概览</a></li><li><a href="http://rocksdb.org/blog/2015/07/23/dynamic-level.html">Dynamic Level Size for Level-Based Compaction</a></li><li><a href="https://blog.csdn.net/weixin_31951239/article/details/113019578">通过base level减少space amplification</a></li><li><a href="https://www.leviathan.vip/2018/03/05/Rocksdb%E7%9A%84Compact/">RocksDB的Compact</a></li><li><a 
href="http://rocksdb.org/blog/2016/01/29/compaction_pri.html">compaction_pri</a></li><li><a href="https://github.com/facebook/rocksdb/wiki/Compaction-Filter">compaction filter</a></li></ul>]]></content>
<summary type="html"><p>RocksDB's compaction process can be divided into three parts: prepare keys, process keys, and write keys.</p></summary>
<category term="Embedded" scheme="https://tong1heng.github.io/categories/Embedded/"/>
<category term="cpp" scheme="https://tong1heng.github.io/tags/cpp/"/>
<category term="RocksDB" scheme="https://tong1heng.github.io/tags/RocksDB/"/>
<category term="compaction" scheme="https://tong1heng.github.io/tags/compaction/"/>
</entry>
<entry>
<title>MySQL Buffer Pool Design</title>
<link href="https://tong1heng.github.io/2021/09/07/Embedded/mysql_buffer_pool_design/"/>
<id>https://tong1heng.github.io/2021/09/07/Embedded/mysql_buffer_pool_design/</id>
<published>2021-09-07T13:00:00.000Z</published>
<updated>2021-10-20T14:55:25.251Z</updated>
<content type="html"><![CDATA[<p>Design notes on the buffer pool of InnoDB, MySQL's storage engine.</p><span id="more"></span><h2 id="mysqlinnodb-buffer-pool"><a class="markdownIt-Anchor" href="#mysqlinnodb-buffer-pool"></a> MySQL(InnoDB) buffer pool</h2><h3 id="配置参数"><a class="markdownIt-Anchor" href="#配置参数"></a> Configuration parameters</h3><ul><li><p><code>innodb_buffer_pool_size</code>: the size of the buffer pool</p></li><li><p><code>innodb_buffer_pool_instances</code>: the number of buffer pool instances (a large buffer pool can be split into multiple instances, each managed independently through its own lists, which improves read concurrency)</p></li><li><p><code>innodb_buffer_pool_chunk_size</code>: the chunk size used for resizing; when <code>innodb_buffer_pool_size</code> is increased or decreased, the buffer pool grows or shrinks in units of <code>innodb_buffer_pool_chunk_size</code></p></li></ul><blockquote><p>If the new innodb_buffer_pool_chunk_size value * innodb_buffer_pool_instances is larger than the current buffer pool size when the buffer pool is initialized, innodb_buffer_pool_chunk_size is truncated to innodb_buffer_pool_size / innodb_buffer_pool_instances.</p></blockquote><blockquote><p>Buffer pool size must always be equal to or a multiple of innodb_buffer_pool_chunk_size * innodb_buffer_pool_instances. If you alter innodb_buffer_pool_chunk_size, innodb_buffer_pool_size is automatically adjusted to a value that is equal to or a multiple of innodb_buffer_pool_chunk_size * innodb_buffer_pool_instances. 
The adjustment occurs when the buffer pool is initialized.</p></blockquote><ul><li><code>innodb_old_blocks_pct</code>: controls the percentage of “old” blocks in the LRU list, i.e. the position of the insertion point in the LRU list</li></ul><h3 id="替换策略变种lru"><a class="markdownIt-Anchor" href="#替换策略变种lru"></a> Replacement policy: a variant of LRU</h3><ul><li><p>Problems with plain LRU: read-ahead failure and buffer pool pollution.</p><ul><li>Read-ahead failure: some pages loaded by read-ahead are never accessed afterwards, yet loading them evicts pages at the tail of the LRU list.</li><li>Buffer pool pollution: a one-off scan over a large amount of data evicts all the pages previously in the buffer pool.</li></ul></li><li><p>Solution: separate hot and cold data</p><ul><li>Split the LRU list into two parts, a hot region and a cold region.</li><li>When a page is first loaded into the buffer pool, it is placed at the head of the cold region.</li><li>If the page is accessed again after <code>innodb_old_blocks_time</code> (in ms) has elapsed, it is moved to the head of the hot region.</li><li>A page already in the hot region does not need to be moved to the head on every access. As an optimization, MySQL moves a page to the head only when it is accessed in the last 3/4 of the hot region; pages in the first 1/4 stay where they are.</li></ul></li></ul><h3 id="lru链表"><a class="markdownIt-Anchor" href="#lru链表"></a> LRU list</h3><p><img src="https://dev.mysql.com/doc/refman/5.7/en/images/innodb-buffer-pool-list.png" alt="" /></p><ul><li>The list has two parts: the New Sublist and the Old Sublist.</li><li><code>innodb_old_blocks_pct</code> controls the insertion point (Midpoint).</li><li>During a full table scan, the time window set by <code>innodb_old_blocks_time</code> effectively protects the New Sublist.</li></ul><h3 id="预读机制"><a class="markdownIt-Anchor" href="#预读机制"></a> Read-ahead</h3><ul><li>Linear read-ahead</li><li>Random read-ahead</li></ul><h3 id="api"><a class="markdownIt-Anchor" href="#api"></a> API</h3><ul><li><p><a href="https://github.com/mysql/mysql-server/blob/8.0/storage/innobase/include/buf0buf.h">buf0buf.h</a>: The database buffer pool high-level routines</p><ul><li><code>dberr_t buf_pool_init(ulint total_size, ulint n_instances)</code>: Creates the buffer pool.</li><li><code>void buf_pool_free_all()</code>: Frees the buffer pool at shutdown.</li><li><code>void buf_resize_thread()</code>: This is the thread for resizing buffer pool.</li><li><code>void buf_pool_clear_hash_index(void)</code>: Clears the adaptive hash index on all pages in the buffer pool.</li><li><code>static inline ulint buf_pool_get_curr_size(void)</code>: Gets the current size of buffer buf_pool in 
bytes.</li><li><code>static inline ulint buf_pool_get_n_pages(void)</code>: Gets the current size of buffer buf_pool in frames.</li><li>get<ul><li><code>bool buf_page_optimistic_get(ulint rw_latch, buf_block_t *block, uint64_t modify_clock, Page_fetch fetch_mode, const char *file, ulint line, mtr_t *mtr)</code>: Get optimistic access to a database page.</li><li><code>bool buf_page_get_known_nowait(ulint rw_latch, buf_block_t *block, Cache_hint hint, const char *file, ulint line, mtr_t *mtr)</code>: Get access to a known database page, when no waiting can be done.</li><li><code>const buf_block_t *buf_page_try_get_func(const page_id_t &page_id, const char *file, ulint line, mtr_t *mtr)</code>: Given a tablespace id and page number tries to get that page.</li><li><code>buf_block_t *buf_page_get_gen(const page_id_t &page_id, const page_size_t &page_size, ulint rw_latch, buf_block_t *guess, Page_fetch mode, const char *file, ulint line, mtr_t *mtr, bool dirty_with_no_latch = false)</code>: Get access to a database page.</li><li><code>buf_block_t *buf_page_create(const page_id_t &page_id, const page_size_t &page_size, rw_lock_type_t rw_latch, mtr_t *mtr)</code>: Initializes a page to the buffer buf_pool.</li></ul></li><li><code>void buf_page_make_young(buf_page_t *bpage)</code>: Moves a page to the start of the buffer pool LRU list.</li><li><code>void buf_page_make_old(buf_page_t *bpage)</code>: Moves a page to the end of the buffer pool LRU list.</li><li><code>static inline ibool buf_page_peek(const page_id_t &page_id)</code>: Returns TRUE if the page can be found in the buffer pool hash table.</li></ul></li><li><p><a href="https://github.com/mysql/mysql-server/blob/8.0/storage/innobase/include/buf0dblwr.h">buf0dblwr.h</a>: Doublewrite buffer module</p></li><li><p><a href="https://github.com/mysql/mysql-server/blob/8.0/storage/innobase/include/buf0rea.h">buf0rea.h</a>: The database buffer read</p></li><li><p><a 
href="https://github.com/mysql/mysql-server/blob/8.0/storage/innobase/include/buf0dump.h">buf0dump.h</a>: Implements a buffer pool dump/load</p></li><li><p><a href="https://github.com/mysql/mysql-server/blob/8.0/storage/innobase/include/buf0flu.h">buf0flu.h</a>: The database buffer pool flush algorithm</p></li><li><p><a href="https://github.com/mysql/mysql-server/blob/8.0/storage/innobase/include/buf0lru.h">buf0lru.h</a>: The database buffer pool LRU replacement algorithm</p></li></ul><h3 id="reference"><a class="markdownIt-Anchor" href="#reference"></a> Reference</h3><ul><li><a href="https://github.com/mysql/mysql-server">MySQL源码</a><ul><li><a href="https://github.com/mysql/mysql-server/tree/8.0/storage/innobase/include">buffer pool API声明</a></li><li><a href="https://github.com/mysql/mysql-server/tree/8.0/storage/innobase/buf">buffer pool API实现</a></li></ul></li><li><a href="https://dev.mysql.com/doc/refman/5.7/en/innodb-storage-engine.html">MySQL InnoDB文档</a></li></ul>]]></content>
<summary type="html"><p>Design notes on the buffer pool of InnoDB, MySQL's storage engine.</p></summary>
<category term="Embedded" scheme="https://tong1heng.github.io/categories/Embedded/"/>
<category term="cpp" scheme="https://tong1heng.github.io/tags/cpp/"/>
<category term="MySQL" scheme="https://tong1heng.github.io/tags/MySQL/"/>
<category term="buffer pool" scheme="https://tong1heng.github.io/tags/buffer-pool/"/>
</entry>
<entry>
<title>What can we learn from MIT's education?</title>
<link href="https://tong1heng.github.io/2021/04/20/Secret/mit/"/>
<id>https://tong1heng.github.io/2021/04/20/Secret/mit/</id>
<published>2021-04-20T08:40:46.000Z</published>
<updated>2023-03-19T11:36:28.026Z</updated>
<content type="html"><![CDATA[<div class="hbe hbe-container" id="hexo-blog-encrypt" data-wpm="Oh, this is an invalid password. Check and try again, please." data-whm="OOPS, these decrypted content may changed, but you can still have a look."> <script id="hbeData" type="hbeData" data-hmacdigest="1684418ef9e85146ef0b4631e68e989d638109a52ccb7cd5a2b4dd136fbfbd64">1a1d9abe5672c7bc37baaec17a6e5159f262867d797e9ec6be62be8508e8af4643c33ca0e32a5d1616085fe6db39be82dcea421b56e55196f68f3700abed5d1999ecc17075d2fbf304b6ac9ad95ad488a6657b2b8b8288651be488b85ee3947f5d558e0d5123633fc90a754b6ad99fdfb944b527cec8a02e820de70f0da60d600ac09c431d761cfe7acfd931cd569cdfbb8ea2f8e47b4f67ddb99aec5c0886a2</script> <div class="hbe hbe-content"> <div class="hbe hbe-input hbe-input-xray"> <input class="hbe hbe-input-field hbe-input-field-xray" type="password" id="hbePass"> <label class="hbe hbe-input-label hbe-input-label-xray" for="hbePass"> <span class="hbe hbe-input-label-content hbe-input-label-content-xray">输入密码,查看文章</span> </label> <svg class="hbe hbe-graphic hbe-graphic-xray" width="300%" height="100%" viewBox="0 0 1200 60" preserveAspectRatio="none"> <path d="M0,56.5c0,0,298.666,0,399.333,0C448.336,56.5,513.994,46,597,46c77.327,0,135,10.5,200.999,10.5c95.996,0,402.001,0,402.001,0"></path> <path d="M0,2.5c0,0,298.666,0,399.333,0C448.336,2.5,513.994,13,597,13c77.327,0,135-10.5,200.999-10.5c95.996,0,402.001,0,402.001,0"></path> </svg> </div> </div></div><script data-pjax src="/lib/hbe.js"></script><link href="/css/hbe.style.css" rel="stylesheet" type="text/css">]]></content>
<summary type="html">At MIT, we revel in a culture of learning by doing. In 30 departments across five schools and one college, our students combine analytical rigor with curiosity, playful imagination, and an appetite for solving the hardest problems in service to society.</summary>
<category term="Secret" scheme="https://tong1heng.github.io/categories/Secret/"/>
</entry>
</feed>