
Commit 21e2b5d

Updated algorithm paper with the latest changes - that's it for now
1 parent a6426c3 commit 21e2b5d


2 files changed: +10 -12 lines changed


doc/source/algorithm.rst

+8 -10
@@ -58,10 +58,10 @@ GitDB's reverse delta aggregation algorithm
===========================================
The idea of this algorithm is to merge all delta streams into one, which can then be applied in just one go.

- In the current implementation, delta streams are parsed into DeltaChunks (->**DC**), which are kept in vectors. Each DC represents one copy-from-base operation, or one or multiple consecutive add-bytes operations. DeltaChunks know about their target offset in the target buffer, and their size. Their target offsets are consecutive, i.e. one chunk ends where the next one begins, regarding their logical extend in the target buffer.
+ In the current implementation, delta streams are parsed into DeltaChunks (->**DC**). Each DC represents one copy-from-base operation, or one or more consecutive add-bytes operations. DeltaChunks know their target offset in the target buffer and their size. Their target offsets are consecutive, i.e. one chunk ends where the next one begins, regarding their logical extent in the target buffer.
Add-bytes DCs additionally store their data to apply; copy-from-base DCs store the offset into the base buffer from which to copy bytes.

- During processing, one starts with the latest (i.e. topmost) delta stream (->**TDS**), and iterates through its ancestor delta streams (->ADS) to merge them into the growing toplevel delta stream.
+ During processing, one starts with the latest (i.e. topmost) delta stream (->**TDS**), and iterates through its ancestor delta streams (->ADS) to merge them into the growing top-level delta stream.

The merging works by following a set of rules:
* Merge into the top-level delta from the youngest ancestor delta to the oldest one
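
To make the chunk layout described above concrete, here is a minimal Python sketch; the class and field names are illustrative assumptions and need not match the actual gitdb identifiers::

    class DeltaChunk(object):
        """One consecutive region of the target buffer."""
        def __init__(self, to, ts, so=None, data=None):
            self.to = to        # target offset: where the chunk starts in the target buffer
            self.ts = ts        # target size: how many bytes of the target it covers
            self.so = so        # copy-from-base: offset into the base buffer, else None
            self.data = data    # add-bytes: the literal bytes to insert, else None

        def rbound(self):
            # chunks are consecutive: one ends exactly where the next one begins
            return self.to + self.ts
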
@@ -72,28 +72,26 @@ The merging works by following a set of rules:

* Finish the merge once all ADS have been handled, or once the TDS only consists of add-byte DCs. The remaining copy-from-base DCs will copy from the original base buffer accordingly.

- Applying the TDS is as straightforward as applying any other DS. The base buffer is required to be kept in memory. In the current implementation, a full-size target buffer is allocated to hold the result of applying the chunk information.
+ Applying the TDS is as straightforward as applying any other DS. The base buffer is required to be kept in memory. In the current implementation, a full-size target buffer is allocated to hold the result of applying the chunk information. At this point it is already possible to stream the result, which is feasible only if the memory of the base buffer plus the memory of the TDS is smaller than a full-size target buffer. Streaming will always make sense if the peak resulting from having the base, target and TDS buffers in memory together is unaffordable.

- The memory consumption during the TDS processing are the uncompressed delta-bytes, the parsed DS, as well as the TDS. Afterwards one requires an allocated base buffer, the target buffer, as well as the TDS.
- It is clearly visible that the current implementation does not at all reduce memory consumption, but the opposite is true as the TDS can be large for large files.
+ The memory consumption during the TDS processing is only the condensed delta-bytes; for each ADS an additional index is required, which costs 8 bytes per DC. When applying the TDS, one additionally requires an allocated base buffer. The target buffer can be allocated, but may be a writer as well.

Performance Results
-------------------
The benchmarking context was the same as for the brute-force GitDB algorithm. This implementation is far more complex than the said brute-force implementation, which clearly reflects in the numbers. Its pure-Python throughput is only 1.1 MiB/s, which equals 89 streams/s.
The biggest performance bottleneck is the slicing of the parsed delta streams, where the program spends most of its time due to hundreds of thousands of calls.

To get a more usable version of the algorithm, it was implemented in C, such that Python must do no more than two calls to get all the work done. The first prepares the TDS, the second applies it, writing it into a target buffer.
- The throughput reaches 16.7 MiB/s, which equals 1344 streams/s, which makes it 15 times faster than the pure python version, and amazingly even 1.5 times faster than the brute-force C implementation. As a comparison, cgit is able to stream about 20 MiB when controlling it through a pipe. GitDBs performance may still improve once pack access is reimplemented in C as well.
+ The throughput reaches 15.2 MiB/s, which equals 1221 streams/s, making it nearly 14 times faster than the pure-Python version and, amazingly, even 1.35 times faster than the brute-force C implementation. As a comparison, cgit is able to stream about 20 MiB/s when controlled through a pipe. GitDB's performance may still improve once pack access is reimplemented in C as well.

- All this comes at a relatively high memory consumption. Additionally, with each new level being merged, not only are more DCs inserted, but the new chunks may get smaller as well. This can reach a point where one chunk only represents an individual byte, so the size of the data structure outweighs the logical chunk size by far.
-
- A 125 MB file took 3.1 seconds to unpack for instance, which is only 33% slower than the c implementation of the brute-force algorithm.
+ A 125 MB file took 2.5 seconds to unpack, for instance, which is only 20% slower than the C implementation of the brute-force algorithm.


Future work
===========
- The current implementation of the reverse delta aggregation algorithm is already working well and fast, but leaves room for improvement in the realm of its memory consumption. One way to considerably reduce it would be to index the delta stream to determine bounds, instead of parsing it into a separate data structure.

Another very promising option is that streaming of delta data is indeed possible. Depending on the configuration of the copy-from-base operations, different optimizations could be applied to reduce the amount of memory required for the final processed delta stream. Some configurations may even allow it to stream data from the base buffer, instead of pre-loading it for random access.

The ability to stream files at reduced memory costs would only be feasible for big files, and would have to be paid for with extra pre-processing time.
+
+ A very first and simple implementation could avoid memory peaks by streaming the TDS in conjunction with a base buffer, instead of writing everything into a fully allocated target buffer.
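
As a rough illustration of the merge rules listed above, the following sketch resolves the copy-from-base chunks of a TDS against one ancestor delta stream, using the DeltaChunk sketch from earlier; it is a simplified, hedged example (a linear scan stands in for the per-ADS index), not the actual gitdb code::

    def find_covering(ads, offset):
        """Return the ADS chunk whose target region contains `offset`.
        In gitdb this lookup is what the per-ADS index (8 bytes per DC) accelerates."""
        for adc in ads:
            if adc.to <= offset < adc.to + adc.ts:
                return adc
        raise ValueError("offset %d is not covered by the ancestor stream" % offset)

    def merge_ads_into_tds(tds, ads):
        """Resolve every copy-from-base chunk of `tds` against the ancestor stream `ads`.

        The base of the TDS is the target of the ADS, so each copy-from-base chunk of
        the TDS maps onto one or more (possibly sliced) chunks of the ADS, while
        add-bytes chunks are kept as they are.
        """
        merged = []
        for dc in tds:
            if dc.data is not None:        # add-bytes: independent of any base, keep it
                merged.append(dc)
                continue
            remaining, src, dst = dc.ts, dc.so, dc.to
            while remaining:
                adc = find_covering(ads, src)
                skip = src - adc.to                     # offset into the covering ADS chunk
                take = min(adc.ts - skip, remaining)    # bytes this ADS chunk can provide
                if adc.data is not None:                # ancestor add-bytes: take its literal data
                    merged.append(DeltaChunk(dst, take, data=adc.data[skip:skip + take]))
                else:                                   # ancestor copy: shift the base offset
                    merged.append(DeltaChunk(dst, take, so=adc.so + skip))
                remaining -= take
                src += take
                dst += take
        return merged

Repeating this merge from the youngest ADS to the oldest leaves a TDS whose remaining copy-from-base chunks refer to the original base buffer, as the rules above require.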
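
Once all ancestors have been merged, applying the TDS needs only the base buffer and something to write to; passing a writer instead of filling a pre-allocated buffer is exactly the streaming option mentioned above. Again a hedged sketch rather than the actual gitdb API::

    def apply_tds(tds, base, write):
        """Apply the merged top-level delta stream against `base`.

        `write` may append to a bytearray or stream straight into a file-like
        object, so the full-size target buffer never has to exist in memory.
        """
        for dc in tds:
            if dc.data is not None:              # add-bytes chunk: emit the stored bytes
                write(dc.data[:dc.ts])
            else:                                # copy-from-base chunk: slice the base buffer
                write(base[dc.so:dc.so + dc.ts])

    # e.g. stream the result directly to disk instead of building it in memory:
    # with open("blob.out", "wb") as fd:
    #     apply_tds(tds, base_buffer, fd.write)
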

test/performance/test_pack.py

+2 -2
@@ -13,7 +13,7 @@

class TestPackedDBPerformance(TestBigRepoR):

-     def test_pack_random_access(self):
+     def _test_pack_random_access(self):
        pdb = PackedDB(os.path.join(self.gitrepopath, "objects/pack"))

        # sha lookup
@@ -61,7 +61,7 @@ def test_pack_random_access(self):
        total_kib = total_size / 1000
        print >> sys.stderr, "PDB: Obtained %i streams by sha and read all bytes totallying %i KiB ( %f KiB / s ) in %f s ( %f streams/s )" % (max_items, total_kib, total_kib/elapsed , elapsed, max_items / elapsed)

-     def _disabled_test_correctness(self):
+     def test_correctness(self):
        pdb = PackedDB(os.path.join(self.gitrepopath, "objects/pack"))
        # disabled for now as it used to work perfectly, checking big repositories takes a long time
        print >> sys.stderr, "Endurance run: verify streaming of objects (crc and sha)"
