Skip to content

Commit

Permalink
Update README for Xcode 15
Browse files Browse the repository at this point in the history
  • Loading branch information
saagarjha committed Jul 16, 2023
1 parent 350635c commit 2a87017
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# unxip

unxip is a command-line tool designed for rapidly unarchiving Xcode XIP files and writing them to disk with good compression. Its goal is to outperform Bom (which powers `xip(1)` and Archive Utility) in both performance and on-disk usage, and (at the time of writing) does so by a factor of about 2-3x in time spent decompressing and about 8% in space.
unxip is a command-line tool designed for rapidly unarchiving Xcode XIP files and writing them to disk with good compression. Its goal is to outperform Bom (which powers `xip(1)` and Archive Utility) in both performance and on-disk usage, and (at the time of writing) does so by a factor of about 2-3x in time spent decompressing and about 25% in space.

## Installation

Expand Down Expand Up @@ -67,7 +67,7 @@ Parsing this cpio archive gives the necessary information need to reconstruct an

On the whole, this procedure is designed to be fairly linear, with the XIP being read sequentially, producing LZMA chunks that are reassembled in order to create the cpio archive, which can then be streamed to reconstruct an Xcode bundle. Unfortunately, a naive implementation of this is process does not perform very well due to the varying performance bottlenecks of each step. To make matters worse, the size of Xcode makes it infeasible to operate with entirely in memory. To get around this problem, unxip parallelizes *intermediate* steps and then streams results in linear order, benefiting from much better processor utilization and allowing the file to be processed in "sliding window" fashion.

On modern processors, single-threaded LZMA decoding is limited to about ~100 MB/s; as the Xcode cpio is almost 40 GB large, this is not really fast enough for unxip. Instead, unxip carves out each chunk from the pbzx archive into its own task (the metadata in the file format makes this fairly straightforward) and decompresses each in parallel. To limit memory usage, a cap is applied to how many chunks are resident in memory at once. Since the next step (parsing the cpio) requires logical linearity, completing chunks are temporarily parked until their preceding ones complete, after which they are all yielded together. This preserves order while still providing an opportunity for multiple chunks to be decoded in parallel. In practice, this technique can decode the LZMA stream at effective speeds approaching 1 GB/s when provided with enough CPU cores.
On modern processors, single-threaded LZMA decoding is limited to about ~100 Mb/s; as the Xcode 15 cpio over 10 GB (and the Xcode 13 almost 40!), this is not really fast enough for unxip. Instead, unxip carves out each chunk from the pbzx archive into its own task (the metadata in the file format makes this fairly straightforward) and decompresses each in parallel. To limit memory usage, a cap is applied to how many chunks are resident in memory at once. Since the next step (parsing the cpio) requires logical linearity, completing chunks are temporarily parked until their preceding ones complete, after which they are all yielded together. This preserves order while still providing an opportunity for multiple chunks to be decoded in parallel. In practice, this technique can decode the LZMA stream at effective speeds beyond 1 Gb/s when provided with enough CPU cores.

The linear chunk stream (now decompressed into a cpio) is then parsed in sequence to extract files, directories, and their associated metadata. cpios are naturally ordered–for example, all additional hardlinks must come after the original file–but Xcode's has an additional nice property that it's been sorted so that all directories appear before the files inside of them. This allows for a sequential stream of filesystem operations to correctly produce the bundle, without running into errors with missing intermediate directories or link targets.

Expand Down

0 comments on commit 2a87017

Please sign in to comment.