Wicked Good Unpacker extension

This is a fork of the bundled Chrome OS ZIP Unpacker extension. It enables support for a wide variety of archive and compression formats. It also supports files that have only been compressed (e.g. foo.gz). All of this is thanks to the great libarchive project.

You can install it via the CWS: https://chrome.google.com/webstore/detail/mljpablpddhocfbnokacjggdbmafjnon

Supported Formats

Note that we support archives (compressed or uncompressed), and we support single compressed files (that have no archiving, e.g. foo.gz).

Here's the list of supported archive formats:

  • 7z: The 7-Zip format.
  • ar: Simple UNIX archive, usually for developers.
  • cab: Microsoft's cabinet archive format.
  • cpio: Classic UNIX archive that still shows up, but has been largely replaced by tarballs.
  • crx: Google Chrome extensions.
  • deb: Debian package archive format used by Linux distros based on Debian (like Ubuntu).
  • iso: ISO 9660 CD disk images. Note: UDF DVD disk images are not supported.
  • jar: Java ARchives used by programmers.
  • lha/lzh: LHA and LZH are common formats in Japan, and used by many old school video games.
  • mtree: The BSD mtree format for mapping a directory tree.
  • pax: Portable Archive Exchange format is a UNIX format meant to replace cpio and tar.
  • rar: RAR archives produced by WinRAR. This is mostly for testing, so the native CrOS support should be used by default instead.
  • rpm: RPM Package Manager archive used by Linux distros like RedHat, Fedora, CentOS, SUSE, and more.
  • tar: UNIX tarballs that are common in the Linux computing world.
  • warc: The Web ARChive format for archiving web sites.
  • zip: The venerable ZIP. This is mostly for testing, so the native CrOS support should be used by default instead.

Here's the list of supported compression/encoding formats:

  • bz2/bzip/bzip2: The bzip2 compression format common in the UNIX world.
  • gz/gzip: The gzip compression format based on zlib. Common in the UNIX world, although zlib is also used in many places.
  • lzma: The lzma compression format that has been largely replaced by xz. The compression algorithm is used by other formats, but the standalone format is not.
  • lz4: The LZ4 compression algorithm.
  • lzip: The lzma compression algorithm in the lzip format.
  • lzop: The LZO compression algorithm in the lzop format.
  • uu: The unix-to-unix text encoding format.
  • xz: The xz compression format that is common in the UNIX world.
  • Z: The compress legacy format that still shows up at random.
  • zstd: The Zstandard algorithm developed by Facebook.

Known Issues

Speed/Performance

Most archive formats don't include an index. This means we need to decompress the entire file just to get a directory listing. The formats allow any ordering by design. For example, it could be ./bar.txt, ./foo/blah.txt, ./asdf.txt. Or it could be ./asdf.txt, ./foo/blah.txt, and ./bar.txt. The only way we can produce a complete directory listing is by looking through the entire file.

This slows things down overall (like in tarballs) and there isn't much that can be done about it.
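To illustrate why a listing needs a full pass, here is a minimal sketch that lists the entries of an uncompressed tar stream by walking it header by header. It is not the code WGU uses (libarchive does the real parsing), and the helper names `makeEntry`, `cString`, and `listEntries` are made up for this example. It ignores checksums, pax headers, and long names.

```javascript
const BLOCK = 512; // tar is organized in 512-byte blocks

// Read a NUL-terminated ASCII field out of a header block.
function cString(buf, start, end) {
  const s = buf.toString("ascii", start, end);
  const nul = s.indexOf("\0");
  return nul === -1 ? s : s.slice(0, nul);
}

// Fixture helper: build one simplified header block plus padded file data.
function makeEntry(name, content) {
  const data = Buffer.from(content, "ascii");
  const header = Buffer.alloc(BLOCK);
  header.write(name, 0, 100, "ascii"); // name field: bytes 0..99
  // size field: octal digits starting at byte 124
  header.write(data.length.toString(8).padStart(11, "0"), 124, 11, "ascii");
  const padded = Buffer.alloc(Math.ceil(data.length / BLOCK) * BLOCK);
  data.copy(padded);
  return Buffer.concat([header, padded]);
}

// There is no index, so the only way to find entry N+1 is to read the
// header of entry N, learn its size, and skip past its data.
function listEntries(archive) {
  const names = [];
  let offset = 0;
  while (offset + BLOCK <= archive.length) {
    const header = archive.subarray(offset, offset + BLOCK);
    if (header.every((b) => b === 0)) break; // all-zero block ends the archive
    const size = parseInt(cString(header, 124, 136), 8);
    names.push(cString(header, 0, 100));
    offset += BLOCK + Math.ceil(size / BLOCK) * BLOCK;
  }
  return names;
}

const archive = Buffer.concat([
  makeEntry("./bar.txt", "bar"),
  makeEntry("./foo/blah.txt", "blah"),
  makeEntry("./asdf.txt", "asdf"),
  Buffer.alloc(2 * BLOCK), // end-of-archive marker
]);
console.log(listEntries(archive));
```

With a plain tarball the data blocks can at least be skipped. Once the tarball is compressed (e.g. foo.tar.gz), skipping is no longer possible and every byte has to be decompressed to reach the next header.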

However, there are some file formats that do have indexes, and we don't (yet) support using those. 7-Zip is the most notable one here.

A similar issue comes up with single compressed files. Many formats do not record the uncompressed file size, so the only way to calculate it is by decompressing the entire file. If we were to report a fake file size (like zero bytes, or a really large file size) to the Files app, it wouldn't be able to copy the result out, because it would try to read exactly the number of bytes it was told were available. For the few formats that do record the uncompressed size in their metadata (like gzip, which stores it in the trailer at the end of the file), we can skip the decompression overhead.
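As a rough illustration of the gzip case, the sketch below reads the ISIZE field that RFC 1952 places in the last four bytes of a gzip member. The function name `gzipReportedSize` is made up for this example, and this is not necessarily how WGU or libarchive obtains the size.

```javascript
const zlib = require("zlib");

// ISIZE is the last 4 bytes of a gzip member, little-endian, and holds the
// uncompressed length modulo 2^32.
function gzipReportedSize(gz) {
  // 10-byte header + 8-byte trailer is the bare minimum for a gzip member.
  if (gz.length < 18) throw new Error("too short to be a gzip file");
  return gz.readUInt32LE(gz.length - 4);
}

const original = Buffer.from("hello, wicked good unpacker\n".repeat(1000));
const compressed = zlib.gzipSync(original);
console.log(gzipReportedSize(compressed) === original.length);
```

The value is only a hint: it wraps around for inputs of 4 GiB or more, and for files made of several concatenated gzip members it describes only the last member.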

Passwords

Some formats can be encrypted with passwords, but we don't prompt the user, so the files aren't decrypted. Oops.

Spanning Archives

Some formats can span multiple files, but we don't yet support those.

RAR Support

"It's complicated."

The WGU extension doesn't support the RAR format today. Chrome OS supports it natively via cros-disks -> AVFS -> official unrar program. We can't replace that stack until we have comparable coverage.

The RAR format has gone through a number of major revisions (at least 5 so far). A smart Russian came up with it long ago and continues to develop it as a company (RARLAB). It's a proprietary format and, while some code has been released by them, they are hostile to reverse engineering. As such, only the v1, v2, and v3 formats are supported. Unfortunately, v4 and v5 formats are common and users tend to use those more.

There is an open source unrar library released by RARLAB, but the API is not documented, and its runtime model does not mesh well with libarchive's runtime model. It's possible, but it's not trivial.

Chrome OS Bundling

Sometimes people ask, since WGU is based on the official Chrome OS Zip unpacker that is bundled with Chrome OS today, why don't we just merge the two so that Chrome OS supports everything WGU does out of the box?

"It's complicated."

From the product team's perspective, they don't want to support an extensive set of formats if there is not high user demand for them. If users run into problems (and they inevitably will), the engineering costs aren't justified.

Similarly, they don't want to say "ZIP is officially supported, but all other formats are 'best effort'". Most users don't care about those trade-offs -- they just want their system to work. All they see is that they tried to open a 7z file and it didn't work even though opening a different 7z file worked. Trying to explain these nuances doesn't really scale.

Thus the status quo is to not support the formats at all. Users can try and locate alternatives (like WGU), and in the process of doing so, understand that the resulting software might be buggy. And those bugs are not the fault of the Chrome OS product (although some will still complain that Chrome OS should have included support out of the box).

Everyone has a reasonable position taken in isolation. But the end result is that everyone loses. Offering best-effort support makes users unhappy, but offering nothing also makes them unhappy. At least this way, the blowback on the Chrome OS product is lower.

Bug reports / Feature requests

Please use the issues link here to report any issues you might run into.

ZIP Unpacker extension

This is the ZIP Unpacker extension used in Chrome OS to support reading and unpacking of zip archives.

Build steps

NaCl SDK

Since the code is built with NaCl, you'll need its toolchain.

$ cd third-party
$ make nacl_sdk

Webports (a.k.a. NaCl ports)

We'll use libraries from webports.

$ cd third-party
$ make depot_tools
$ make webports

npm Setup

First install npm using your normal packaging system. On Debian, you'll want something like:

$ sudo apt-get install npm

If your distro only ships an old version of npm, you may have to install a newer one yourself.

Then install the npm modules that we require. Do this in the root of the unpacker repo.

$ npm install bower vulcanize crisper

Unpacker Build

Once done, install libarchive-fork/ from third-party/ in the unpacker project. Note that you cannot use the libarchive or libarchive-dev packages from webports at the moment, as not all patches in the fork have been upstreamed.

$ cd third-party
$ make libarchive-fork

Polymer is used for the UI. To fetch it, run this in the same directory:

$ make polymer

Build the PNaCl module.

$ cd unpacker
$ make [debug]

Use

The package can be found in the release or debug directory. You can run it directly from there using Chrome's "Load unpacked extension" feature, or you can zip it up for posting to the Chrome Web Store.

$ zip -r release.zip release/

Once it's loaded, you should be able to open ZIP archives in the Files app.

Source Layout

Paths that aren't linked below are dynamically created at build time.

  • node_modules/: All the locally installed npm modules used for building.
  • third-party/: The source for third-party NaCl & Polymer code.
  • unpacker/: The extension CSS/HTML/JS/NaCl source code.
    • cpp/: The NaCl module source.
    • css/: Any CSS needed for styling UI.
    • debug/: A debug build of the Chrome extension.
    • html/: Any HTML needed for UI.
    • icons/: Various extension images.
    • js/: The JavaScript code.
    • _locales/: Translations of strings shown to the user.
    • pnacl/: Compiled NaCl objects & module (debug & release).
    • release/: A release build of the Chrome extension.
  • unpacker-test/: Code for running NaCl & JavaScript unittests.

NaCl/JS Life Cycle

Some high level points to remember: the JS side reacts to user events and is the only part that has access to actual data on disk. It uses the NaCl module to do all the data parsing (e.g. gzip & tar), but it has to both send a request to the module ("parse this archive"), and respond to requests from the module when the module needs to read actual bytes on disk.

When the extension loads, background.js registers everything and goes idle.

When the Files app wants to mount an archive, callbacks in app.js unpacker.app are called to initialize the NaCl runtime. An unpacker.Volume object is created for each mounted archive.

Requests on the archive (directory listing, metadata lookups, reading files) are routed through app.js unpacker.app and to volume.js unpacker.Volume. Then they are sent to the low level decompressor.js unpacker.Decompressor which talks to the NaCl module using the request.js unpacker.request protocol. Responses are passed back up.

When the NaCl module is loaded, module.cc NaclArchiveModule is instantiated. That instantiates NaclArchiveInstance for initial JS message entry points. It instantiates JavaScriptMessageSender for sending requests back to JS.

When JS requests come in, module.cc NaclArchiveInstance will create volume.h Volume objects on the fly, and pass requests down to them (using the protocol defined in request.h request::*).

volume.h Volume objects in turn use the volume_archive.h VolumeArchive abstract interface to handle requests from the JS side (using the protocol defined in request.h request::*). This way the lower levels don't have to deal with JS directly.

volume_archive_libarchive.cc VolumeArchiveLibarchive implements the VolumeArchive interface and uses libarchive as its backend to do all the decompression & archive format processing.

But NaCl code doesn't have access to any files or data itself. So the volume_reader.h VolumeReader abstract interface is passed to it to provide the low level data read functions. The volume_reader_javascript_stream.cc VolumeReaderJavaScriptStream implements that by passing requests back up to the JS side via the javascript_requestor_interface.h JavaScriptRequestorInterface interface (which was passed down to it).

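So requests (mount an archive, read a file, etc...) generally follow the path:

app.js unpacker.app -> volume.js unpacker.Volume -> decompressor.js unpacker.Decompressor -> module.cc NaclArchiveInstance -> volume.h Volume -> volume_archive.h VolumeArchive -> volume_reader.h VolumeReader -> JS side (to read the actual bytes)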

Then once VolumeArchive has processed the raw data stream, it can return results to the Volume object which takes care of posting JS status messages back to the Chrome side.
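The round trip described above (JS asks the module to parse, the module asks JS for bytes, then the module reports its result) can be sketched as follows. Everything here is a hypothetical stand-in: `FakeModule`, `FakeDecompressor`, and the operation names `LIST`, `READ_CHUNK`, `READ_CHUNK_DONE`, and `LIST_DONE` are invented for illustration and are not the real unpacker.request protocol or the NaCl messaging API.

```javascript
// Stand-in for the NaCl module: it can parse data but cannot read files.
class FakeModule {
  constructor() {
    this.onMessage = null; // set by the JS side to receive module -> JS messages
  }
  // JS -> module entry point (plays the role of postMessage).
  postMessage(msg) {
    if (msg.op === "LIST") {
      // The module needs bytes, so it asks the JS side for them.
      this.onMessage({ op: "READ_CHUNK", requestId: msg.requestId, offset: 0, length: 16 });
    } else if (msg.op === "READ_CHUNK_DONE") {
      // "Parse" the bytes (here: one entry name per line) and report back.
      const entries = msg.data.toString("utf8").split("\n").filter(Boolean);
      this.onMessage({ op: "LIST_DONE", requestId: msg.requestId, entries });
    }
  }
}

// Stand-in for the JS side, which is the only part that owns file data.
class FakeDecompressor {
  constructor(module, fileBytes) {
    this.module = module;
    this.fileBytes = fileBytes;
    this.pending = new Map(); // requestId -> completion callback
    this.nextId = 1;
    module.onMessage = (msg) => this.handleModuleMessage(msg);
  }
  list(callback) {
    const requestId = this.nextId++;
    this.pending.set(requestId, callback);
    this.module.postMessage({ op: "LIST", requestId });
  }
  handleModuleMessage(msg) {
    if (msg.op === "READ_CHUNK") {
      // Serve the module's read request from the data JS has access to.
      const data = this.fileBytes.subarray(msg.offset, msg.offset + msg.length);
      this.module.postMessage({ op: "READ_CHUNK_DONE", requestId: msg.requestId, data });
    } else if (msg.op === "LIST_DONE") {
      const callback = this.pending.get(msg.requestId);
      this.pending.delete(msg.requestId);
      callback(msg.entries);
    }
  }
}

const decompressor = new FakeDecompressor(new FakeModule(), Buffer.from("a.txt\nb.txt\n"));
decompressor.list((entries) => console.log(entries));
```

Tagging every message with a request id is what lets several requests be in flight at once; the sketch keeps the id on each message for that reason, even though it only issues one request.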

Source Layout

Here's the JavaScript code that matters. A few files have very specific purposes and can be ignored at a high level, so they're in a separate section.

  • background.js
    • Main entry point.
    • Initializes the module/runtime.
    • Registers the extension with Chrome filesystem/runtime.
  • app.js unpacker.app
    • Main runtime for the extension.
    • Loads/unloads NaCl modules on demand (to save runtime memory).
    • Loads/unloads volumes as Chrome has requested.
    • Responds to Chrome filesystem callbacks.
    • Passes data back to Chrome from unpacker.Volume objects.
  • volume.js unpacker.Volume
    • Every mounted archive has an unpacker.Volume instance.
    • Provides high level interface to requests (like reading files & metadata).
  • decompressor.js unpacker.Decompressor
    • Provides low level interface for unpacker.Volume requests.
    • Talks to the NaCl module using the unpacker.request protocol.
  • request.js unpacker.request
    • Handles the JS<->NaCl protocol communication.
  • passphrase-manager.js unpacker.PassphraseManager
    • Interface for plumbing password requests between UI & JS & NaCl.
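The "loads/unloads NaCl modules on demand" behavior listed for app.js can be pictured as simple reference counting. The sketch below is an assumption about the general technique, not a copy of app.js: the class `OnDemandModule` and its `acquire`/`release` methods are hypothetical names.

```javascript
// Hypothetical sketch: neither this class nor its method names exist in app.js.
class OnDemandModule {
  constructor(load, unload) {
    this.load = load;     // creates the module
    this.unload = unload; // tears the module down
    this.module = null;
    this.users = 0;       // number of mounted volumes using the module
  }
  acquire() {
    if (this.users === 0) this.module = this.load();
    this.users++;
    return this.module;
  }
  release() {
    if (this.users === 0) throw new Error("release() without matching acquire()");
    this.users--;
    if (this.users === 0) {
      this.unload(this.module);
      this.module = null;
    }
  }
}

const events = [];
const holder = new OnDemandModule(
  () => { events.push("load"); return { name: "fake-module" }; },
  () => { events.push("unload"); }
);
holder.acquire(); // first volume mounted: module is loaded
holder.acquire(); // second volume reuses the loaded module
holder.release();
holder.release(); // last volume unmounted: module is unloaded
console.log(events.join(","));
```

The point of the pattern is that the module only occupies memory while at least one archive is mounted.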

These are the boilerplate/simple JavaScript files you can generally ignore.

Here's the NaCl layout.

Debugging

To see debug messages, open Chrome from a terminal and check the output. For output redirection see https://developer.chrome.com/native-client/devguide/devcycle/debugging.

Testing

Install Karma as the test runner, Mocha for asynchronous testing, Chai for assertions, and Sinon for spies and stubs.

$ npm install --save-dev \
  karma karma-chrome-launcher karma-cli \
  mocha karma-mocha karma-chai chai karma-sinon sinon

# Run tests:
$ cd unpacker-test
$ ./run_js_tests.sh  # JavaScript tests.
$ ./run_cpp_tests.sh  # C++ tests.

# Check JavaScript code using the Closure JS Compiler.
# See https://www.npmjs.com/package/closurecompiler
$ cd unpacker
$ npm install google-closure-compiler
$ bash check_js_for_errors.sh