Build files for progressive loading of the vesuvius challenge data.

spelufo/vesuvius-build

UPDATE: The data generated by these scripts is available on the data server. You do not need to run this yourself.

vesuvius-build

Scripts to build files for progressive loading of the data from the Vesuvius Challenge. They produce 3D tif files.

The rationale for the formats is that a program can keep in RAM or GPU memory, at any given moment, a low resolution version of a whole scroll for navigation and one or more cubes at full resolution around a focus point. The low resolution version should always be loaded, while the cells of the grid are swapped in and out as needed. A grid cell is 238 MB, so it should take a few seconds to load. Ideally an interactive program does this in the background, but even if it doesn't, a few seconds is somewhat tolerable.
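The cell size quoted above can be checked with a little arithmetic. This is a minimal sketch that assumes voxels are stored as 16-bit integers (2 bytes each), which is what the factor of 2 in the disk space formula further down suggests; the variable names are only illustrative.

```julia
# Assumption: each voxel is a 16-bit unsigned integer (2 bytes).
cell_side = 500
bytes_per_voxel = sizeof(UInt16)

cell_bytes = cell_side^3 * bytes_per_voxel   # 250_000_000 bytes
cell_mib = cell_bytes / 2^20                 # size in MiB (2^20 bytes)

println(round(cell_mib, digits = 1), " MiB per grid cell")
```

Under that assumption the "238 MB" figure is really about 238 MiB, or 250 MB in decimal units.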

The coordinate system

  • y axis: Top down when viewing one of the slice tif images.
  • x axis: Left to right when viewing one of the slice tif images.
  • z axis: Low number to high number on the tif image slice filenames.

Once loaded into Julia, a volume is indexed as vol[iy, ix, iz].
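As a small illustration of that index order, the sketch below uses a synthetic array as a stand-in for a loaded volume (it does not read any real file). The first index selects the row (y), the second the column (x), and the third the slice (z).

```julia
# Synthetic stand-in for a loaded volume: 4 rows (y), 3 columns (x), 2 slices (z).
vol = zeros(UInt16, 4, 3, 2)

iy, ix, iz = 2, 3, 1
vol[iy, ix, iz] = 0xffff      # set one voxel: row 2, column 3, slice 1

slice = vol[:, :, iz]         # a single slice image, sized (height, width)
println(size(slice))          # (4, 3)
println(slice[iy, ix])        # 65535
```

Julia arrays are column-major, so with this layout the y index varies fastest in memory.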

The "small" dataset

A version of the data downsampled to 1/10 in each dimension, stored as a single 2GB 3D tif file: <scan>_small.tif.

Example path on the server: /full-scrolls/Scroll1.volpkg/volumes_small/20230205180739_small.tif
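To see where the roughly 2GB figure comes from, here is a rough estimate for scroll 1. It assumes 16-bit voxels and that each dimension is divided by 10 and rounded up; the actual rounding used by the build scripts may differ. The dimensions are the three integers shown for scroll_1_54 in the Usage section.

```julia
# Dimensions of scroll 1 as shown by the scroll_1_54 REPL output.
full_dims = (8096, 7888, 14376)

# Assumption: 1/10 per dimension, rounded up.
small_dims = cld.(full_dims, 10)

# Assumption: 2 bytes per voxel.
small_bytes = prod(small_dims) * 2

println(small_dims)
println(round(small_bytes / 1e9, digits = 2), " GB")
```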

The grid dataset

Splits the scanned volume into cells of 500x500x500 voxels. A 3D tif file is produced for each one: cell_yxz_YYY_XXX_ZZZ.tif. The code uses the notation jy, jx, jz for grid indices. A grid layer is composed of all the cells at a given jz.

Lots of cells in the grid don't have relevant data (less than 50% for scroll_1_54). To avoid downloading cells without data, we can use a mask that lists the coordinates of the cells that do contain data. I've compiled such a mask for scroll_1_54, in masks/scroll_1_54_mask.csv.

Example path on the server: /full-scrolls/Scroll1.volpkg/volume_grids/20230205180739/
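The mapping between voxel coordinates and grid cells can be sketched as follows. This is not code from the repository: the helper names are hypothetical, and it assumes that both voxel and grid indices are 1-based and that the indices in the file name are zero-padded to three digits, as the YYY_XXX_ZZZ pattern suggests.

```julia
using Printf  # standard library, for @sprintf

cell_size = 500

# Assumption: 1-based voxel index i maps to 1-based grid index.
grid_index(i::Int) = (i - 1) ÷ cell_size + 1

# Voxel index range covered by grid index j along an axis of length n.
cell_range(j::Int, n::Int) = ((j - 1) * cell_size + 1):min(j * cell_size, n)

# Assumption: three-digit zero padding in the file name.
cell_filename(jy, jx, jz) = @sprintf("cell_yxz_%03d_%03d_%03d.tif", jy, jx, jz)

# Which cell holds the voxel at iy = 3500, ix = 4000, iz = 7001?
jy, jx, jz = grid_index.((3500, 4000, 7001))
println(cell_filename(jy, jx, jz))   # cell_yxz_007_008_015.tif

# The last cell along an axis may be partial, e.g. for an axis of length 7888:
println(cell_range(cld(7888, cell_size), 7888))   # 7501:7888
```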

Usage

Julia must be installed, along with the packages the scripts use (]add FileIO, GeometryBasics, Images, Quaternions, StaticArrays, TiffImages).

Set the VESUVIUS_SERVER_AUTH environment variable to the server's user:password. Only needed if downloading slices.
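If you would rather not set the variable in your shell, one option is to set it from Julia before including the scripts. This is a sketch with placeholder credentials; how the scripts themselves parse the value is not shown here, and the split below only illustrates the user:password shape.

```julia
# Placeholder credentials: substitute the ones you were given.
ENV["VESUVIUS_SERVER_AUTH"] = "user:password"

# Illustration only: the value is a user name and a password joined by a colon.
user, password = split(ENV["VESUVIUS_SERVER_AUTH"], ":"; limit = 2)
println("user = ", user)
```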

vesuvius-build expects the data from the server in the ./data subdirectory, following the same structure for subdirectories. Make a symlink or change DATA_DIR in data.jl if you want to point it somewhere else.

Start a julia REPL and run:

julia> include("data.jl"); include("downloads.jl");

julia> scroll_1_54
HerculaneumScan("full-scrolls/Scroll1.volpkg/volumes/20230205180739", 7.91f0, 54.0f0, 8096, 7888, 14376)

julia> scroll_2_54
HerculaneumScan("full-scrolls/Scroll2.volpkg/volumes/20230210143520", 7.91f0, 54.0f0, 11984, 10112, 14428)

Downloading data

  download_scan_slices(scan::HerculaneumScan, slices::AbstractArray{Int}; quiet=false)

  Download scan slices from the server into DATA_DIR. slices specifies which slices to
  download, and is often a range.

  download_grid_cells_range(scan::HerculaneumScan, jys, jxs, jzs; quiet=false)

  Download grid cell files from the data server. Use ranges for jys, jxs and jzs to specify
  which. Each grid cell is 238 MB.

  download_grid_layer(scan::HerculaneumScan, jz::Int)

  Download all scan slices required to build a layer of the grid. Requires
  2*scan.width*scan.height*500 B of disk space (60 GB for scroll 1).
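The disk space formula for a grid layer can be evaluated directly. The sketch below uses a NamedTuple as a stand-in for a HerculaneumScan, since only the width and height field names appear in this document; the values are taken from the scroll_1_54 output above (which of the two is the width does not affect the product).

```julia
# Stand-in for scroll_1_54; only the two fields used by the formula are modeled.
scan = (width = 8096, height = 7888)

# 2 bytes per voxel, 500 slices per grid layer.
layer_bytes = 2 * scan.width * scan.height * 500

println(layer_bytes, " bytes")
println(round(layer_bytes / 2^30, digits = 1), " GiB")
```

That works out to about 64 GB in decimal units, or just under 60 GiB, which matches the figure quoted for scroll 1.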

Building the "small" dataset

julia> include("build_small.jl")

  build_small(scan::HerculaneumScan; from=:server)

  Builds the downsampled slice files for the "small" dataset. The from argument can be
  :server or :disk. Building from disk uses the full resolution slices on your DATA_DIR.
  Building from the server loads them from the server into memory, so as not to require as
  much disk space.

  build_small_volume(scan::HerculaneumScan)

  Builds the <SCAN>_small.tif 3D tif file containing the "small" dataset: A low resolution
  version of the scroll saved as a 3D tif file.

  You must first run build_small to build the files this uses as input.
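For intuition about what a 1/10 downsampling involves, here is one possible approach: averaging non-overlapping blocks. This is a swapped-in illustration, not necessarily what build_small does (it may, for example, subsample or filter differently), and downsample_mean is a hypothetical helper that is not part of vesuvius-build.

```julia
using Statistics  # standard library, for mean

# Hypothetical helper: average f×f×f blocks, dropping partial blocks at the edges.
function downsample_mean(vol::AbstractArray{T,3}, f::Int) where {T}
    ny, nx, nz = size(vol) .÷ f
    out = Array{Float32}(undef, ny, nx, nz)
    for kz in 1:nz, kx in 1:nx, ky in 1:ny
        ys = ((ky - 1) * f + 1):(ky * f)
        xs = ((kx - 1) * f + 1):(kx * f)
        zs = ((kz - 1) * f + 1):(kz * f)
        out[ky, kx, kz] = mean(view(vol, ys, xs, zs))
    end
    return out
end

# Synthetic 20×20×10 volume, indexed [iy, ix, iz] like the real data.
vol = reshape(Float32.(1:(20 * 20 * 10)), 20, 20, 10)
small = downsample_mean(vol, 10)
println(size(small))   # (2, 2, 1)
```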

Building the grid dataset

julia> include("build_grid.jl")

  build_grid_layer(scan::HerculaneumScan, jz::Int)

  Build a layer of the grid. Requires all slices from that layer on your DATA_DIR. This
  takes a long time and requires a lot of memory and disk space.

  build_grid(scan::HerculaneumScan)

  Builds all the grid files for a scroll. Don't run this, use build_grid_layer to build
  only the layers you need, or better yet, download the grid cell files from the data
  server.

  This takes a long time and requires a lot of memory and disk space (~4TB/scroll). We did
  this on the data server so you don't have to. It took about a day to run.

  If you are sure that you still want to run it, comment out the @assert. Also, you might
  want to tune the sx and sy variables in build_grid_layer to use as much RAM as you can
  spend on the job, which will speed it up.
