UPDATE: The data generated by these scripts is available on the data server, so you do not need to run them yourself.
Scripts to build files for progressive loading of the data from the Vesuvius Challenge. They produce 3D TIFF files.
The rationale for these formats is that a program can keep in RAM or GPU memory, at any given moment, a low-resolution version of a whole scroll for navigation, plus one or more full-resolution cubes around a focus point. The low-resolution version should always be loaded, while grid cells are swapped in and out as needed. A grid cell is 238 MB, so it should take a few seconds to load. Ideally this happens in the background in an interactive program, but even if it does not, a few seconds is tolerable.
- y axis: Top down when viewing one of the slice tif images.
- x axis: Left to right when viewing one of the slice tif images.
- z axis: Low number to high number on the tif image slice filenames.
Once loaded into Julia, the volume is indexed as vol[iy, ix, iz].
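The convention can be illustrated with a small sketch: if each slice is loaded as a matrix whose rows run top to bottom (y) and whose columns run left to right (x), stacking the slices in filename order along the third dimension gives an array indexed as vol[iy, ix, iz]. The helper `stack_slices` is hypothetical; the repository's own loading code may be organized differently.

```julia
# Hypothetical helper: stack 2D slices (rows = y, columns = x) along z.
function stack_slices(slices::AbstractVector{<:AbstractMatrix{T}}) where {T}
    isempty(slices) && throw(ArgumentError("need at least one slice"))
    ny, nx = size(first(slices))
    vol = Array{T}(undef, ny, nx, length(slices))
    for (iz, s) in enumerate(slices)
        size(s) == (ny, nx) || throw(DimensionMismatch("slice $iz has size $(size(s))"))
        vol[:, :, iz] = s   # slice iz becomes layer iz along z
    end
    return vol
end

slices = [rand(UInt16, 4, 6) for _ in 1:3]   # 3 tiny slices: 4 rows (y), 6 columns (x)
vol = stack_slices(slices)
vol[2, 5, 3] == slices[3][2, 5]              # vol[iy, ix, iz]
```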
A version downsampled to 1/10 in each dimension, stored as a single 2GB 3D tif file: <scan>_small.tif.
Example path on the server: /full-scrolls/Scroll1.volpkg/volumes_small/20230205180739_small.tif
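Moving between full-resolution and small-volume coordinates is a division by 10 per axis. A minimal sketch, assuming each small voxel summarizes an aligned 10×10×10 block of full-resolution voxels and that both volumes use Julia's 1-based indices; the exact downsampling done by build_small may differ.

```julia
const SMALL_FACTOR = 10   # 1/10 in each dimension

# Full-resolution index -> small-volume index along one axis (assumed block alignment).
full_to_small(i::Integer) = div(i - 1, SMALL_FACTOR) + 1
full_to_small(iy, ix, iz) = (full_to_small(iy), full_to_small(ix), full_to_small(iz))

# Small-volume index -> range of full-resolution indices it covers along one axis.
small_to_full_range(s::Integer) = ((s - 1) * SMALL_FACTOR + 1):(s * SMALL_FACTOR)

full_to_small(15, 25, 100)   # (2, 3, 10)
small_to_full_range(2)       # 11:20
```

A navigation program could use small_to_full_range to decide which full-resolution region, and therefore which grid cells, to load next.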
The scanned volume is split into cells of 500x500x500 voxels, and a 3D tif file is produced for each one: cell_yxz_YYY_XXX_ZZZ.tif. The code uses the notation jy, jx, jz for grid indices. A grid layer is composed of all the cells at a given jz.
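Finding the cell that holds a given voxel is integer division by the cell size. The sketch below assumes that grid indices jy, jx, jz start at 1 and that YYY, XXX, ZZZ are zero-padded to three digits; both are inferred from the filename pattern rather than confirmed by the scripts.

```julia
using Printf

const CELL_SIZE = 500

# Voxel index (1-based) -> grid index along one axis, and position inside that cell.
voxel_to_cell(i::Integer) = div(i - 1, CELL_SIZE) + 1
voxel_in_cell(i::Integer) = mod(i - 1, CELL_SIZE) + 1

# Assumed naming: cell_yxz_YYY_XXX_ZZZ.tif with 3-digit zero padding.
cell_filename(jy, jx, jz) = @sprintf("cell_yxz_%03d_%03d_%03d.tif", jy, jx, jz)

iy, ix, iz = 1001, 501, 1000
jy, jx, jz = voxel_to_cell.((iy, ix, iz))   # (3, 2, 2)
cell_filename(jy, jx, jz)                   # "cell_yxz_003_002_002.tif"
voxel_in_cell.((iy, ix, iz))                # (1, 1, 500)
```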
Many cells in the grid contain no relevant data; for scroll_1_54, fewer than 50% of them do.
To avoid downloading cells without data, we can use a mask listing the coordinates of the
cells that contain data. I've compiled such a mask for scroll_1_54 in masks/scroll_1_54_mask.csv.
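A sketch of loading a mask and using it to skip empty cells. The CSV layout assumed here (one jy,jx,jz triple per line, with an optional header row) is a guess; check masks/scroll_1_54_mask.csv for its actual columns before relying on it. The names load_cell_mask and masked_cells are hypothetical.

```julia
# Read a mask file into a set of (jy, jx, jz) tuples.
# Assumed format: one "jy,jx,jz" line per cell with data; non-numeric lines are skipped.
function load_cell_mask(path::AbstractString)
    cells = Set{NTuple{3,Int}}()
    for line in eachline(path)
        fields = split(strip(line), ',')
        length(fields) == 3 || continue    # blank or malformed line
        vals = tryparse.(Int, fields)
        any(isnothing, vals) && continue   # e.g. a header row
        push!(cells, (vals[1], vals[2], vals[3]))
    end
    return cells
end

# Keep only the cells of a requested block that the mask marks as having data.
masked_cells(mask, jys, jxs, jzs) =
    [(jy, jx, jz) for jz in jzs for jx in jxs for jy in jys if (jy, jx, jz) in mask]

path = tempname()
write(path, "jy,jx,jz\n1,1,1\n2,1,1\n")
mask = load_cell_mask(path)
masked_cells(mask, 1:2, 1:2, 1:1)   # [(1, 1, 1), (2, 1, 1)]
```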
Example path on the server: /full-scrolls/Scroll1.volpkg/volume_grids/20230205180739/
Julia must be installed, along with the packages used (]add FileIO, GeometryBasics, Images, Quaternions, StaticArrays, TiffImages).
Set the VESUVIUS_SERVER_AUTH environment variable to the server's user:password. This is only needed if downloading slices.
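A sketch of how a download script could turn that variable into an HTTP Basic authentication header, using only Julia's Base64 standard library. That the server expects Basic authentication is an assumption here, and downloads.jl may build its requests differently; server_auth_header is a hypothetical name.

```julia
using Base64

# Build a Basic-auth header from VESUVIUS_SERVER_AUTH ("user:password").
# Assumption: the data server accepts HTTP Basic authentication.
function server_auth_header(var::AbstractString = "VESUVIUS_SERVER_AUTH")
    creds = get(ENV, var, "")
    isempty(creds) && error("set $var to user:password before downloading")
    occursin(':', creds) || error("$var must have the form user:password")
    return "Authorization" => "Basic " * base64encode(creds)
end

header = withenv("VESUVIUS_SERVER_AUTH" => "user:password") do
    server_auth_header()
end
# header == ("Authorization" => "Basic dXNlcjpwYXNzd29yZA==")
```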
vesuvius-build expects the data from the server in the ./data subdirectory, following the same
subdirectory structure as the server. Make a symlink or change DATA_DIR in data.jl if you want
to point it somewhere else.
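Because the local layout mirrors the server, a slice's local path is DATA_DIR joined with the scan's path. A sketch, assuming slice files are named with a five-digit, zero-padded index starting at 00000.tif; that naming is not stated here, so check it against the server listing. slice_path is a hypothetical helper.

```julia
using Printf

# Hypothetical helper: local path of slice iz of a scan under data_dir.
# Assumption: slices are named 00000.tif, 00001.tif, ... (0-based, 5 digits).
slice_path(data_dir::AbstractString, scan_path::AbstractString, iz::Integer) =
    joinpath(data_dir, scan_path, @sprintf("%05d.tif", iz))

slice_path("data", "full-scrolls/Scroll1.volpkg/volumes/20230205180739", 42)
# "data/full-scrolls/Scroll1.volpkg/volumes/20230205180739/00042.tif" on Unix-like systems
```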
Start a Julia REPL and run:
julia> include("data.jl"); include("downloads.jl");
julia> scroll_1_54
HerculaneumScan("full-scrolls/Scroll1.volpkg/volumes/20230205180739", 7.91f0, 54.0f0, 8096, 7888, 14376)
julia> scroll_2_54
HerculaneumScan("full-scrolls/Scroll2.volpkg/volumes/20230210143520", 7.91f0, 54.0f0, 11984, 10112, 14428)
download_scan_slices(scan::HerculaneumScan, slices::AbstractArray{Int}; quiet=false)
Download scan slices from the server into DATA_DIR. slices specifies which ones to download,
and is often a range.
download_grid_cells_range(scan::HerculaneumScan, jys, jxs, jzs; quiet=false)
Download grid cell files from the data server. Use ranges for jys, jxs and jzs to specify
which ones. Each grid cell is 238 MB.
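For interactive use, the ranges passed to download_grid_cells_range typically surround a focus point. A sketch of computing them, assuming 1-based grid indices and 500-voxel cells; cells_around is a hypothetical helper, and the example grid extent of 16×17×29 cells is what rounding scroll_1_54's dimensions up to whole cells would give (also an assumption).

```julia
const CELL_SIZE = 500
const CELL_MB = 238   # size of one grid cell file, as stated above

# Range of grid indices within `radius` cells of voxel index i along one axis,
# clamped to 1:ncells.
function cells_around(i::Integer, ncells::Integer; radius::Integer = 1)
    j = div(i - 1, CELL_SIZE) + 1
    return max(1, j - radius):min(ncells, j + radius)
end

# Focus on voxel (iy, ix, iz) = (4000, 4000, 7000) in an assumed 16×17×29 grid.
jys = cells_around(4000, 16)   # 7:9
jxs = cells_around(4000, 17)   # 7:9
jzs = cells_around(7000, 29)   # 13:15
download_mb = length(jys) * length(jxs) * length(jzs) * CELL_MB   # 6426
# then, for example: download_grid_cells_range(scroll_1_54, jys, jxs, jzs)
```

A radius of 1 means 27 cells, roughly 6.4 GB, so in practice a viewer would likely load the central cell first and fetch its neighbors in the background.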
download_grid_layer(scan::HerculaneumScan, jz::Int)
Download all scan slices required to build a layer of the grid. Requires
2*scan.width*scan.height*500 B of disk space (60 GB for scroll 1).
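The slice range and disk requirement of a layer follow from its 500-slice thickness. A sketch, assuming jz starts at 1 and slice files are numbered from 0; the byte count is the formula quoted above, which corresponds to 2 bytes per voxel. layer_slices and layer_bytes are hypothetical names.

```julia
const CELL_SIZE = 500

# 0-based slice numbers that make up grid layer jz (the last layer may be shorter).
layer_slices(jz::Integer, nslices::Integer) =
    ((jz - 1) * CELL_SIZE):(min(jz * CELL_SIZE, nslices) - 1)

# Disk space for one layer of slices: 2 bytes per voxel, 500 slices.
layer_bytes(width::Integer, height::Integer) = 2 * width * height * CELL_SIZE

layer_slices(1, 14376)          # 0:499
layer_slices(29, 14376)         # 14000:14375
layer_bytes(8096, 7888) / 1e9   # about 63.9 GB (about 59.5 GiB)
```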
julia> include("build_small.jl")
build_small(scan::HerculaneumScan; from=:server)
Builds the downsampled slice files for the "small" dataset. The from argument can be
:server or :disk. Building from disk uses the full-resolution slices in your DATA_DIR.
Building from the server loads them from the server into memory, so it does not require
as much disk space.
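One common way to shrink a slice by 10 in each direction is to average each 10×10 block of pixels. The sketch below shows that technique on a single 2D slice; whether build_small averages, subsamples, or filters is not stated here, so treat it as illustrative only. downsample_mean is a hypothetical name.

```julia
# Block-mean downsampling of a 2D slice by an integer factor f.
# Trailing rows/columns that do not fill a whole block are dropped.
function downsample_mean(img::AbstractMatrix{<:Real}, f::Integer = 10)
    h, w = size(img) .÷ f
    out = Matrix{Float32}(undef, h, w)
    for j in 1:w, i in 1:h
        block = @view img[(i - 1) * f + 1:i * f, (j - 1) * f + 1:j * f]
        out[i, j] = sum(block) / length(block)   # sum widens UInt16, so no overflow
    end
    return out
end

img = reshape(Float32.(1:400), 20, 20)
small = downsample_mean(img)   # 2×2 Matrix{Float32}
small[1, 1]                    # 95.5
```

Reducing z by 10 as well would average ten consecutive downsampled slices in the same way.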
build_small_volume(scan::HerculaneumScan)
Builds <SCAN>_small.tif, the 3D tif file containing the "small" dataset: a low-resolution
version of the scroll.
You must first run build_small to build the files this function uses as input.
julia> include("build_grid.jl")
build_grid_layer(scan::HerculaneumScan, jz::Int)
Build a layer of the grid. Requires all slices from that layer in your DATA_DIR. This
takes a long time and requires a lot of memory and disk space.
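At its core, building a layer means cutting a slab of the volume, at most 500 slices thick, into cubes. A sketch of that step using array views, assuming 1-based jy, jx and edge cells that are simply smaller; the real build_grid_layer also manages memory and writes each cell to a TIFF file, which this sketch omits. split_layer is a hypothetical name.

```julia
# Split one grid layer (indexed layer[iy, ix, iz]) into cubes of side `cell`.
# Returns a Dict mapping (jy, jx) to a view of that cell; edge cells may be smaller.
function split_layer(layer::AbstractArray{T,3}, cell::Integer = 500) where {T}
    ny, nx, nz = size(layer)
    nz <= cell || throw(ArgumentError("layer is $nz slices deep, more than $cell"))
    cells = Dict{Tuple{Int,Int},AbstractArray{T,3}}()
    for jx in 1:cld(nx, cell), jy in 1:cld(ny, cell)
        ys = (jy - 1) * cell + 1:min(jy * cell, ny)
        xs = (jx - 1) * cell + 1:min(jx * cell, nx)
        cells[(jy, jx)] = @view layer[ys, xs, :]
    end
    return cells
end

layer = reshape(collect(1:48), 4, 6, 2)   # tiny layer: 4 (y) × 6 (x) × 2 (z)
cells = split_layer(layer, 2)             # 2×3 grid of 2×2×2 cells
cells[(2, 3)] == layer[3:4, 5:6, :]       # true
```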
build_grid(scan::HerculaneumScan)
Builds all the grid files for a scroll. Don't run this; use build_grid_layer to build
only the layers you need or, better yet, download the grid cell files from the data
server.
This takes a long time and requires a lot of memory and disk space (~4TB/scroll). We did
this on the data server so you don't have to. It took about a day to run.
If you are sure that you still want to run it, comment out the @assert. You might also
want to tune the sx and sy variables in build_grid_layer to use as much RAM as you can
spare for the job, which will speed it up.
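A rough estimate of the grid's size for scroll_1_54, assuming that the grid extent is the scan size divided by 500 and rounded up, that 7888 is the height and 8096 the width, and that every cell counts at the full 500³ voxels × 2 bytes (the 238 MB quoted above). This covers only the grid output, not the full-resolution slices that building also needs on disk.

```julia
const CELL_SIZE = 500
const BYTES_PER_VOXEL = 2   # 500^3 * 2 B = 250e6 B, about 238 MiB per cell

# Number of cells along (y, x, z), counting partial edge cells.
grid_shape(height, width, nslices) = cld.((height, width, nslices), CELL_SIZE)

shape = grid_shape(7888, 8096, 14376)                     # (16, 17, 29)
ncells = prod(shape)                                      # 7888
grid_tb = ncells * CELL_SIZE^3 * BYTES_PER_VOXEL / 1e12   # about 1.97 TB
```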