This folder provides base Docker images and an infrastructure to build manylinux1-compatible Python wheels that should be installable on all Linux distributions released in the last four years.
The process is split into two parts. First, there are base Docker images that build the native, Python-independent dependencies; for these, you can choose whether to also build the dependencies needed for Parquet support. Second, building on these images, a bash script builds the pyarrow wheels for all supported Python versions and places them in the dist/ folder.
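Each wheel's filename records which Python version and platform it was built for, which is how the per-version, manylinux1-compatible outputs can be told apart. The sketch below pulls those tags out of a filename, assuming the standard wheel naming layout `name-version[-build]-python-abi-platform.whl`; the function name `wheel_tags` and the example filenames are hypothetical and not part of this folder's scripts.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the python, ABI and platform tags of a wheel.
# Assumes the standard layout name-version[-build]-python-abi-platform.whl,
# where dashes only separate fields.
wheel_tags() {
  local base="${1##*/}"   # drop any directory prefix
  base="${base%.whl}"     # drop the .whl extension
  local IFS='-'
  local -a parts
  read -r -a parts <<< "$base"
  local n=${#parts[@]}
  # name, version, python, abi and platform give at least five fields
  if (( n < 5 )); then
    echo "not a wheel filename: $1" >&2
    return 1
  fi
  # The last three fields are always python, ABI and platform tags.
  printf '%s %s %s\n' "${parts[n-3]}" "${parts[n-2]}" "${parts[n-1]}"
}
```

For example, `wheel_tags dist/pyarrow-0.9.0-cp36-cp36m-manylinux1_x86_64.whl` prints `cp36 cp36m manylinux1_x86_64`. Reading the tags from the end keeps the optional build tag from shifting the result.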
# Create a clean copy of the arrow source tree
git clone ../../ arrow
# Build the native base image
docker build -t arrow-base-x86_64 -f Dockerfile-x86_64 .
# Build the python packages
docker run --shm-size=2g --rm -t -i -v "$PWD":/io arrow-base-x86_64 /io/build_arrow.sh
# Now the new packages are located in the dist/ folder
ls -l dist/
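Beyond listing dist/, it can help to fail loudly when a build did not produce wheels with the expected platform tag. A minimal sketch, assuming dist/ should contain only wheels tagged `manylinux1_x86_64` (matching the x86_64 Dockerfile used above); the function name `check_dist` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical check: succeed only if the directory holds at least one
# wheel and every wheel carries the manylinux1_x86_64 platform tag.
check_dist() {
  local dir="${1:-dist}" whl name found=0
  for whl in "$dir"/*.whl; do
    [ -e "$whl" ] || continue   # the glob matched nothing
    found=1
    name="${whl##*/}"
    case "$name" in
      *-manylinux1_x86_64.whl) echo "ok: $name" ;;
      *) echo "unexpected platform tag: $name" >&2; return 1 ;;
    esac
  done
  if [ "$found" -eq 0 ]; then
    echo "no wheels found in $dir" >&2
    return 1
  fi
}
```

Running `check_dist dist` after the build prints one `ok:` line per wheel and returns a non-zero status otherwise, so it can gate a CI step.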
In addition to the Docker images that contain the Arrow C++ and Parquet C++ builds, there is another base image that contains only their dependencies. This image is updated less often: it only needs to be adjusted when a dependency is updated to a new version. You can rebuild this image using:
docker build -t arrow_manylinux1_x86_64_base -f Dockerfile-x86_64_base .
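Because the Arrow image builds on the dependency image, a change to the dependencies only reaches the wheels after all three steps run again in order. The sketch below chains the commands from this README, assuming the FROM line of Dockerfile-x86_64 already points at the locally built base image as described at the end of this section. The function name `rebuild_all` and the `DOCKER` override (e.g. `DOCKER=echo` for a dry run) are hypothetical additions. It also omits `-t -i`, which request an interactive terminal and can fail when no terminal is attached.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: rebuild the dependency-only base image, then the
# Arrow image on top of it, then the wheels. DOCKER defaults to "docker";
# set DOCKER=echo to print the commands instead of running them.
rebuild_all() {
  local docker="${DOCKER:-docker}"
  # 1. Image containing only the third-party dependencies.
  $docker build -t arrow_manylinux1_x86_64_base -f Dockerfile-x86_64_base . || return
  # 2. Image containing the Arrow C++ and Parquet C++ builds.
  $docker build -t arrow-base-x86_64 -f Dockerfile-x86_64 . || return
  # 3. Build the wheels into ./dist (no -t -i, so it also runs without a TTY).
  $docker run --shm-size=2g --rm -v "$PWD":/io arrow-base-x86_64 /io/build_arrow.sh
}
```

`$docker` is left unquoted on purpose so that a value such as `DOCKER="sudo docker"` splits into a command and its argument.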
For each dependency, there is a bash script in the scripts/ directory that downloads the sources, builds them, and installs them. At the end of each dependency build, the sources are removed again so that only the binary installation of the dependency is persisted in the Docker image. When you make local adjustments to this image, you need to change the FROM line in Dockerfile-x86_64 to pick up the new image:
FROM arrow_manylinux1_x86_64_base
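The per-dependency scripts in scripts/ follow an unpack, build, install, clean-up pattern. The actual scripts are not reproduced here; below is a minimal sketch of that pattern, assuming the source archive has already been downloaded (the download step, e.g. with curl or wget, is left out) and that the build-and-install command is passed in by the caller. The function name `build_from_archive` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the scripts/ pattern: unpack a source
# archive, run the given build-and-install command inside it, then delete
# the sources so only the installed files remain in the image layer.
build_from_archive() {
  local archive="$1"; shift   # path to an already-downloaded .tar.gz
  local srcdir
  srcdir="$(mktemp -d)" || return 1
  # Unpack, dropping the archive's top-level directory.
  if ! tar -xzf "$archive" -C "$srcdir" --strip-components=1; then
    rm -rf "$srcdir"
    return 1
  fi
  # Run the build/install command from inside the source tree.
  local status=0
  (cd "$srcdir" && "$@") || status=$?
  # Remove the sources whether or not the build succeeded.
  rm -rf "$srcdir"
  return "$status"
}
```

A typical call for an autotools-based dependency might look like `build_from_archive foo-1.0.tar.gz sh -c './configure --prefix=/usr && make && make install'`, where `foo-1.0.tar.gz` is a placeholder. Deleting the sources in the same step matters for image size: files removed in a later Dockerfile instruction still occupy space in the earlier layer.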