This library provides a Pythonic API wrapper for the reference Arrow C++ implementation, along with tools for interoperability with pandas, NumPy, and other traditional Python scientific computing packages.
This project is layered in two pieces:
- pyarrow, a C++ library for easier interoperability between Arrow C++, NumPy, and pandas
- Cython extensions and pure Python code under arrow/ which expose Arrow C++ and pyarrow to pure Python users
PyArrow depends on the following projects:
- g++ and gcc version >= 4.8
- cmake > 2.8.6
- boost
- Arrow-cpp and its dependencies
The Arrow C++ library must be built with all options enabled and installed with the ARROW_HOME environment variable set to the installation location. See https://github.com/apache/arrow/blob/master/cpp/README.md for instructions.
Ensure PyArrow can locate the Arrow-cpp shared libraries by setting the LD_LIBRARY_PATH environment variable.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ARROW_HOME/lib
- Python dependencies: numpy, pandas, cython, pytest
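Before building, it can help to confirm that the environment described above is actually in place. The following is a minimal sketch, not part of PyArrow: `check_arrow_env` is a hypothetical helper that only inspects the ARROW_HOME and LD_LIBRARY_PATH variables and assumes Linux-style, colon-separated paths.

```python
import os
import posixpath


def check_arrow_env(env):
    """Return a list of problems found in the given environment mapping.

    Hypothetical helper for illustration; it inspects variables only and
    does not touch the filesystem.
    """
    arrow_home = env.get("ARROW_HOME")
    if not arrow_home:
        return ["ARROW_HOME is not set"]

    problems = []
    # The shared libraries are expected under $ARROW_HOME/lib.
    lib_dir = posixpath.join(arrow_home, "lib")
    # LD_LIBRARY_PATH is a colon-separated list of directories.
    entries = [
        p.rstrip("/") for p in env.get("LD_LIBRARY_PATH", "").split(":") if p
    ]
    if lib_dir not in entries:
        problems.append(f"{lib_dir} is not on LD_LIBRARY_PATH")
    return problems


if __name__ == "__main__":
    for problem in check_arrow_env(os.environ):
        print("warning:", problem)
```

An empty result means both variables look consistent; it does not prove that the Arrow C++ libraries were built with the required options.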
To build pyarrow in place and run the unit tests:
python setup.py build_ext --inplace
py.test pyarrow
To change the build type, use the --build-type option or set $PYARROW_BUILD_TYPE:
python setup.py build_ext --build-type=release --inplace
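When the build is driven from a script, the two ways of selecting a build type can be expressed as follows. This is a rough sketch: `build_ext_command` and `build_ext_env` are hypothetical helpers, and only the `--build-type` and `--inplace` flags and the PYARROW_BUILD_TYPE variable come from this README. Which setting wins when both are given is not stated here, so the sketch uses one or the other.

```python
import os
import subprocess
import sys


def build_ext_command(build_type=None, inplace=True):
    """Return the argv for `setup.py build_ext` (hypothetical helper)."""
    argv = [sys.executable, "setup.py", "build_ext"]
    if build_type is not None:
        # Select the build type with the command-line option.
        argv.append(f"--build-type={build_type}")
    if inplace:
        argv.append("--inplace")
    return argv


def build_ext_env(build_type, base_env=None):
    """Return a copy of the environment with PYARROW_BUILD_TYPE set."""
    env = dict(os.environ if base_env is None else base_env)
    env["PYARROW_BUILD_TYPE"] = build_type
    return env


def run_release_build():
    """Run a release build via the command-line option.

    The environment-variable form would instead be:
    subprocess.check_call(build_ext_command(), env=build_ext_env("release"))
    """
    subprocess.check_call(build_ext_command("release"))
```

Calling `run_release_build()` from the directory containing setup.py corresponds to the command shown above.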
To pass through other build options to CMake, set the environment variable $PYARROW_CMAKE_OPTIONS.
To build the integration with parquet-cpp, pass --with-parquet to the build_ext command of setup.py:
python setup.py build_ext --with-parquet install
Alternatively, add -DPYARROW_BUILD_PARQUET=on to the general CMake options:
export PYARROW_CMAKE_OPTIONS=-DPYARROW_BUILD_PARQUET=on
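If several CMake definitions need to be passed at once, the value of PYARROW_CMAKE_OPTIONS can be assembled programmatically. The sketch below assumes that the build splits the variable on whitespace in shell style, which this README does not confirm; `cmake_options` is a hypothetical helper, and CMAKE_INSTALL_PREFIX appears only as an example of a standard CMake variable.

```python
import shlex


def cmake_options(definitions):
    """Render a {name: value} mapping as a string of -DNAME=value flags.

    Hypothetical helper. Booleans become on/off, and values containing
    spaces are shell-quoted so that shlex.split() recovers each flag.
    """
    flags = []
    for name, value in definitions.items():
        if isinstance(value, bool):
            value = "on" if value else "off"
        flags.append(shlex.quote(f"-D{name}={value}"))
    return " ".join(flags)


if __name__ == "__main__":
    # Print a line that can be pasted into a shell session.
    options = cmake_options({"PYARROW_BUILD_PARQUET": True})
    print(f"export PYARROW_CMAKE_OPTIONS={shlex.quote(options)}")
```

Quoting each flag separately keeps values that contain spaces intact when the string is split again.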
To build the documentation, install its requirements and run the Sphinx build:
pip install -r doc/requirements.txt
python setup.py build_sphinx