Commit

ARROW-3834: [Doc] Merge C++ and Python documentation
Author: Uwe L. Korn <[email protected]>
Author: Korn, Uwe <[email protected]>
Author: Wes McKinney <[email protected]>

Closes apache#2856 from xhochy/doc-merge and squashes the following commits:

5b687ff <Wes McKinney> Add simple README for the format/ directory
071d16a <Uwe L. Korn> Move format specifications back to /format/
337088b <Uwe L. Korn> Review comments
fbe99c9 <Uwe L. Korn> Add more C++ docs
78a5eaf <Uwe L. Korn> Fix Python docs build
0b4dd33 <Uwe L. Korn> Rename doc to docs
918e762 <Uwe L. Korn> Convert format docs to reST
7aeff65 <Uwe L. Korn> Add doc generation to docker-compose
185cba8 <Uwe L. Korn> Add pre-commit check for RAT
671d244 <Uwe L. Korn> Fix references to format documents
bdd824c <Uwe L. Korn> Move doc to top-level
985d428 <Uwe L. Korn> Move Sphinx doc to top-level directory
f7d5e92 <Uwe L. Korn> Build C++ API docs
7850db8 <Uwe L. Korn> Add breathe as a requirement
d4cf542 <Uwe L. Korn> Fix linter issues
fd75660 <Korn, Uwe> Fix Sphinx build for sphinx>=1.8
9be6fbe <Korn, Uwe> Merge C++ and Python documentation
xhochy authored and wesm committed Dec 6, 2018
1 parent 187b98e commit 35f8a34
Showing 55 changed files with 1,543 additions and 1,202 deletions.
2 changes: 1 addition & 1 deletion .dockerignore
@@ -17,6 +17,7 @@

.git
docker_cache
docs/_build

# IDE
.vscode
@@ -49,7 +50,6 @@ python/dist
python/*.egg-info
python/*.egg
python/*.pyc
python/doc/_build
__pycache__/
*/__pycache__/
*/*/__pycache__/
5 changes: 5 additions & 0 deletions .gitignore
@@ -15,6 +15,9 @@
# specific language governing permissions and limitations
# under the License.

apache-rat-*.jar
arrow-src.tar

# Compiled source
*.a
*.dll
@@ -34,7 +37,9 @@ MANIFEST
*.iml

cpp/.idea/
cpp/apidoc/xml/
python/.eggs/
python/doc/
.vscode
.idea/
.pytest_cache/
8 changes: 8 additions & 0 deletions .pre-commit-config.yaml
@@ -21,6 +21,14 @@
# To run all hooks on all files use `pre-commit run -a`

repos:
- repo: local
hooks:
- id: rat
name: rat
language: system
entry: bash -c "git archive HEAD --prefix=apache-arrow/ --output=arrow-src.tar && ./dev/release/run-rat.sh arrow-src.tar"
always_run: true
pass_filenames: false
- repo: git://github.com/pre-commit/pre-commit-hooks
sha: v1.2.3
hooks:
1 change: 1 addition & 0 deletions ci/conda_env_python.yml
@@ -21,5 +21,6 @@ numpy
pandas
pytest
python
rsync
setuptools
setuptools_scm
23 changes: 23 additions & 0 deletions ci/conda_env_sphinx.yml
@@ -0,0 +1,23 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

# Requirements for building the documentation
breathe
doxygen
ipython
sphinx
sphinx_rtd_theme
30 changes: 30 additions & 0 deletions ci/docker_build_sphinx.sh
@@ -0,0 +1,30 @@
#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

set -ex

pushd /arrow/cpp/apidoc
doxygen
popd

pushd /arrow/python
python setup.py build_sphinx -s ../docs/source --build-dir ../docs/_build
popd

mkdir -p /arrow/site/asf-site/docs/latest
rsync -r /arrow/docs/_build/html/ /arrow/site/asf-site/docs/latest/
11 changes: 5 additions & 6 deletions ci/travis_script_python.sh
@@ -61,11 +61,7 @@ conda install -y -q pip \

if [ "$ARROW_TRAVIS_PYTHON_DOCS" == "1" ] && [ "$PYTHON_VERSION" == "3.6" ]; then
# Install documentation dependencies
-conda install -y -q \
-ipython \
-numpydoc \
-sphinx=1.7.9 \
-sphinx_rtd_theme
+conda install -y -c conda-forge --file ci/conda_env_sphinx.yml
fi

# ARROW-2093: PyTorch increases the size of our conda dependency stack
@@ -190,7 +186,10 @@ if [ "$ARROW_TRAVIS_COVERAGE" == "1" ]; then
fi

if [ "$ARROW_TRAVIS_PYTHON_DOCS" == "1" ] && [ "$PYTHON_VERSION" == "3.6" ]; then
-cd doc
+pushd ../cpp/apidoc
+doxygen
+popd
+cd ../docs
sphinx-build -q -b html -d _build/doctrees -W source _build/html
fi

2 changes: 1 addition & 1 deletion cpp/apidoc/Doxyfile
@@ -1919,7 +1919,7 @@ MAN_LINKS = NO
# captures the structure of the code including all documentation.
# The default value is: NO.

-GENERATE_XML = NO
+GENERATE_XML = YES

# The XML_OUTPUT tag is used to specify where the XML pages will be put. If a
# relative path is entered the value of OUTPUT_DIRECTORY will be put in front of
57 changes: 0 additions & 57 deletions cpp/apidoc/index.md
@@ -41,60 +41,3 @@ Table of Contents
* [Convert a vector of row-wise data into an Arrow table](tutorials/row_wise_conversion.md)
* [Using the Plasma In-Memory Object Store](tutorials/plasma.md)
* [Use Plasma to Access Tensors from C++ in Python](tutorials/tensor_to_py.md)

Getting Started
---------------

The most basic structure in Arrow is an `arrow::Array`. It holds a sequence
of values with known length all having the same type. It consists of the data
itself and an additional bitmap that indicates if the corresponding entry of
array is a null-value. Note that for array with zero null entries, we can omit
this bitmap.

As Arrow objects are immutable, there are classes provided that should help you
build these objects. To build an array of `int64_t` elements, we can use the
`arrow::Int64Builder`. In the following example, we build an array of the range
1 to 8 where the element that should hold the number 4 is nulled.

Int64Builder builder;
builder.Append(1);
builder.Append(2);
builder.Append(3);
builder.AppendNull();
builder.Append(5);
builder.Append(6);
builder.Append(7);
builder.Append(8);

std::shared_ptr<Array> array;
builder.Finish(&array);

The resulting Array (which can be casted to `arrow::Int64Array` if you want
to access its values) then consists of two `arrow::Buffer`. The first one is
the null bitmap holding a single byte with the bits `0|0|0|0|1|0|0|0`.
As we use [least-significant bit (LSB) numbering](https://en.wikipedia.org/wiki/Bit_numbering)
this indicates that the fourth entry in the array is null. The second
buffer is simply an `int64_t` array containing all the above values.
As the fourth entry is null, the value at that position in the buffer is
undefined.

// Cast the Array to its actual type to access its data
std::shared_ptr<Int64Array> int64_array = std::static_pointer_cast<Int64Array>(array);

// Get the pointer to the null bitmap.
const uint8_t* null_bitmap = int64_array->null_bitmap_data();

// Get the pointer to the actual data
const int64_t* data = int64_array->raw_values();
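
The LSB bit numbering described above can be sketched in plain standard C++. This is an illustration only: `GetBit`, `SetBit` and `BuildValidityBitmap` are hypothetical helper names, not Arrow's own utilities, and the sketch assumes the Arrow format's convention that a set bit marks a valid (non-null) slot. Under that convention the eight-element example, with its fourth entry null, is stored as the single byte `0xF7`, the complement of the bit pattern quoted in the text.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true when bit `i` of an LSB-numbered bitmap is set.
// Bit i lives in byte i / 8, at position i % 8 counted from the
// least significant bit of that byte.
inline bool GetBit(const std::uint8_t* bitmap, std::size_t i) {
  return ((bitmap[i / 8] >> (i % 8)) & 1) != 0;
}

// Sets bit `i` of an LSB-numbered bitmap.
inline void SetBit(std::uint8_t* bitmap, std::size_t i) {
  bitmap[i / 8] |= static_cast<std::uint8_t>(1u << (i % 8));
}

// Hypothetical helper (not part of Arrow): builds a validity bitmap
// from per-slot flags. Assumption: a set bit means "slot is valid".
inline std::vector<std::uint8_t> BuildValidityBitmap(
    const std::vector<bool>& is_valid) {
  // One byte holds eight slots, so round the byte count up.
  std::vector<std::uint8_t> bitmap((is_valid.size() + 7) / 8, 0);
  for (std::size_t i = 0; i < is_valid.size(); ++i) {
    if (is_valid[i]) SetBit(bitmap.data(), i);
  }
  return bitmap;
}
```

Given a pointer such as `null_bitmap` from the snippet above, `GetBit(null_bitmap, 3)` would report whether the fourth entry is valid, assuming the array has no offset.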

In the above example, we have yet skipped explaining two things in the code.
On constructing the builder, we have passed `arrow::int64()` to it. This is
the type information with which the resulting array will be annotated. In
this simple form, it is solely a `std::shared_ptr<arrow::Int64Type>`
instantiation.

Furthermore, we have passed `arrow::default_memory_pool()` to the constructor.
This `arrow::MemoryPool` is used for the allocations of heap memory. Besides
tracking the amount of memory allocated, the allocator also ensures that the
allocated memory regions are 64-byte aligned (as required by the Arrow
specification).
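
The 64-byte alignment mentioned above can be checked with a few lines of standard C++. This is a minimal sketch rather than how `arrow::MemoryPool` is implemented: `kAlignment` and `IsAligned64` are hypothetical names introduced for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Alignment that the text attributes to the Arrow specification.
constexpr std::size_t kAlignment = 64;

// Hypothetical helper (not part of Arrow): returns true when `ptr`
// sits on a 64-byte boundary, i.e. its address is a multiple of 64.
inline bool IsAligned64(const void* ptr) {
  return reinterpret_cast<std::uintptr_t>(ptr) % kAlignment == 0;
}
```

A buffer declared with `alignas(64)` passes the check at its start and at every 64-byte step, while an address 8 bytes past the start does not.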
1 change: 1 addition & 0 deletions cpp/src/arrow/array.h
@@ -397,6 +397,7 @@ class ARROW_EXPORT PrimitiveArray : public FlatArray {
const uint8_t* raw_values_;
};

/// Concrete Array class for numeric data.
template <typename TYPE>
class ARROW_EXPORT NumericArray : public PrimitiveArray {
public:
9 changes: 0 additions & 9 deletions dev/gen_apidocs/create_documents.sh
@@ -87,15 +87,6 @@ rsync -r doc/parquet-glib/html/ ../../site/asf-site/docs/c_glib/parquet-glib
popd
popd

# Now Python documentation can be built
pushd arrow/python
python setup.py build_ext --build-type=$ARROW_BUILD_TYPE \
--with-plasma --with-parquet --inplace
python setup.py build_sphinx -s doc/source
mkdir -p ../site/asf-site/docs/python
rsync -r doc/_build/html/ ../site/asf-site/docs/python
popd

# Make C++ documentation
pushd arrow/cpp/apidoc
rm -rf html/*
2 changes: 1 addition & 1 deletion dev/release/rat_exclude_files.txt
@@ -114,6 +114,7 @@ dev/tasks/linux-packages/debian/plasma-store-server.install
dev/tasks/linux-packages/debian/rules
dev/tasks/linux-packages/debian/source/format
dev/tasks/linux-packages/debian/watch
docs/requirements.txt
go/arrow/go.sum
go/arrow/Gopkg.lock
go/arrow/internal/cpu/*
@@ -124,7 +125,6 @@ js/.npmignore
js/closure-compiler-scripts/*
python/cmake_modules
python/cmake_modules/*
python/doc/requirements.txt
python/MANIFEST.in
python/pyarrow/includes/__init__.pxd
python/pyarrow/tests/__init__.py
8 changes: 6 additions & 2 deletions dev/release/run-rat.sh
@@ -18,10 +18,14 @@
# under the License.
#

+RAT_VERSION=0.12
+
# download apache rat
-curl -s https://repo1.maven.org/maven2/org/apache/rat/apache-rat/0.12/apache-rat-0.12.jar > apache-rat-0.12.jar
+if [ ! -f apache-rat-${RAT_VERSION}.jar ]; then
+curl -s https://repo1.maven.org/maven2/org/apache/rat/apache-rat/${RAT_VERSION}/apache-rat-${RAT_VERSION}.jar > apache-rat-${RAT_VERSION}.jar
+fi

-RAT="java -jar apache-rat-0.12.jar -x "
+RAT="java -jar apache-rat-${RAT_VERSION}.jar -x "

RELEASE_DIR=$(cd "$(dirname "$BASH_SOURCE")"; pwd)

16 changes: 15 additions & 1 deletion docker-compose.yml
@@ -152,7 +152,7 @@ services:
######################### Tools and Linters #################################

# TODO(kszucs): site
-# TODO(kszucs): apidoc
+# TODO(kszucs): {cpp,java,glib,js}-apidoc

lint:
# Usage:
@@ -178,12 +178,26 @@ services:

clang-format:
# Usage:
# docker-compose build cpp
# docker-compose build python
# docker-compose build lint
# docker-compose run clang-format
image: arrow:lint
command: arrow/dev/lint/run_clang_format.sh
volumes: *ubuntu-volumes

docs:
# Usage:
# docker-compose build cpp
# docker-compose build python
# docker-compose build docs
# docker-compose run docs
image: arrow:docs
build:
context: .
dockerfile: docs/Dockerfile
volumes: *volumes

######################### Integration Tests #################################

# impala:
2 changes: 1 addition & 1 deletion python/doc/.gitignore → docs/.gitignore
@@ -16,4 +16,4 @@
# under the License.

_build
-source/generated
+source/python/generated
File renamed without changes.
26 changes: 26 additions & 0 deletions docs/Dockerfile
@@ -0,0 +1,26 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

FROM arrow:python-3.6

ADD ci/conda_env_sphinx.yml /arrow/ci/
RUN conda install -c conda-forge \
--file arrow/ci/conda_env_sphinx.yml && \
conda clean --all
CMD arrow/ci/docker_build_cpp.sh && \
arrow/ci/docker_build_python.sh && \
arrow/ci/docker_build_sphinx.sh
File renamed without changes.
File renamed without changes.
1 change: 1 addition & 0 deletions python/doc/requirements.txt → docs/requirements.txt
@@ -1,3 +1,4 @@
breathe
ipython
matplotlib
numpydoc
File renamed without changes.
File renamed without changes.