MADlib® is an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine learning methods for structured and unstructured data.
See the project website, MADlib Home, for links to the latest binary and source packages. For installation and contribution guides, and other useful information, please refer to the MADlib Wiki.
We provide a Docker image with the dependencies required to compile and test MADlib on PostgreSQL 9.6. You can view the dependency Docker file at ./tool/docker/base/Dockerfile_postgres_9_6. The image is hosted on Docker Hub at madlib/postgres_9.6:latest. A similar Docker image for Greenplum Database will be provided later.
Some useful commands for working with the Docker image:
## 1) Pull down the `madlib/postgres_9.6:latest` image from Docker Hub:
docker pull madlib/postgres_9.6:latest
## 2) Launch a container corresponding to the MADlib image, mounting the
## source code folder to the container:
docker run -d -it --name madlib \
-v (path to incubator-madlib directory):/incubator-madlib/ madlib/postgres_9.6
# where incubator-madlib is the directory where the MADlib source code resides.
################################# * WARNING * #################################
# Please be aware that when mounting a volume as shown above, any changes you
# make in the "incubator-madlib" folder inside the Docker container will be
# reflected on your local disk (and vice versa). This means that deleting data
# in the mounted volume from a Docker container will delete the data from your
# local disk also.
###############################################################################
## 3) When the container is up, connect to it and build MADlib:
docker exec -it madlib bash
mkdir /incubator-madlib/build-docker
cd /incubator-madlib/build-docker
cmake ..
make
make doc
make install
## 4) Install MADlib into the database:
src/bin/madpack -p postgres -c postgres/postgres@localhost:5432/postgres install
## 5) Several other commands can now be run, such as:
# Run install check, on all modules:
src/bin/madpack -p postgres -c postgres/postgres@localhost:5432/postgres install-check
# Run install check, on a specific module, say svm:
src/bin/madpack -p postgres -c postgres/postgres@localhost:5432/postgres install-check -t svm
# Reinstall MADlib:
src/bin/madpack -p postgres -c postgres/postgres@localhost:5432/postgres reinstall
## 6) Kill and remove containers (after exiting the container):
docker kill madlib
docker rm madlib
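The madpack commands above all repeat the same platform and connection arguments. As a minimal sketch, the connection string can be composed once and reused. The function names `madpack_conn`, `madpack_cmd` and `install_check_cmds` are hypothetical helpers, not part of MADlib, and the defaults simply mirror the values used in the Docker example. The helpers print the command lines instead of running them, so they are safe to try outside the container.

```shell
#!/bin/sh
# Hypothetical helpers (not part of MADlib) that compose madpack command
# lines so the connection string is written only once.

# Print a connection string in the user/password@host:port/database form
# used by the commands above. Defaults mirror the Docker example.
madpack_conn() {
    user="${1:-postgres}"
    password="${2:-postgres}"
    host="${3:-localhost}"
    port="${4:-5432}"
    database="${5:-postgres}"
    printf '%s/%s@%s:%s/%s\n' "$user" "$password" "$host" "$port" "$database"
}

# Print the madpack command line for an action, optionally restricted
# to a single module with -t.
# Usage: madpack_cmd ACTION [MODULE]
madpack_cmd() {
    action="$1"
    module="${2:-}"
    cmd="src/bin/madpack -p postgres -c $(madpack_conn) $action"
    if [ -n "$module" ]; then
        cmd="$cmd -t $module"
    fi
    printf '%s\n' "$cmd"
}

# Print one install-check command per module name given as an argument.
install_check_cmds() {
    for module in "$@"; do
        madpack_cmd install-check "$module"
    done
}

# Example: print the install-check command for the svm module.
install_check_cmds svm
```

Inside the container, from the build directory created in step 3, the printed lines could be reviewed and then piped to `sh` to run them.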
Detailed build instructions are available in ReadMe_Build.txt
The latest documentation of MADlib modules can be found at MADlib Docs.
The following block-diagram gives a high-level overview of MADlib's architecture.
MADlib incorporates software from the following third-party components.
Bundled with source code:
libstemmer "small string processing language"
m_widen_init "allows compilation with recent versions of gcc with runtime dependencies from earlier versions of libstdc++"
argparse 1.2.1 "provides an easy, declarative interface for creating command line tools"
PyYAML 3.10 "YAML parser and emitter for Python"
UseLATEX.cmake "CMAKE commands to use the LaTeX compiler"
Downloaded at build time (or supplied as build dependencies):
Boost 1.61.0 (or newer) "provides peer-reviewed portable C++ source libraries"
PyXB 1.2.4 "Python library for XML Schema Bindings"
Eigen 3.2.2 "C++ template library for linear algebra"
Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this project to You under the Apache License, Version 2.0 (the "License"); you may not use this project except in compliance with the License. You may obtain a copy of the License at LICENSE.
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
As specified in LICENSE, additional license information regarding included third-party libraries can be found inside the licenses directory.
Changes between MADlib versions are described in the ReleaseNotes.txt file.