Fugue is a pure abstraction layer that adapts to different computing frameworks such as Spark and Dask. It aims to unify the core concepts of distributed computing and to help you decouple your logic from specific computing frameworks.
pip install fugue
Fugue has these extras:
- sql: to support Fugue SQL
- spark: to support Spark as the ExecutionEngine
- dask: to support Dask as the ExecutionEngine
For example, a common use case is:
pip install fugue[sql,spark]
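After installing with extras, it can be handy to confirm which optional backends are actually importable. The following is a minimal sketch using only the standard library. It assumes that the spark and dask extras provide packages importable as pyspark and dask. The helper name available_backends is hypothetical and not part of Fugue.

```python
from importlib.util import find_spec

# Assumption: the "spark" and "dask" extras pull in packages that are
# importable as "pyspark" and "dask" respectively.
OPTIONAL_BACKENDS = {"spark": "pyspark", "dask": "dask"}


def available_backends(modules=OPTIONAL_BACKENDS):
    """Map each extra name to True if its top-level module can be found."""
    return {
        extra: find_spec(module) is not None
        for extra, module in modules.items()
    }


if __name__ == "__main__":
    for extra, ok in available_backends().items():
        print(f"{extra}: {'installed' if ok else 'missing'}")
```

The check only looks up the module on the import path. It does not import the backend or verify its version.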
To read the complete static docs, click here
The best way to start is to go through the tutorials, which are available in an interactive notebook environment.
They run slowly on Binder, however, because the Binder machine isn't powerful enough for a distributed framework such as Spark. Parallel executions can become sequential, so some of the performance comparison examples will not give representative numbers.
Alternatively, you should get decent performance by running the tutorials' Docker image on your own machine:
docker run -p 8888:8888 fugueproject/tutorials:latest
There are three steps to setting up a development environment:
- Create a virtual environment with your choice of environment manager
- Install the requirements
- Install the git hook scripts
Below are examples of how to create and activate an environment with virtualenv and conda.
Using virtualenv
python3 -m venv venv
. venv/bin/activate
Using conda
conda create --name fugue-dev
conda activate fugue-dev
The Fugue repo has a Makefile that can be used to install the requirements. It supports installation with both pip and conda.
Pip install requirements
make setupinpip
Conda install requirements
make setupinconda
Manually install requirements
Windows users who don't have the make command can install the requirements with their package manager of choice. For pip:
pip3 install -r requirements.txt
For Anaconda users, first install pip in the newly created environment. If you run pip install without first installing pip in the environment, the system-wide pip is used instead.
conda install pip
pip install -r requirements.txt
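To see whether the pip on your PATH belongs to the active environment rather than the system, a quick check along these lines may help. This is a sketch using only the standard library. The function names is_inside and pip_is_from_active_env are hypothetical, and the check assumes the environment's pip executable lives somewhere under sys.prefix, which is typical for venv and conda environments but not guaranteed.

```python
import shutil
import sys
from pathlib import Path


def is_inside(path, prefix):
    """Return True if path is located at or under the prefix directory."""
    path, prefix = Path(path).absolute(), Path(prefix).absolute()
    return path == prefix or prefix in path.parents


def pip_is_from_active_env():
    """Check whether the first pip on PATH lives under this interpreter's prefix."""
    pip_path = shutil.which("pip")  # None if no pip is on PATH
    return pip_path is not None and is_inside(pip_path, sys.prefix)


if __name__ == "__main__":
    print("pip found at:", shutil.which("pip"))
    print("belongs to active environment:", pip_is_from_active_env())
```

Run it with the environment's Python after activating the environment. A False result suggests that pip install would target a different Python installation.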
Notes for Windows Users
Windows users will need to download and install the Microsoft C++ Build Tools.
Fugue has pre-commit hooks that check whether code is ready to be committed. The make command above installs them.
If you installed the requirements manually, install the git hook scripts with:
pre-commit install
- Added set operations to programming interface: union, subtract, intersect
- Added distinct to programming interface
- Ensured partitioning follows SQL convention: groups with null keys are NOT removed
- Switched join, union, subtract, intersect, distinct to QPD implementations, so they follow SQL convention
- Set operations in Fugue SQL can directly operate on Fugue statements (e.g. TRANSFORM USING t1 UNION TRANSFORM USING t2)
- Fixed bugs
- Added onboarding document for contributors
- Main features of Fugue core and Fugue SQL
- Support backends: Pandas, Spark and Dask
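The "SQL convention" mentioned in the release notes above can be illustrated with plain SQL. The sketch below uses Python's built-in sqlite3 module, not Fugue or QPD, so it only shows the standard SQL semantics those notes refer to: rows with null keys form their own group, and set operations treat rows as a set. The table names and data are made up for illustration, and mapping subtract to SQL EXCEPT is an assumption.

```python
import sqlite3

# Illustration only: SQLite stands in for "SQL convention" here.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t1 (k TEXT, v INTEGER);
    CREATE TABLE t2 (k TEXT, v INTEGER);
    INSERT INTO t1 VALUES ('a', 1), ('a', 1), (NULL, 2), (NULL, 3);
    INSERT INTO t2 VALUES ('a', 1), ('b', 4);
    """
)


def rows(sql):
    """Run a query and return all result rows as a list of tuples."""
    return conn.execute(sql).fetchall()


# Grouping: the NULL key is kept as its own group rather than dropped.
groups = rows("SELECT k, COUNT(*) FROM t1 GROUP BY k ORDER BY k")

# Set operations: duplicates are removed and NULL keys are retained.
union = rows("SELECT k, v FROM t1 UNION SELECT k, v FROM t2 ORDER BY k, v")
subtract = rows("SELECT k, v FROM t1 EXCEPT SELECT k, v FROM t2 ORDER BY k, v")
intersect = rows("SELECT k, v FROM t1 INTERSECT SELECT k, v FROM t2 ORDER BY k, v")
distinct = rows("SELECT DISTINCT k, v FROM t1 ORDER BY k, v")

if __name__ == "__main__":
    for name in ("groups", "union", "subtract", "intersect", "distinct"):
        print(name, globals()[name])
```

In the grouping result, None (SQL NULL) appears as a key with a count of 2. This contrasts with tools that drop null keys when grouping by default.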