Learn Druid

The "Learn Druid" repository contains all manner of resources to help you learn and apply Apache Druid®.

It contains:

  • Jupyter Notebooks that guide you through query, ingestion, and data management with Apache Druid.
  • A Docker Compose file to get you up and running with a learning lab.

Suggestions or comments? Join the discussions. Found a problem or want to request a notebook? Raise an issue. Want to contribute? Raise a PR.

Contributions to this community resource are welcome! Contribute your own notebook on a topic that's not listed here, and check out the issue list, where you'll find bugs and enhancement requests.

Come meet your friendly Apache Druid community if you have any questions about the functionality you see here.

Prerequisites

To use the "Learn Druid" Docker Compose, you need:

  • Git or GitHub Desktop

  • Docker Desktop with Docker Compose

  • A machine with at least 6 GiB of RAM.

    More resources help. The notebooks have been tested with the following resources available to Docker: 6 CPUs, 8 GB of RAM, and 1 GB of swap.
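To confirm that Docker has enough resources before you start, you can ask the Docker engine what it sees. The following is a minimal sketch, not part of the repository: `docker info` with the `MemTotal` and `NCPU` format fields is standard Docker CLI, while the `meets_minimum_memory` helper and the `MIN_MEM_BYTES` variable are hypothetical names that encode the 6 GiB minimum listed above.

```shell
# Minimum memory from the prerequisites, in bytes (6 GiB).
MIN_MEM_BYTES=$((6 * 1024 * 1024 * 1024))

# Hypothetical helper: succeeds if the given byte count meets the minimum.
meets_minimum_memory() {
  [ "$1" -ge "$MIN_MEM_BYTES" ]
}

# Only query Docker if the CLI is installed and the engine responds.
if command -v docker >/dev/null 2>&1 &&
   mem=$(docker info --format '{{.MemTotal}}' 2>/dev/null); then
  cpus=$(docker info --format '{{.NCPU}}' 2>/dev/null)
  echo "Docker sees $cpus CPUs and $mem bytes of memory"
  if meets_minimum_memory "$mem"; then
    echo "Memory meets the 6 GiB minimum"
  else
    echo "Memory is below the 6 GiB minimum"
  fi
else
  echo "Docker is not available; start Docker Desktop and try again"
fi
```

If the reported memory is too low, raise the limit in the Docker Desktop resource settings and run the check again.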

Quickstart

To get started quickly:

  1. Clone the repository:

     git clone https://github.com/implydata/learn-druid

  2. Navigate to the directory:

     cd learn-druid

  3. Launch the environment:

     docker compose --profile druid-jupyter up -d

     The first time you launch the environment, it can take a while to start all the services.

  4. Navigate to Jupyter Lab in your browser at http://localhost:8889/lab.
     From there you can read the introduction or use Jupyter Lab to navigate the notebooks folder.

  5. When you're finished, stop all services:

     docker compose --profile druid-jupyter down

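After you have cloned the repository, update to the latest version as follows. Note that git restore . discards any uncommitted changes to tracked files, including edits you have made to the notebooks: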

git restore .
git pull

While using the notebooks, you can monitor ingestion tasks, compare query results, and more in the Druid web console at http://localhost:8888.
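Because the services can take a while to start, it can help to wait until Druid reports healthy before opening a notebook. The sketch below assumes that the service listening on port 8888 answers Druid's `/status/health` endpoint, which returns `true` for a running Druid process. The `wait_for_druid` function name, the retry count, and the delay are illustrative and not part of the repository.

```shell
# Hypothetical helper: poll a Druid health endpoint until it reports "true".
# $1: base URL (default http://localhost:8888)
# $2: maximum number of attempts (default 60)
# $3: seconds to wait between attempts (default 5)
wait_for_druid() {
  url="${1:-http://localhost:8888}/status/health"
  attempts="${2:-60}"
  delay="${3:-5}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -s silences progress output, -f treats HTTP errors as failures.
    if [ "$(curl -sf --max-time 5 "$url" 2>/dev/null)" = "true" ]; then
      echo "Druid is healthy at $url"
      return 0
    fi
    # Sleep only between attempts, not after the last one.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  echo "Druid did not report healthy after $attempts attempts" >&2
  return 1
}
```

For example, `wait_for_druid http://localhost:8888 60 5` polls up to 60 times, five seconds apart, and returns a non-zero status if Druid never responds.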

Profiles

Individual notebooks may state a specific compose profile that you need to use.

Specify the profile after the --profile parameter to the docker compose command. For example, to start with the all-services profile, use this command:

docker compose --profile all-services up -d

To stop all services:

docker compose --profile all-services down

To stop all services and delete their data (the -v flag also removes the volumes declared in the Compose file):

docker compose --profile all-services down -v

To run the notebooks against an existing Apache Druid deployment, set the DRUID_HOST environment variable and use the jupyter profile:

DRUID_HOST=[host address] docker compose --profile jupyter up -d

If Druid is running on your local machine, use host.docker.internal as the host address:

DRUID_HOST=host.docker.internal docker compose --profile jupyter up -d

Components

The Learn Druid environment includes the following services:

  • Jupyter Lab: An interactive environment to run Jupyter Notebooks. The Jupyter image used in the environment contains Python along with all the supporting libraries you need to run the notebooks.

  • Apache Kafka: A streaming service that acts as a data source for Druid.

  • Imply Data Generator: A tool to generate sample data for Druid. It can produce either batch or streaming data.

  • Apache Druid: By default, the latest released version of Apache Druid.


This repository is not affiliated with, endorsed by, or otherwise associated with the Apache Software Foundation (ASF) or any of its projects. Apache, Apache Druid, Druid, and the Druid logo are either registered trademarks or trademarks of ASF in the USA and other countries.