Apache Polaris, the interoperable, open source catalog for Apache Iceberg

sfc-gh-dzhang/polaris-sf-dzhang

Polaris Catalog

Polaris Catalog is an open source catalog for Apache Iceberg. Polaris Catalog implements Iceberg’s open REST API for multi-engine interoperability with Apache Doris, Apache Flink, Apache Spark, PyIceberg, StarRocks and Trino.

Status

Polaris Catalog is open source under an Apache 2.0 license.

  • ⭐ Star this repo if you’d like to bookmark and come back to it!
  • 📖 Read the announcement blog post for more details!

API Docs

API docs are hosted via GitHub Pages at https://polaris-catalog.github.io/polaris. Every update to the main branch republishes the hosted docs.

The Polaris management API docs are generated to docs/polaris-management/index.html.

The open source Iceberg REST API docs are generated to docs/iceberg-rest/index.html.

Docs are generated using Redocly. To regenerate them, run the following commands from the project root directory:

docker run -p 8080:80 -v ${PWD}:/spec redocly/cli build-docs spec/polaris-management-service.yml --output=docs/polaris-management/index.html
docker run -p 8080:80 -v ${PWD}:/spec redocly/cli build-docs spec/rest-catalog-open-api.yaml --output=docs/iceberg-rest/index.html

Setup

Requirements / Setup

  • Java JDK >= 21. On a Mac, you can use jenv to select the appropriate JDK.
  • Gradle 8.6 - This is included in the project and can be run using ./gradlew in the project root.
  • Docker - If you want to run the project in a containerized environment.
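The JDK requirement above can be checked before building. The following is a minimal sketch and is not part of the Polaris repository: the function names java_major, installed_java_version, and check_java are hypothetical, and it assumes that java -version prints a line containing version "X.Y.Z", as OpenJDK builds typically do.

```shell
# Hypothetical helpers (not shipped with Polaris) for checking the JDK requirement.

# Print the major version for a version string such as "21.0.2" or "1.8.0_292".
# Legacy "1.x" strings (Java 8 and earlier) keep the major version in the second field.
java_major() {
  case "$1" in
    1.*) printf '%s\n' "$1" | cut -d. -f2 ;;
    *)   printf '%s\n' "$1" | cut -d. -f1 | sed 's/[^0-9].*//' ;;
  esac
}

# Print the version string reported by the java binary on PATH (empty if none).
installed_java_version() {
  java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed only if the installed JDK is version 21 or newer.
check_java() {
  major=$(java_major "$(installed_java_version)")
  [ -n "$major" ] && [ "$major" -ge 21 ]
}

# Example use before a build:
#   check_java || echo "Polaris needs JDK 21 or newer" >&2
```

Parsing the version string in a separate function keeps the comparison logic independent of which JDK happens to be installed.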

Command-Line getting started

Polaris is a multi-module project with three modules:

  • polaris-core - The main Polaris entity definitions and core business logic
  • polaris-server - The Polaris REST API server
  • polaris-eclipselink - The Eclipselink implementation of the MetaStoreManager interface

Build the binary (the first build may require installing a new JDK version). The build runs the integration tests by default:

./gradlew build

Run the Polaris server locally on localhost:8181:

./gradlew runApp

While the Polaris server is running, run the regression (end-to-end) tests in another terminal:

./regtests/run.sh
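Because the regression tests need a running server, it can help to wait until the server answers before starting them. The following is a minimal sketch, assuming that any HTTP response on port 8181 means the server is up; wait_until is a hypothetical helper, not something shipped with Polaris.

```shell
# Hypothetical helper: retry a command until it succeeds or the attempts run out.
#   $1 = maximum number of attempts
#   $2 = seconds to sleep between attempts
#   remaining arguments = the command to run
wait_until() {
  retries=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$retries" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example: poll the local server for up to about a minute, then run the tests.
# curl exits non-zero while the connection is refused and zero once any
# HTTP response comes back.
#   wait_until 30 2 curl -s -o /dev/null http://localhost:8181/ && ./regtests/run.sh
```

Passing the probe as a command keeps the retry loop generic, so the same helper works with a different health check if curl is not available.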

Docker Instructions

Build the image:

docker build -t localhost:5001/polaris:latest .

Run it in standalone mode. This runs a single container and binds the container's port 8181 to port 8181 on localhost:

docker run -p 8181:8181 localhost:5001/polaris:latest

Running the tests

Unit and Integration tests

Unit and integration tests are run using Gradle. To run all tests, use the following command:

./gradlew test

Regression tests

Regression tests, or functional tests, are stored in the regtests directory. They can be executed in a Docker environment by using the docker-compose.yml file in the project root:

docker compose up --build --exit-code-from regtest

They can also be executed outside of Docker by following the setup instructions in the README.

Kubernetes Instructions


You can run Polaris as a mini-deployment locally. This will create two pods that bind themselves to port 8181:

./setup.sh

You can check the pod and deployment status like so:

kubectl get pods
kubectl get deployment

If things aren't working as expected you can troubleshoot like so:

kubectl describe deployment polaris-deployment

Creating a Catalog manually

Before connecting with Spark, you'll need to create a catalog. To create a catalog, generate a token for the root principal:

curl -i -X POST \
  http://localhost:8181/api/catalog/v1/oauth/tokens \
  -d 'grant_type=client_credentials&client_id=<principalClientId>&client_secret=<mainSecret>&scope=PRINCIPAL_ROLE:ALL'

The response output will contain an access token:

{
  "access_token": "ver:1-hint:1036-ETMsDgAAAY/GPANareallyverylongstringthatissecret",
  "token_type": "bearer",
  "expires_in": 3600
}
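Copying the token by hand works, but the extraction can also be scripted. The following is a minimal sketch, assuming the response has the JSON shape shown above; extract_access_token is a hypothetical helper, and a JSON-aware tool would be more robust than this sed expression if one is available.

```shell
# Hypothetical helper: read a token response on stdin and print the value
# of its "access_token" field (first match only).
extract_access_token() {
  sed -n 's/.*"access_token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Example, with the same request as above but without -i so that only the
# JSON body is printed (the form data is elided here):
#   PRINCIPAL_TOKEN=$(curl -s -X POST \
#     http://localhost:8181/api/catalog/v1/oauth/tokens \
#     -d '...' | extract_access_token)
#   export PRINCIPAL_TOKEN
```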

Set the PRINCIPAL_TOKEN variable to the contents of the access_token field. Then use curl to invoke the createCatalog API:

export PRINCIPAL_TOKEN=ver:1-hint:1036-ETMsDgAAAY/GPANareallyverylongstringthatissecret

curl -i -X PUT -H "Authorization: Bearer $PRINCIPAL_TOKEN" -H 'Accept: application/json' -H 'Content-Type: application/json' \
  http://${POLARIS_HOST:-localhost}:8181/api/v1/catalogs \
  -d '{"name": "snowflake", "id": 100, "type": "INTERNAL", "readOnly": false}'

This creates a catalog called snowflake. From here, you can use Spark to create namespaces, tables, etc.

You must run the following as the first query in your spark-sql shell to actually use Polaris:

use polaris;
