Nessie: Transactional Catalog for Data Lakes with Git-like semantics

Project Nessie

Project Nessie is a Transactional Catalog for Data Lakes with Git-like semantics.

More information can be found at projectnessie.org.

Nessie supports Iceberg Tables/Views and Delta Lake Tables. Additionally, Nessie is focused on working with the widest range of tools possible, which can be seen in the feature matrix.

Using Nessie

You can quickly get started with Nessie by using our small, fast docker image.

IMPORTANT NOTE: Nessie is moving away from docker.io to GitHub's container registry (ghcr.io) and to quay.io. Previous releases are already available on both ghcr.io and quay.io. Please update references to projectnessie/nessie in your code to either ghcr.io/projectnessie/nessie or quay.io/projectnessie/nessie.

docker pull ghcr.io/projectnessie/nessie
docker run -p 19120:19120 ghcr.io/projectnessie/nessie

To try the Nessie image with different configuration options, refer to the templates under the docker module.

Once the container is running, a local Web UI is available at http://localhost:19120.

Then install the Nessie CLI tool (to learn more about the CLI tool and how to use it, see the Nessie CLI documentation).

pip install pynessie

From there, you can use one of our technology integrations. To learn more about all supported integrations and tools, see projectnessie.org.

Have fun! We have a Google Group and a Slack channel we use for both developers and users. Check them out here.

Authentication

By default, Nessie servers run with authentication disabled and all requests are processed under the "anonymous" user identity.

Nessie supports bearer tokens and uses OpenID Connect for validating them.

Authentication can be enabled by setting the following Quarkus properties:

  • nessie.server.authentication.enabled=true
  • quarkus.oidc.auth-server-url=<OpenID Server URL>
  • quarkus.oidc.client-id=<Client ID>

Experimenting with Nessie Authentication in Docker

One can start the projectnessie/nessie docker image in authenticated mode by setting the properties mentioned above via docker environment variables. For example:

docker run -p 19120:19120 \
  -e QUARKUS_OIDC_CLIENT_ID=<Client ID> \
  -e QUARKUS_OIDC_AUTH_SERVER_URL=<OpenID Server URL> \
  -e NESSIE_SERVER_AUTHENTICATION_ENABLED=true \
  --network host \
  ghcr.io/projectnessie/nessie
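Once the server runs in authenticated mode, clients have to send a bearer token with each request. The following is a minimal sketch of how such a request could be assembled. The `bearer_header` helper, the `NESSIE_URL` and `TOKEN` variables, and the `/api/v1/trees` path are assumptions made for illustration; obtain a real token from your OpenID provider and check the API documentation for the endpoints your Nessie version exposes.

```shell
#!/bin/sh
# Assumed base URL of the local Nessie server (hypothetical variable name).
NESSIE_URL="${NESSIE_URL:-http://localhost:19120}"

# Hypothetical helper: print the HTTP Authorization header for a bearer token.
bearer_header() {
  printf 'Authorization: Bearer %s' "$1"
}

# Example call, left commented out because it needs a running server and a
# valid token in $TOKEN; the endpoint path is an assumption:
# curl -H "$(bearer_header "$TOKEN")" "$NESSIE_URL/api/v1/trees"
```

A request without a valid token should be rejected by a server that has authentication enabled.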

Building and Developing Nessie

Requirements

  • JDK 11 or higher is needed to build Nessie (artifacts are built for Java 8)

Installation

Clone this repository:

git clone https://github.com/projectnessie/nessie
cd nessie

Then open the project in IntelliJ or Eclipse, or use the IDE itself to clone this GitHub repository.

Refer to CONTRIBUTING for build instructions.

Compatibility

Nessie's Iceberg integration is compatible with the versions listed in the following table:

| Nessie version | Iceberg version | Spark version | Hive version | Flink version | Presto version | Trino version |
|---|---|---|---|---|---|---|
| 0.60.1 | 1.3.0 | 3.1.x (Scala 2.12), 3.2.x (Scala 2.12+2.13), 3.3.x (Scala 2.12+2.13), 3.4.x (Scala 2.12+2.13) | n/a | 1.15.x, 1.16.x, 1.17.x | 0.277, 0.278.x, 0.279, 0.280, 0.281 | 419 |

Nessie's Delta Lake integration is compatible with the versions listed in the following table:

| Nessie version | Delta Lake version | Spark version |
|---|---|---|
| 0.60.1 | Custom | 3.2.X |

Delta Lake artifacts

Nessie required some minor changes to Delta Lake for full support of branching and history. These changes are currently being integrated into the mainline repo. Until they have been merged, we provide custom builds in our fork, which can be downloaded from a separate Maven repository.

Distribution

To run:

  1. Adjust the configuration in servers/quarkus-server/src/main/resources/application.properties
  2. Execute ./gradlew quarkusDev
  3. Go to http://localhost:19120

UI

To run the UI (from the ui directory):

  1. If you are running in test, ensure that setupProxy.js points to the correct API instance. This avoids CORS issues in testing.
  2. Run npm install to install dependencies.
  3. Run npm run start to start the UI in development mode via node.

To deploy the UI (from the ui directory):

  1. Run npm install to install dependencies.
  2. Run npm run build to minify and collect the package for deployment in the build directory.
  3. Deploy the build directory to any static hosting environment, or run it locally with serve -s build.

Docker image

Official Nessie images are built with multi-platform support. To quickly build a docker image for testing purposes, run the following commands:

./gradlew :nessie-quarkus:clean :nessie-quarkus:quarkusBuild
docker build -f ./tools/dockerbuild/docker/Dockerfile-jvm -t nessie-unstable:latest ./servers/quarkus-server 

Check that your image is available locally:

docker images

You should see something like this:

REPOSITORY       TAG     IMAGE ID       CREATED          SIZE
nessie-unstable  latest  24bb4c7bd696   15 seconds ago   555MB

Once this is done, you can run your image with docker run -p 19120:19120 nessie-unstable:latest, passing the relevant environment variables, if any. Environment variable names must follow MicroProfile Config's mapping rules.
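As an illustration of that mapping, the sketch below converts a property name into the upper-case environment variable form that MicroProfile Config looks up: every character that is not a letter, digit, or underscore is replaced with an underscore, and the result is upper-cased. The `to_env_name` function is a hypothetical helper written for this example and is not part of Nessie or Quarkus.

```shell
#!/bin/sh
# Hypothetical helper: derive the upper-case environment variable name
# for a configuration property.
to_env_name() {
  # Replace anything outside [A-Za-z0-9_] with "_", then upper-case the result.
  printf '%s' "$1" | sed 's/[^A-Za-z0-9_]/_/g' | tr '[:lower:]' '[:upper:]'
}

# Print the variable names for two of the authentication properties.
to_env_name "nessie.server.authentication.enabled"; echo
to_env_name "quarkus.oidc.auth-server-url"; echo
```

The two names printed here match the variables passed with -e in the authentication example earlier in this document.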

AWS Lambda

You can also deploy Nessie as an AWS Lambda function by following the steps in servers/lambda/README.md.

Nessie related repositories

Contributing

Code Style

The Nessie project uses the Google Java Code Style, scalafmt and pep8. See CONTRIBUTING.md for more information.

Acknowledgements

See ACKNOWLEDGEMENTS.md
