# Starred repositories
5 stars written in Scala
Apache Spark - A unified analytics engine for large-scale data processing
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
The leader in Next-Generation Customer Data Infrastructure
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.