tony88331

20 followers · 49 following

Starred repositories

5 stars written in Scala

apache / spark

Apache Spark - A unified analytics engine for large-scale data processing

Scala · 39,494 stars · 28,254 forks · Updated Oct 16, 2024

delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

Scala · 7,521 stars · 1,688 forks · Updated Oct 15, 2024

snowplow / snowplow

The leader in Next-Generation Customer Data Infrastructure

Scala · 6,831 stars · 1,189 forks · Updated Sep 2, 2024

awslabs / deequ

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.

Scala · 3,284 stars · 537 forks · Updated Oct 9, 2024

apache / kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

Scala · 2,087 stars · 910 forks · Updated Oct 16, 2024
