Stars
Apache Polaris, the interoperable, open source catalog for Apache Iceberg
World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
cloudera / dbt-spark-livy
Forked from dbt-labs/dbt-spark
The dbt-spark-livy adapter allows you to use dbt along with Apache Spark, by connecting via Apache Livy
The event stream processing platform for developers. Unified experience for real-time data ingestion, stream processing, and low-latency serving. Best-in-class performance and cost-efficiency. Supp…
Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance …
Uniffle is a high performance, general purpose Remote Shuffle Service.
Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.
Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications and the underlying data engines.
Java utilities for transforming distance along N-dimensional Hilbert Curve to a point and back. Also supports range splitting queries on the Hilbert Curve.
Firestorm is a Remote Shuffle Service, and provides the capability for Apache Spark and Apache Hadoop MapReduce applications to store shuffle data on remote servers
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
TPC-H queries in Apache Spark SQL using native DataFrames API
Remote shuffle service for Apache Spark to store shuffle data on remote servers.
NVIDIA / spark-xgboost
Forked from dmlc/xgboost
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Flink and DataFlow
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
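A one-stop, cloud-native real-time streaming data platform that builds enterprise-grade Kafka services through a zero-intrusion, plugin-based design, greatly lowering the barrier to operating, storing, and managing real-time streaming data.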
An open source ebook about the TensorFlow kernel and its implementation mechanisms.
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
An integrated and collaborative cloud environment for building and running Spark applications on PKS/Kubernetes
Production-Grade Container Scheduling and Management