Stars
Open source data anonymization and synthetic data orchestration for developers. Create high fidelity synthetic data and sync it across your environments.
Low-code framework for building custom LLMs, neural networks, and other AI models
Sequential Decision Problem Modeling Library @ Castle Lab, Princeton Univ.
An open source ML system for the end-to-end data science lifecycle
A cluster computing framework for processing large-scale geospatial data
The event stream processing platform for developers. Unified experience for real-time data ingestion, stream processing, and low-latency serving. Best-in-class performance and cost-efficiency. Supp…
RayDP provides simple APIs for running Spark on Ray and integrating Spark with AI libraries.
A composable and fully extensible C++ execution engine library for data management systems.
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
Drafted features and patches, developed collaboratively by the team, before contributing them to Apache Flink
The preview version of a spillable state backend for Apache Flink
A portable accelerated data query and LLM-inference engine, written in Rust, for data-grounded AI apps and agents.
Move fast from data science prototype to pipeline. Capture, analyze, and transform messy notebooks into data pipelines with just two lines of code.
BACtaki / systemds
Forked from apache/systemds
Apache SystemDS - A versatile system for the end-to-end data science lifecycle
Neon: Serverless Postgres. We separated storage and compute to offer autoscaling, code-like database branching, and scale to zero.
BACtaki / tfx-addons
Forked from tensorflow/tfx-addons
TFX-Addons is a collection of community projects to build new components, examples, libraries, and tools for TFX. The projects are organized under the auspices of the special interest group, SIG TF…
metdos / postgresql-hll
Forked from citusdata/postgresql-hll
postgresql-hll - a PostgreSQL extension adding HyperLogLog data structures as a native data type
Code samples for the Effective Data Science Infrastructure book
Java implementation of algorithms from Russell and Norvig's "Artificial Intelligence - A Modern Approach"
Solving algorithm problems is all about patterns, and labuladong is all you need! English version supported! Crack LeetCode: not only how, but also why.
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Determined is an open-source machine learning platform that simplifies distributed training, hyperparameter tuning, experiment tracking, and resource management. Works with PyTorch and TensorFlow.
Turi Create simplifies the development of custom machine learning models.
Technically-oriented PDF Collection (Papers, Specs, Decks, Manuals, etc)
Alink is the Machine Learning algorithm platform based on Flink, developed by the PAI team of Alibaba computing platform.
Nessie: Transactional Catalog for Data Lakes with Git-like semantics