View BACtaki's full-sized avatar

Open source data anonymization and synthetic data orchestration for developers. Create high fidelity synthetic data and sync it across your environments.

Go 3,750 142 Updated Feb 13, 2025

Low-code framework for building custom LLMs, neural networks, and other AI models

Python 11,319 1,201 Updated Feb 3, 2025

Sequential Decision Problem Modeling Library @ Castle Lab, Princeton Univ.

Python 316 145 Updated Feb 2, 2019

An open source ML system for the end-to-end data science lifecycle

Java 1,040 481 Updated Feb 9, 2025

A cluster computing framework for processing large-scale geospatial data

Java 1,995 693 Updated Feb 13, 2025

The event stream processing platform for developers. Unified experience for real-time data ingestion, stream processing, and low-latency serving. Best-in-class performance and cost-efficiency. Supp…

Rust 7,358 605 Updated Feb 13, 2025
TeX 21 6 Updated Jan 15, 2025

RayDP provides simple APIs for running Spark on Ray and integrating Spark with AI libraries.

Python 324 74 Updated Dec 26, 2024

A composable and fully extensible C++ execution engine library for data management systems.

C++ 3,612 1,204 Updated Feb 12, 2025

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.

Scala 1,261 461 Updated Feb 12, 2025

Features and patches drafted collaboratively by the team before being contributed to Apache Flink

Java 4 Updated Sep 1, 2022

The preview version of a spillable state backend for Apache Flink

Java 39 10 Updated Mar 14, 2021

A portable accelerated data query and LLM-inference engine, written in Rust, for data-grounded AI apps and agents.

Rust 2,048 88 Updated Feb 13, 2025

Kite SDK

Java 393 262 Updated Nov 1, 2022

Move fast from data science prototype to pipeline. Capture, analyze, and transform messy notebooks into data pipelines with just two lines of code.

Jupyter Notebook 665 58 Updated May 15, 2024

Apache SystemDS - A versatile system for the end-to-end data science lifecycle

Java 1 Updated Jan 17, 2024

Apache Iceberg

Java 6,883 2,367 Updated Feb 11, 2025

Neon: Serverless Postgres. We separated storage and compute to offer autoscaling, code-like database branching, and scale to zero.

Rust 16,108 490 Updated Feb 13, 2025

TFX-Addons is a collection of community projects to build new components, examples, libraries, and tools for TFX. The projects are organized under the auspices of the special interest group, SIG TF…

Jupyter Notebook 1 Updated May 10, 2022

postgresql-hll - a PostgreSQL extension adding HyperLogLog data structures as a native data type

C 1 Updated Mar 30, 2013
HTML 1 Updated May 19, 2021

Code samples for the Effective Data Science Infrastructure book

Python 113 30 Updated Jun 2, 2023

Java implementation of algorithms from Russell And Norvig's "Artificial Intelligence - A Modern Approach"

Java 1,573 801 Updated Dec 20, 2023

Solving algorithm problems is all about patterns, and labuladong is all you need! English version supported! Crack LeetCode, not only how, but also why.

Markdown 126,787 23,295 Updated Jan 31, 2025

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

Scala 2,140 930 Updated Feb 11, 2025

Determined is an open-source machine learning platform that simplifies distributed training, hyperparameter tuning, experiment tracking, and resource management. Works with PyTorch and TensorFlow.

Go 3,092 361 Updated Feb 12, 2025

Turi Create simplifies the development of custom machine learning models.

C++ 11,198 1,143 Updated Nov 1, 2023

Technically-oriented PDF Collection (Papers, Specs, Decks, Manuals, etc)

HTML 8,027 1,572 Updated Feb 5, 2025

Alink is the Machine Learning algorithm platform based on Flink, developed by the PAI team of Alibaba computing platform.

Java 3,590 802 Updated Jun 7, 2024

Nessie: Transactional Catalog for Data Lakes with Git-like semantics

Java 1,125 142 Updated Feb 12, 2025