Stars
GPU-Based Approximate Nearest Neighbor Search
Sky-T1: Train your own O1 preview model within $450
Navigating Spreading-out Graph For Approximate Nearest Neighbor Search
A simple, high-throughput file client for mounting an Amazon S3 bucket as a local file system.
A blazing fast inference solution for text embedding models
ParseableDB is a diskless, cloud-native database for observability and security. Parseable is the observability platform built with ParseableDB.
Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, Mistral Small 3.1 and other large language models.
The native Rust implementation for Apache Hudi, with Python API bindings.
Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, Du…
Developer-friendly, embedded retrieval engine for multimodal AI. Search More; Manage Less.
Nearest Neighbor Search with Neighborhood Graph and Tree for High-dimensional Data
RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing …
Open, Multi-modal Catalog for Data & AI
Benchmarks of approximate nearest neighbor libraries in Python
An S3 shuffle plugin for Apache Spark to enable elastic scaling for generic Spark workloads.
Apache DataFusion Comet Spark Accelerator
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
Compile protocol buffer messages to TypeScript.
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
The Open-Source toolkit to build your own reliable and secure Industrial IoT platform.
Source code for Twitter's Recommendation Algorithm
An extension to the Apache Spark framework that allows easy and fast processing of very large geospatial datasets.
Integrate cutting-edge LLM technology quickly and easily into your apps