Stars
Large World Model -- Modeling Text and Video with Millions Context
A Bulletproof Way to Generate Structured JSON from Language Models
A hyper-fast local vector database for use with LLM Agents. Now accepting SAFEs at $135M cap.
Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform
DSPy: The framework for programming—not prompting—language models
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Sample base images for Databricks Container Services
An open protocol for secure data sharing
Offload IoT computation to local hardware while justifying any network accesses.
A native Rust library for Delta Lake, with bindings into Python
API for manipulating time series on top of Apache Spark: lagged time values, rolling statistics (mean, sum, count, etc.), AS OF joins, downsampling, and interpolation
ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)
The library for web and native user interfaces.
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
The Tensor Algebra SuperOptimizer for Deep Learning
An open-source toolkit for large-scale genomic analysis
Puffer is a free live TV streaming website and a research study at Stanford using machine learning to improve video streaming
A Python-embedded modeling language for convex optimization problems.
GoCD plugins to work with MLflow as a model repository in a CD flow
Open source platform for the machine learning lifecycle
The "Command Line Interactive Controller for Kubernetes"
Accelerating network inference over video