Stars
Class notes for the course "Long Term Memory in AI - Vector Search and Databases" COS 597A @ Princeton Fall 2023
OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and can retrieve information dynamically to do so.
A hyper-fast local vector database for use with LLM Agents. Now accepting SAFEs at $135M cap.
Fast SHAP value computation for interpreting tree-based models
The release of the Twitter algorithm, annotated for recsys
Tutorials for Fugue - A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask without any rewrites.
Statistical Rethinking Course for Jan-Mar 2023
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
The Patterns of Scalable, Reliable, and Performant Large-Scale Systems
Neural Networks: Zero to Hero
By ex-Googlers, for ex-Googlers: a lookup table of similar tech & services
The easiest way to serve AI apps and models: build model inference APIs, job queues, LLM apps, multi-model pipelines, and more!
Approximate Nearest Neighbor Search for Sparse Data in Python!
It is my belief that you, the postgraduate students and job-seekers for whom the book is primarily meant will benefit from reading it; however, it is my hope that even the most experienced research…
Coarse-grained lineage and tracing for machine learning pipelines.
A collection of (mostly) technical things every software developer should know about
A light-weight, flexible, and expressive statistical data testing library
A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
Source code accompanying the O'Reilly book Machine Learning Design Patterns
Elegant easy-to-use neural networks + scientific computing in JAX. https://docs.kidger.site/equinox/
📌 Papers, guides, and mentor interviews on applying machine learning for ApplyingML.com—the ghost knowledge of machine learning.
Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code. Tuplex has similar Python APIs to Apache Spark or Dask, but rath…
Automatically visualize your pandas dataframe via a single print! 📊 💡
Preparation links and resources for system design questions
📝 Design doc template & examples for machine learning systems (requirements, methodology, implementation, etc.)
State of the Art Natural Language Processing
MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvements in both training algorithms and models.
A C++ standalone library for machine learning
PyTorch implementation of the TabNet paper: https://arxiv.org/pdf/1908.07442.pdf