Starred repositories
The only reliable agent framework built on top of the latest OpenAI Assistants API.
A system for agentic LLM-powered data processing and ETL
A repository to capture my experiments using LLMs and Langgraph for AgenticAI
Sloc, Cloc and Code: scc is a very fast, accurate code counter with complexity calculations and COCOMO estimates, written in pure Go
LOTUS: A semantic query engine for fast and easy LLM-powered data processing
Official repo for "LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs".
FlockMTL: DuckDB extension to seamlessly combine analytics and semantic analysis using language models (LMs)
A machine learning compiler for GPUs, CPUs, and ML accelerators
Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild
A repository of simple Python examples for use with the PLEXOS API
A super fast graph database that uses GraphBLAS under the hood for its sparse adjacency matrix graph representation. Our goal is to provide the best Knowledge Graph for LLM (GraphRAG).
Resources on the GraphBLAS standard for graph algorithms in the language of linear algebra
A code-first agent framework for seamlessly planning and executing data analytics tasks.
Compare DuckDB, Polars and Pandas for generating an artificial dataset of persons and companies
A massively parallel, high-level programming language
Build high-performance AI models with modular building blocks
FlashInfer: Kernel Library for LLM Serving
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
Comparing performance-oriented string-processing libraries for substring search, multi-pattern matching, hashing, and Levenshtein edit-distance calculations
Up to 10x faster strings for C, C++, Python, Rust, and Swift, leveraging NEON, AVX2, AVX-512, and SWAR to accelerate search, sort, edit distances, alignment scores, etc 🦖
Simplify code execution with the Open Interpreter UI project, built with Streamlit. A user-friendly GUI for Python, JavaScript, and more. Pay-as-you-go, no subscriptions. Ideal for beginners.
[ICML'24 Oral] The official code of "DiJiang: Efficient Large Language Models through Compact Kernelization", a novel DCT-based linear attention mechanism.
A Python library modeling pedestrian and bicycle trips over networks.
Streamlit text input that returns value on keyup
Just a bunch of useful embeddings for scikit-learn pipelines
Track emissions from Compute and recommend ways to reduce their impact on the environment.
Public Fused UDFs. Build workflows at any scale with the Fused Python SDK and Workbench webapp, and integrate them into your stack with the Fused Hosted API.
Code for the AISTATS 2024 Paper "From Data Imputation to Data Cleaning - Automated Cleaning of Tabular Data Improves Downstream Predictive Performance"
Easily embed, cluster and semantically label text datasets