
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)

Python 209 22 Updated Oct 25, 2024

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 9,045 1,043 Updated Dec 26, 2024

Code for "Recurrent Transformers Trade-off Parallelism for Length Generalization on Regular Languages"

Python 7 1 Updated Nov 13, 2024

Train, tune, and infer Bamba model

Python 67 11 Updated Dec 20, 2024
Python 89 4 Updated Dec 23, 2024

Fast inference from large language models via speculative decoding

Python 615 63 Updated Aug 22, 2024

Bringing BERT into modernity via both architecture changes and scaling

Python 838 44 Updated Dec 21, 2024

Minimalistic 4D-parallelism distributed training framework for educational purposes

Python 274 18 Updated Dec 20, 2024

Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, sparsely activated memory layers complement compute-heavy dense f…

Python 79 4 Updated Dec 12, 2024

Code for BLT research paper

Python 1,202 83 Updated Dec 12, 2024

Unify Efficient Fine-tuning of RAG Retrieval, including Embedding, ColBERT, ReRanker.

Python 592 49 Updated Dec 29, 2024

Text-to-image search with OpenCLIP, Docker, Flask, Faiss, etc. and a basic front-end.

Python 4 Updated Apr 27, 2024

Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule

Python 57 6 Updated Dec 31, 2024

[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding

Python 240 14 Updated Aug 31, 2024

Friends of OLMo and their links.

221 14 Updated Dec 15, 2024

Scalable and robust tree-based speculative decoding algorithm

Python 324 37 Updated Aug 13, 2024

📰 Must-read papers and blogs on Speculative Decoding ⚡️

533 25 Updated Dec 30, 2024

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Jupyter Notebook 2,362 164 Updated Jun 25, 2024

SGLang is a fast serving framework for large language models and vision language models.

Python 6,937 635 Updated Dec 31, 2024

Natural Language Reinforcement Learning

Python 64 7 Updated Dec 19, 2024
Python 187 15 Updated Dec 22, 2024

Official Implementation of EAGLE-1 (ICML'24) and EAGLE-2 (EMNLP'24)

Python 887 91 Updated Dec 30, 2024

A debugging and profiling tool that can trace and visualize Python code execution

Python 5,611 407 Updated Dec 5, 2024

Hackable and optimized Transformers building blocks, supporting a composable construction.

Python 8,848 633 Updated Dec 31, 2024

Open source platform for the machine learning lifecycle

Python 19,120 4,299 Updated Dec 31, 2024

The official code for the paper "Parallel Speculative Decoding with Adaptive Draft Length."

Python 30 1 Updated Aug 23, 2024
Python 137 10 Updated Dec 11, 2024