Stars
Making Long-Context LLM Inference 10x Faster and 10x Cheaper
Building blocks for foundation models.
My continuously updated Machine Learning, Probabilistic Models and Deep Learning notes and demos (2000+ slides), with links to the accompanying videos.
This PyTorch package implements PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance (ICML 2022).
YaRN: Efficient Context Window Extension of Large Language Models (a RoPE position-scaling sketch follows this list).
Code and data for "Lost in the Middle: How Language Models Use Long Contexts"
This repository contains PyTorch implementations of various random feature maps for dot product kernels.
Simple NumPy implementation of the FAVOR+ attention mechanism, https://teddykoker.com/2020/11/performers/ (a minimal feature-map sketch follows this list).
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks (a cache-eviction sketch follows this list).
Awesome LLM compression research papers and tools.
Code for the ICML 2023 paper: Machine Learning Force Fields with Data Cost Aware Training
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX (a pipeline usage sketch follows this list).
This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.
Run any Llama 2 model locally with a Gradio UI on GPU or CPU from anywhere (Linux/Windows/Mac). Use `llama2-wrapper` as your local Llama 2 backend for Generative Agents/Apps.
Windows build of bitsandbytes for use in text-generation-webui.
Dynamically determine the suggested clusters in the data for unsupervised learning.
QLoRA: Efficient Finetuning of Quantized LLMs (a 4-bit-plus-LoRA setup sketch follows this list).
A playbook for systematically maximizing the performance of deep learning models.
yifan1130 / PLATON
Forked from QingruZhang/PLATON. This PyTorch package implements PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance (ICML 2022).
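The sketches below illustrate the general ideas behind a few of the starred projects; they are hedged reconstructions, not code taken from the repositories. First, for the YaRN entry: a minimal NumPy sketch of rotary position embeddings with plain linear position interpolation, the baseline that YaRN refines with NTK-by-parts frequency interpolation and an attention temperature term. The function names and the `scale` parameter are illustrative.

```python
import numpy as np

def rope_angles(seq_len, dim, base=10000.0, scale=1.0):
    # scale > 1 compresses positions so a longer sequence maps back into
    # the position range seen during training (linear interpolation).
    inv_freq = 1.0 / base ** (np.arange(0, dim, 2) / dim)  # (dim/2,)
    positions = np.arange(seq_len) / scale                 # (seq_len,)
    return np.outer(positions, inv_freq)                   # (seq_len, dim/2)

def apply_rope(x, angles):
    # x: (seq_len, dim) float array; rotate each (even, odd) feature pair.
    x1, x2 = x[:, 0::2], x[:, 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```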
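For the FAVOR+ entry: a minimal NumPy sketch of the positive random feature map that lets (non-causal) softmax attention be approximated in linear time. The helper names and the feature count `m` are assumptions, not the linked post's code.

```python
import numpy as np

def favor_plus_features(x, omega):
    # x: (seq, d), omega: (d, m) Gaussian projection matrix.
    # phi(x) = exp(x @ omega - ||x||^2 / 2) / sqrt(m), which is always positive.
    norm = np.sum(x ** 2, axis=-1, keepdims=True) / 2.0
    return np.exp(x @ omega - norm) / np.sqrt(omega.shape[1])

def favor_plus_attention(q, k, v, m=256, seed=0):
    # Linear-time approximation of softmax(q k^T / sqrt(d)) v (non-causal case).
    d = q.shape[-1]
    rng = np.random.default_rng(seed)
    omega = rng.standard_normal((d, m))
    q_prime = favor_plus_features(q / d ** 0.25, omega)  # (seq, m)
    k_prime = favor_plus_features(k / d ** 0.25, omega)  # (seq, m)
    kv = k_prime.T @ v                                   # (m, d_v)
    normalizer = q_prime @ k_prime.sum(axis=0)           # (seq,)
    return (q_prime @ kv) / normalizer[:, None]
```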
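For the StreamingLLM entry: a sketch of the attention-sink eviction rule, which keeps the first few tokens plus a rolling window of the most recent tokens in the KV cache. The function, its arguments, and the default sizes are illustrative only, not the repository's API.

```python
def evict_kv_cache(keys, values, num_sink=4, window=1020):
    # keys/values: per-token cache entries in sequence order, oldest first.
    if len(keys) <= num_sink + window:
        return keys, values  # nothing to evict yet
    # Keep the initial "sink" tokens plus the most recent window tokens.
    kept_keys = keys[:num_sink] + keys[-window:]
    kept_values = values[:num_sink] + values[-window:]
    return kept_keys, kept_values
```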
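For the 🤗 Transformers entry: a short usage example with the `pipeline` API; the checkpoint name and prompt are placeholders.

```python
from transformers import pipeline

# Build a text-generation pipeline around an example checkpoint.
generator = pipeline("text-generation", model="gpt2")
result = generator("Long-context inference is", max_new_tokens=20)
print(result[0]["generated_text"])
```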
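For the QLoRA entry: a hedged sketch of a QLoRA-style setup via the Hugging Face bitsandbytes integration and PEFT, i.e. a frozen 4-bit NF4 base model with trainable LoRA adapters. The checkpoint name, target modules, and hyperparameters are placeholders, not values from the QLoRA repository.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                   # 4-bit NF4 base weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,      # double quantization of the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",          # example checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)      # trainable low-rank adapters on the frozen 4-bit base
model.print_trainable_parameters()
```

Training then proceeds as usual, with only the adapter parameters receiving gradients.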