We introduce Lookahead Decoding, an exact, parallel decoding algorithm that accelerates LLM inference without the need for a draft model or a data store. To install it from source:
git clone https://github.com/hao-ai-lab/LookaheadDecoding.git
cd LookaheadDecoding
pip install -r requirements.txt
pip install -e .
You can run the minimal example to see the speedup that Lookahead decoding brings:
python minimal.py #no Lookahead decoding
USE_LADE=1 LOAD_LADE=1 python minimal.py #use Lookahead decoding, 1.6x speedup
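For context, here is one plausible way such a script can gate Lookahead decoding on the environment variables used above; this is a sketch of the pattern, not necessarily the actual contents of minimal.py:

```python
import os

# Hypothetical gating on the LOAD_LADE flag, mirroring the commands above;
# minimal.py's actual logic may differ.
if int(os.environ.get("LOAD_LADE", 0)):
    import lade
    lade.augment_all()  # patch transformers' LLaMA generation in place
    lade.config_pading(LEVEL=5, WINDOW_SIZE=7, GUESS_SET_SIZE=7, DEBUG=0)
```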
You can also chat with your own chatbot, with or without Lookahead decoding:
USE_LADE=1 python applications/chatbot.py --model_path meta-llama/Llama-2-7b-chat-hf --debug --chat #chat, with lookahead
USE_LADE=0 python applications/chatbot.py --model_path meta-llama/Llama-2-7b-chat-hf --debug --chat #chat, without lookahead
USE_LADE=1 python applications/chatbot.py --model_path meta-llama/Llama-2-7b-chat-hf --debug #no chat, with lookahead
USE_LADE=0 python applications/chatbot.py --model_path meta-llama/Llama-2-7b-chat-hf --debug #no chat, without lookahead
You can import and use Lookahead decoding in your own code with three lines. You also need to set USE_LADE=1 on the command line or set os.environ["USE_LADE"]="1" in your Python script. Note that Lookahead decoding currently supports only LLaMA models and greedy search.
import lade
lade.augment_all()  # patch transformers' LLaMA generation in place
lade.config_pading(LEVEL=5, WINDOW_SIZE=7, GUESS_SET_SIZE=7, DEBUG=0)  # n-gram level, lookahead window, guess set size
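For example, a minimal end-to-end sketch: the three lade lines are taken from above, while the model path, prompt, and generation settings are illustrative assumptions using standard Hugging Face APIs.

```python
import os
os.environ["USE_LADE"] = "1"  # same effect as USE_LADE=1 on the command line

import lade
lade.augment_all()  # patch transformers' LLaMA generation in place
lade.config_pading(LEVEL=5, WINDOW_SIZE=7, GUESS_SET_SIZE=7, DEBUG=0)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint and prompt; any LLaMA model should work.
model_path = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16).cuda()

inputs = tokenizer("Tell me a story.", return_tensors="pt").to("cuda")
# Lookahead decoding currently supports greedy search only, so keep do_sample=False.
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```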
If you find Lookahead decoding useful, please cite:

@misc{fu2023lookahead,
  title  = {Breaking the Sequential Dependency of LLM Inference Using Lookahead Decoding},
  url    = {https://lmsys.org/blog/2023-11-21-lookahead-decoding/},
  author = {Yichao Fu and Peter Bailis and Ion Stoica and Hao Zhang},
  month  = {November},
  year   = {2023}
}
The core implementation is in decoding.py. Lookahead decoding requires an adaptation for each specific model. An example is in models/llama.py.
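For orientation, here is a heavily simplified, self-contained sketch of the two-branch idea behind Lookahead decoding, with a deterministic toy function standing in for the LLM. The names and structure are illustrative only and do not mirror decoding.py:

```python
from collections import defaultdict

def toy_next_token(context):
    # Deterministic stand-in for one greedy LLM step over integer "tokens".
    return (sum(context) * 31 + len(context)) % 50

def lookahead_decode(prompt, steps=10, n=3):
    tokens = list(prompt)
    ngram_pool = defaultdict(set)  # first token -> cached n-gram continuations

    for _ in range(steps):
        next_tok = toy_next_token(tokens)  # guaranteed-correct greedy token
        prefix = tokens + [next_tok]

        # Verification branch: among cached n-grams starting with next_tok,
        # keep the longest continuation the model itself agrees with, so the
        # output is identical to plain greedy decoding.
        best = []
        for cand in ngram_pool.get(next_tok, ()):
            ok = []
            for tok in cand:
                if toy_next_token(prefix + ok) != tok:
                    break
                ok.append(tok)
            if len(ok) > len(best):
                best = ok
        tokens = prefix + best

        # Lookahead branch (reduced to plain speculation here): generate a
        # candidate continuation and cache it for future verification. The
        # real algorithm derives these n-grams from Jacobi-style parallel
        # updates over a fixed window, batched with verification in a single
        # forward pass.
        head = toy_next_token(tokens)
        guess = []
        for _ in range(n - 1):
            guess.append(toy_next_token(tokens + [head] + guess))
        ngram_pool[head].add(tuple(guess))

    return tokens

print(lookahead_decode([1, 2, 3]))
```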