
A CuPy adaptation of llama3.np (NumPy) for the Llama 3 model.

Chivier/llama3.cp.mfu

llama3.cp

llama3.cp adapts llama3.np, the pure-NumPy implementation created by likejazz, to use CuPy for GPU acceleration.
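CuPy mirrors most of the NumPy API, so much of a NumPy-to-CuPy port comes down to running the same array code against a different module. The sketch below illustrates that general pattern with two operations Llama models use, RMSNorm and softmax. It is not taken from this repository: the helper names `get_xp`, `rms_norm`, and `softmax` are hypothetical, and CuPy is treated as optional so the code falls back to NumPy when it is not installed.

```python
import numpy as np

try:
    import cupy as cp  # optional GPU backend; mirrors most of the NumPy API
except ImportError:
    cp = None


def get_xp(*arrays):
    """Return the array module (cupy or numpy) that owns the given arrays."""
    if cp is not None:
        # cupy.get_array_module returns cupy for CuPy arrays, numpy otherwise
        return cp.get_array_module(*arrays)
    return np


def rms_norm(x, weight, eps=1e-5):
    """RMSNorm over the last axis: x / sqrt(mean(x**2) + eps) * weight."""
    xp = get_xp(x, weight)
    rms = xp.sqrt(xp.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight


def softmax(x, axis=-1):
    """Numerically stable softmax; the same code runs on NumPy or CuPy arrays."""
    xp = get_xp(x)
    shifted = x - xp.max(x, axis=axis, keepdims=True)  # avoid overflow in exp
    e = xp.exp(shifted)
    return e / xp.sum(e, axis=axis, keepdims=True)
```

With CuPy installed and a GPU available, passing `cp.asarray(x)` instead of a NumPy array runs the same functions on the device, and `cp.asnumpy(result)` copies the output back to host memory.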

Usage

```shell
python llama3_gpu.py "I have a dream"
"""
I have a dream. He dream of a big, beautiful garden full of flower and tree. He dream of playing with hi friend and eating yummy snack.
One day, he wa walking in the garden when he saw

Token count: 50, elapsed: 0.89s, 56 tokens/s
"""
```
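The summary line reports the number of generated tokens, the wall-clock time, and their ratio (50 tokens / 0.89 s ≈ 56 tokens/s). A minimal sketch of how such a figure can be measured follows; `measure_throughput` and `fake_generate` are hypothetical stand-ins, not functions from llama3_gpu.py. One general caveat for GPU timing: CuPy launches kernels asynchronously, so a timer is only accurate if something forces the host to wait for the device before it stops (converting each sampled token ID to a Python `int` does this).

```python
import time
from typing import Callable, Iterable, Tuple


def measure_throughput(generate: Callable[[], Iterable[int]]) -> Tuple[int, float, float]:
    """Run a token generator to completion; return (count, elapsed_s, tokens_per_s)."""
    start = time.perf_counter()
    count = 0
    for _token_id in generate():
        count += 1  # each yielded item is one generated token
    elapsed = time.perf_counter() - start
    rate = count / elapsed if elapsed > 0 else float("inf")
    return count, elapsed, rate


def fake_generate(n_tokens: int = 50, delay_s: float = 0.001):
    """Hypothetical stand-in for a model's decoding loop."""
    for i in range(n_tokens):
        time.sleep(delay_s)  # simulate per-token work
        yield i


if __name__ == "__main__":
    count, elapsed, rate = measure_throughput(lambda: fake_generate())
    print(f"Token count: {count}, elapsed: {elapsed:.2f}s, {round(rate)} tokens/s")
```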

For comparison, the original NumPy implementation produced the following on my machine:

```shell
python llama3.py "I have a dream"
"""
I have a dream. He dream of a big, beautiful garden full of flower and tree. He dream of playing with hi friend and eating yummy snack.
One day, he wa walking in the garden when he saw

Token count: 50, elapsed: 2.08s, 24 tokens/s
"""
```

Relevant aspects

Switching from NumPy to CuPy more than doubles generation speed in this example: 56 versus 24 tokens/s (about 2.3×). Performance will vary with hardware; the results presented here were measured on an RTX 4090.
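Because the speed-up depends on the hardware at hand, it can be worth checking on your own machine. The following is a hedged sketch of a micro-benchmark that times a matrix multiplication, one of the dominant costs in transformer inference, with NumPy and, if installed, CuPy. The function name `time_matmul`, the matrix size, and the repeat count are arbitrary choices for illustration. The explicit synchronization is needed because CuPy calls return before the GPU work has finished.

```python
import time

import numpy as np

try:
    import cupy as cp  # optional; only used if installed
except ImportError:
    cp = None


def time_matmul(xp, n=1024, repeats=10):
    """Average seconds per (n x n) @ (n x n) float32 matmul using array module xp."""
    rng = np.random.default_rng(0)
    a = xp.asarray(rng.standard_normal((n, n), dtype=np.float32))
    b = xp.asarray(rng.standard_normal((n, n), dtype=np.float32))

    def sync():
        # CuPy kernels run asynchronously; wait for the GPU before reading the clock
        if cp is not None and xp is cp:
            cp.cuda.Device().synchronize()

    _ = a @ b  # warm-up run, excluded from timing
    sync()
    start = time.perf_counter()
    for _ in range(repeats):
        _ = a @ b
    sync()
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    print(f"numpy: {time_matmul(np) * 1e3:.2f} ms")
    if cp is not None:
        print(f"cupy:  {time_matmul(cp) * 1e3:.2f} ms")
```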

Installation

pip install -r requirements.txt

Observations

CuPy ships a separate package for each CUDA major version, so you may need to edit requirements.txt to install the one that matches your CUDA installation. For instance, if you use CUDA 11, change the line cupy-cuda12x to cupy-cuda11x.
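The requirements.txt edit described above can also be scripted. The sketch below maps a CUDA version string (for example, the one reported by `nvcc --version` or `nvidia-smi`) to a CuPy package name and rewrites the matching line of a requirements file's contents. The helper names are hypothetical, and only the two packages mentioned in this README, cupy-cuda11x and cupy-cuda12x, are handled; for other CUDA versions, consult the CuPy installation guide.

```python
import re

# Only the two packages named in this README; other CUDA versions are rejected.
CUPY_PACKAGES = {11: "cupy-cuda11x", 12: "cupy-cuda12x"}


def cupy_package_for_cuda(cuda_version: str) -> str:
    """Map a CUDA version such as '11.8' or '12.2' to its CuPy package name."""
    major = int(cuda_version.strip().split(".")[0])
    try:
        return CUPY_PACKAGES[major]
    except KeyError:
        raise ValueError(f"unsupported CUDA version: {cuda_version!r}") from None


def retarget_requirements(text: str, cuda_version: str) -> str:
    """Replace any cupy-cudaNNx requirement name with the package for cuda_version."""
    package = cupy_package_for_cuda(cuda_version)
    # Only the package name is replaced; a version specifier after it is left untouched
    return re.sub(r"\bcupy-cuda\d+x\b", package, text)


if __name__ == "__main__":
    reqs = "numpy\ncupy-cuda12x\n"
    print(retarget_requirements(reqs, "11.8"), end="")
```

Applied to the contents of requirements.txt, the result can be written back before running `pip install -r requirements.txt`.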

License

MIT
