ggml

Tensor library for machine learning

Note that this project is under active development.
Some of the development is currently happening in the llama.cpp and whisper.cpp repos.

Features

Low-level cross-platform implementation
Integer quantization support
Broad hardware support
Automatic differentiation
ADAM and L-BFGS optimizers
No third-party dependencies
Zero memory allocations during runtime

Build

git clone https://github.com/ggml-org/ggml
cd ggml

# install python dependencies in a virtual environment
python3.10 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# build the examples
mkdir build && cd build
cmake ..
cmake --build . --config Release -j 8
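
The `-j 8` above hard-codes eight parallel jobs. A minimal sketch of picking the job count from the machine instead is shown below. The helper name `detect_jobs` is hypothetical and not part of ggml; it assumes `nproc` (Linux) or `sysctl` (macOS/BSD) is available and falls back to a single job otherwise.

```shell
# Hypothetical helper (not part of ggml): print a parallel job count.
# Tries nproc (Linux), then sysctl (macOS/BSD), then falls back to 1.
detect_jobs() {
    nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 1
}

# Possible usage from the build directory:
#   cmake --build . --config Release -j "$(detect_jobs)"
```

On machines with little RAM, a smaller value than the core count may be the safer choice.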

GPT inference (example)

# run the GPT-2 small 117M model
../examples/gpt-2/download-ggml-model.sh 117M
./bin/gpt-2-backend -m models/gpt-2-117M/ggml-model.bin -p "This is an example"
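
If the download step is skipped or fails, the run command above cannot find the model file. The sketch below checks for the file first. The helper name `require_model` is hypothetical, and the `models/gpt-2-<size>/ggml-model.bin` layout is an assumption generalized from the single 117M path shown above.

```shell
# Hypothetical helper (not part of ggml): print the model path for a given
# size, or print a hint to stderr and return 1 if the file is missing.
# Assumes the layout models/gpt-2-<size>/ggml-model.bin seen above.
require_model() {
    model_file="models/gpt-2-$1/ggml-model.bin"
    if [ ! -f "$model_file" ]; then
        echo "model not found: $model_file (run download-ggml-model.sh $1 first)" >&2
        return 1
    fi
    echo "$model_file"
}

# Possible usage from the build directory:
#   m=$(require_model 117M) && ./bin/gpt-2-backend -m "$m" -p "This is an example"
```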

For more information, check out the corresponding programs in the examples folder.

Using CUDA

# fix the path to point to your CUDA compiler
cmake -DGGML_CUDA=ON -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.1/bin/nvcc ..
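
Rather than editing the versioned path by hand, the compiler can be looked up first. This is a sketch only: the helper name `find_nvcc` is hypothetical, and the `/usr/local/cuda/bin/nvcc` fallback is an assumption about a common install layout that may not match your system.

```shell
# Hypothetical helper (not part of ggml): print the path to nvcc,
# or return 1 if none is found.
# Checks PATH first, then /usr/local/cuda/bin/nvcc (an assumed location).
find_nvcc() {
    if command -v nvcc >/dev/null 2>&1; then
        command -v nvcc
    elif [ -x /usr/local/cuda/bin/nvcc ]; then
        echo /usr/local/cuda/bin/nvcc
    else
        return 1
    fi
}

# Possible usage from the build directory:
#   cmake -DGGML_CUDA=ON -DCMAKE_CUDA_COMPILER="$(find_nvcc)" ..
```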

Using hipBLAS

cmake -DCMAKE_C_COMPILER="$(hipconfig -l)/clang" -DCMAKE_CXX_COMPILER="$(hipconfig -l)/clang++" -DGGML_HIP=ON ..

Using SYCL

# linux
source /opt/intel/oneapi/setvars.sh
cmake -G "Ninja" -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_SYCL=ON ..

# windows
"C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
cmake -G "Ninja" -DCMAKE_C_COMPILER=cl -DCMAKE_CXX_COMPILER=icx -DGGML_SYCL=ON ..

Compiling for Android

Download and unzip the NDK from this download page. Set the NDK_ROOT_PATH environment variable, or pass the absolute path to the NDK directly as CMAKE_ANDROID_NDK in the command below.

cmake .. \
   -DCMAKE_SYSTEM_NAME=Android \
   -DCMAKE_SYSTEM_VERSION=33 \
   -DCMAKE_ANDROID_ARCH_ABI=arm64-v8a \
   -DCMAKE_ANDROID_NDK=$NDK_ROOT_PATH \
   -DCMAKE_ANDROID_STL_TYPE=c++_shared
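
If NDK_ROOT_PATH is unset, `-DCMAKE_ANDROID_NDK=` receives an empty value and the configure step fails with a less obvious error. The check below is a minimal sketch of validating the variable before calling cmake; the helper name `check_ndk` is hypothetical and not part of ggml.

```shell
# Hypothetical helper (not part of ggml): return 0 if NDK_ROOT_PATH names an
# existing directory; otherwise print a message to stderr and return 1.
check_ndk() {
    if [ -z "${NDK_ROOT_PATH:-}" ]; then
        echo "NDK_ROOT_PATH is not set" >&2
        return 1
    fi
    if [ ! -d "$NDK_ROOT_PATH" ]; then
        echo "NDK_ROOT_PATH is not a directory: $NDK_ROOT_PATH" >&2
        return 1
    fi
}

# Possible usage: run `check_ndk` first and continue to the cmake
# command above only if it succeeds.
```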

# create directories
adb shell 'mkdir /data/local/tmp/bin'
adb shell 'mkdir /data/local/tmp/models'

# push the compiled binaries to the folder
adb push bin/* /data/local/tmp/bin/

# push the ggml library
adb push src/libggml.so /data/local/tmp/

# push model files
adb push models/gpt-2-117M/ggml-model.bin /data/local/tmp/models/

adb shell
cd /data/local/tmp
export LD_LIBRARY_PATH=/data/local/tmp
./bin/gpt-2-backend -m models/ggml-model.bin -p "this is an example"

