Qwen2.5-7B-Instruct Quantization Tool

Version 1.0

Creator: Juhani Merilehto - @juhanimerilehto - Jyväskylä University of Applied Sciences (JAMK), Likes institute

Overview

A comprehensive Windows-optimized tool for downloading, converting, and quantizing the Qwen2.5-7B-Instruct model to GGUF format. This project provides a structured approach to model quantization with separate scripts for each phase of the process, specifically designed for Windows environments.

Enables offline, privacy-preserving analysis of i.e., sports, exercise, or organization-related data through local LLM deployment, without sending sensitive data to external servers.

Features

Phased Approach: Separate scripts for download, conversion, quantization, and inference testing
Progress Tracking: Detailed progress bars and status updates
Error Handling: Robust error handling and validation at each step
Flexible Quantization: Support for multiple quantization formats (Q4_K_M, Q4_K_S, Q4_0)
Resource Management: Optimized for consumer hardware (tested on RTX 4070 12GB)
Windows Support: Full Windows compatibility with proper path handling
Inference Testing: Built-in testing capabilities with the quantized model

Hardware Requirements

GPU: NVIDIA GPU with at least 12GB VRAM (tested on RTX 4070)
RAM: 32GB recommended
Storage: 50GB free space for model files and intermediate formats
CUDA: CUDA 11.7 or higher
OS: Windows 10/11 with Visual Studio 2022 Build Tools

Project Structure

llm-quantization-gguf/
├── assets/
│   └── likes_str_logo.png
├── llama.cpp/           # llama.cpp repository and builds
├── models/             # Model storage directory
├── build.ps1           # Build script for llama.cpp
├── convert-script.py   # GGUF conversion script
├── download-script.py  # Model download script
├── quantize-script.py  # Quantization script
├── test-model.py      # Inference testing script
├── requirements.txt
└── README.md

Installation

1. Clone the repository:

git clone [your-repo-url]
cd llm-quantization-gguf

2. Create a virtual environment:

python -m venv llm-env
.\llm-env\Scripts\activate

3. Install dependencies:

pip install -r requirements.txt

4. Install required tools:

# Install Visual Studio 2022 Build Tools with C++ support
winget install Microsoft.VisualStudio.2022.BuildTools --override "--add Microsoft.VisualStudio.Workload.VCTools --includeRecommended"

# Install CMake
winget install Kitware.CMake

# Install NVIDIA CUDA Toolkit
winget install Nvidia.CUDA

# Alternative manual installation:
# 1. Visit: https://developer.nvidia.com/cuda-downloads
# 2. Select: Windows > x86_64 > 11.7 or higher
# 3. Select your preferred installer type (network or local)
# 4. Follow the installation wizard
# 5. Restart your computer after installation

# Verify CUDA installation
nvcc --version

Usage

1. Build llama.cpp:

.\build.ps1

2. Download the model:

python download-script.py

3. Convert to GGUF:

python convert-script.py

4. Quantize:

python quantize-script.py

5. Test the model:

python test-model.py --prompt "Write a creative story about an AI."

Supported Quantization Formats

Q4_K_M: High quality, ~4.4GB (recommended)
Q4_K_S: Balanced quality and size, ~3.8GB
Q4_0: Smallest size, ~3.5GB

Actual Processing Times (RTX 4070)

Download: ~10-15 minutes (internet speed dependent)
Conversion to GGUF: ~10 minutes
Quantization (Q4_K_M): ~3 minutes
Model Sizes:
- Original F16: 14.19 GB
- Q4_K_M: 4.36 GB

Performance Notes

GPU Utilization: Uses 35 layers on GPU for optimal performance
Context Window: 4096 tokens supported
Thread Count: Automatically optimized for your CPU
Temperature: 0.7 default for balanced creativity
Top-p: 0.9 for diverse but focused responses

Credits

Juhani Merilehto (@juhanimerilehto) – Specialist, Data and Statistics
JAMK Likes – Organization sponsor

License

This project is licensed for free use under the condition that proper credit is given to Juhani Merilehto (@juhanimerilehto) and JAMK Likes institute.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Qwen2.5-7B-Instruct Quantization Tool

Creator: Juhani Merilehto - @juhanimerilehto - Jyväskylä University of Applied Sciences (JAMK), Likes institute

Overview

Features

Hardware Requirements

Project Structure

Installation

1. Clone the repository:

2. Create a virtual environment:

3. Install dependencies:

4. Install required tools:

Usage

1. Build llama.cpp:

2. Download the model:

3. Convert to GGUF:

4. Quantize:

5. Test the model:

Supported Quantization Formats

Actual Processing Times (RTX 4070)

Performance Notes

Credits

License

About

Releases 2

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
assets		assets
.gitignore.txt		.gitignore.txt
README.md		README.md
build.ps1		build.ps1
convert-script.py		convert-script.py
download-script.py		download-script.py
quantize-script.py		quantize-script.py
requirements.txt		requirements.txt
test-model.py		test-model.py

juhanimerilehto/llm-quantization-gguf

Folders and files

Latest commit

History

Repository files navigation

Qwen2.5-7B-Instruct Quantization Tool

Creator: Juhani Merilehto - @juhanimerilehto - Jyväskylä University of Applied Sciences (JAMK), Likes institute

Overview

Features

Hardware Requirements

Project Structure

Installation

1. Clone the repository:

2. Create a virtual environment:

3. Install dependencies:

4. Install required tools:

Usage

1. Build llama.cpp:

2. Download the model:

3. Convert to GGUF:

4. Quantize:

5. Test the model:

Supported Quantization Formats

Actual Processing Times (RTX 4070)

Performance Notes

Credits

License

About

Resources

Stars

Watchers

Forks

Releases 2

Packages 0

Languages

Packages