Awesome Model Optimization Techniques

This repository contains a curated list of optimization and compression techniques for machine learning models.

Main Content

Quick links to the sections on this page

Concepts

Model Compression and Architecture Optimization

  1. Compression Techniques

    • Pruning
    • Quantization
    • Hashing
    • Knowledge Distillation
    • Low-Rank Approximation
    • Precision Reduction (compute cost is commonly measured in floating-point operations (FLOPs), floating-point operations per second (FLOPS), and multiply-accumulate operations (MACs), where 1 MAC = 2 FLOPs)
  2. Architecture Optimization

    • Architecture Changes
    • Neural Architecture Search

Raw listing

A list of common model optimization and compression techniques

  1. Pruning: removing redundant connections from the network by cutting out unimportant weights, which are usually defined as those with the smallest absolute values (see the pruning sketch after this list).

    • Unstructured Pruning
    • Structured Pruning
  2. Quantization: representing weights (and optionally activations) at lower precision, by clustering or rounding them, so that the same connections can be stored in less memory (see the dynamic-quantization sketch after this list).

    • Dynamic Quantization
    • Static Quantization
    • Quantization-Aware Training
  3. ONNX conversion and ONNX Runtime (see the export sketch after this list)

  4. Knowledge Distillation: training a compact student model to mimic a larger teacher (see the loss sketch after this list)

  5. Core ML conversion for Apple mobile devices (see the conversion sketch after this list)

  6. Neural Architecture Search (NAS)

  7. Low-Rank Approximation: factoring large weight matrices into products of smaller ones (see the SVD sketch after this list)
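
Example sketches

Minimal, illustrative sketches of several of the techniques above. All examples use PyTorch; the toy models, shapes, and hyperparameters are assumptions for demonstration, not recommendations.

An unstructured-pruning sketch using PyTorch's built-in `torch.nn.utils.prune` utilities; the 30% sparsity level is arbitrary:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy model; any module with Linear/Conv layers works the same way.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

for module in model.modules():
    if isinstance(module, nn.Linear):
        # L1 unstructured pruning: zero out the 30% of weights with the
        # smallest absolute value in this layer.
        prune.l1_unstructured(module, name="weight", amount=0.3)
        # Bake the mask into the weight tensor and drop the pruning hooks.
        prune.remove(module, "weight")

print(f"layer-0 sparsity: {(model[0].weight == 0).float().mean().item():.0%}")
```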
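
A dynamic post-training quantization sketch: `quantize_dynamic` stores the weights of the listed layer types as int8 and quantizes activations on the fly at inference time, with no calibration data needed:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
model.eval()

# Convert Linear weights to int8; activations are quantized at runtime.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 784)
print(quantized(x).shape)  # torch.Size([1, 10])
```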
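
A sketch of exporting a PyTorch model to ONNX and running it with ONNX Runtime (the file name and shapes are placeholders; `onnxruntime` is a separate install):

```python
import numpy as np
import torch
import torch.nn as nn
import onnxruntime as ort

model = nn.Sequential(nn.Linear(784, 10)).eval()

# Export: a dummy input traces the model and fixes the graph's shapes.
dummy = torch.randn(1, 784)
torch.onnx.export(model, dummy, "model.onnx",
                  input_names=["input"], output_names=["output"])

# Inference with ONNX Runtime on CPU.
session = ort.InferenceSession("model.onnx",
                               providers=["CPUExecutionProvider"])
(output,) = session.run(None, {"input": np.random.randn(1, 784).astype(np.float32)})
print(output.shape)  # (1, 10)
```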
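
A sketch of the classic soft-target distillation loss (Hinton et al., 2015): the student is trained on a blend of hard-label cross-entropy and a KL term toward the teacher's softened predictions. The temperature `T` and mixing weight `alpha` are illustrative hyperparameters:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # KL divergence between softened distributions; the T*T factor
    # restores gradient scale, as in the original paper.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student_logits = torch.randn(8, 10)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(distillation_loss(student_logits, teacher_logits, labels))
```

For full distillation training pipelines, see torchdistill in the references.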
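
A Core ML conversion sketch using the coremltools unified converter, which takes a traced TorchScript model. The API details here are best-effort assumptions; check the coremltools docs for your version:

```python
import torch
import torch.nn as nn
import coremltools as ct  # pip install coremltools

model = nn.Sequential(nn.Linear(784, 10)).eval()

# The unified converter consumes a traced TorchScript module.
traced = torch.jit.trace(model, torch.randn(1, 784))
mlmodel = ct.convert(traced, inputs=[ct.TensorType(shape=(1, 784))])
mlmodel.save("model.mlpackage")  # loadable from Xcode / on-device
```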
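
A low-rank approximation sketch: factor a Linear layer's weight matrix with a truncated SVD and replace one big matmul with two thin ones. Rank 32 is an arbitrary choice; here it cuts 784 x 256 = 200,704 weights down to (784 + 256) x 32 = 33,280:

```python
import torch
import torch.nn as nn

def low_rank_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    # W (out x in) ~= U[:, :r] @ diag(S[:r]) @ Vt[:r], keeping only the
    # `rank` largest singular values.
    U, S, Vt = torch.linalg.svd(layer.weight.data, full_matrices=False)
    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data = (torch.diag(S[:rank]) @ Vt[:rank]).contiguous()
    second.weight.data = U[:, :rank].contiguous()
    if layer.bias is not None:
        second.bias.data = layer.bias.data.clone()
    return nn.Sequential(first, second)

layer = nn.Linear(784, 256)
approx = low_rank_linear(layer, rank=32)
x = torch.randn(1, 784)
print((layer(x) - approx(x)).abs().max())  # approximation error
```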

References

  1. https://www.thinkautonomous.ai/blog/deep-learning-optimization/
  2. https://github.com/yoshitomo-matsubara/torchdistill
