Structured Sparsity Inducing Adaptive Optimizers for Deep Learning

This is the repository for the paper

Tristan Deleu, Yoshua Bengio, Structured Sparsity Inducing Adaptive Optimizers for Deep Learning [ArXiv]

This repository contains:

The weighted and unweighted proximal operators for the l1/l2 and group MCP penalties.
A modification of AdamW from Hugging Face's transformers library to include a proximal step, compatible with the structured sparsity inducing penalties in this repository.
The definition of the groups (channel-wise and row-wise) for some deep learning architectures (VGG, ResNet, BERT).
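
As a rough illustration of what a proximal operator for the l1/l2 penalty does, here is a minimal sketch of the standard unweighted block soft-thresholding rule applied to row-wise groups. It is not the code in this repository: the function name `prox_l1_l2`, the `threshold` parameter, and the plain-list representation of the groups are assumptions made for the example, and the weighted variant is not covered.

```python
import math


def prox_l1_l2(groups, threshold):
    """Unweighted proximal operator of threshold * sum_g ||w_g||_2.

    `groups` is a list of groups, each a list of floats (for example,
    the rows of a weight matrix). Hypothetical helper for illustration.
    """
    result = []
    for group in groups:
        norm = math.sqrt(sum(w * w for w in group))
        # Block soft-thresholding: shrink the whole group towards zero,
        # and zero it out entirely if its norm is at most the threshold.
        scale = max(0.0, 1.0 - threshold / norm) if norm > 0.0 else 0.0
        result.append([scale * w for w in group])
    return result


# Two row-wise groups: one with a large norm, one with a small norm.
rows = [[3.0, 4.0], [0.1, -0.2]]
shrunk = prox_l1_l2(rows, threshold=1.0)
```

In this example the first row (norm 5) is rescaled by a factor of about 0.8, while the second row (norm below the threshold) is set to zero as a whole. Removing entire groups at once is what makes the resulting sparsity structured.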
