GitHub - NUS-HPC-AI-Lab/SpeeD at 33865bddc506bad9c1bba10e4f051daea05f52f0

A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training

If you like SpeeD, please give us a star ⭐ on GitHub for the latest update.

Authors

Kai Wang², Yukun Zhou¹,², Mingjia Shi², Zhihang Yuan³, Yuzhang Shang⁴, Xiaojiang Peng*¹, Hanwang Zhang⁵, Yang You²
¹Shenzhen Technology University, ²National University of Singapore, ³Infinigence-AI, ⁴Illinois Institute of Technology, and ⁵Nanyang Technological University [Kai, Yukun, and Mingjia contributed equally to this work.]

Elevator roadshow of SpeeDiT

We propose SpeeDiT, a general algorithm for accelerating diffusion training that samples time steps asymmetrically. It speeds up DiT training by 3.3 times without degrading FID. Ongoing experiments show that SpeeDiT can be applied to multiple diffusion-based visual generation tasks and is compatible with other acceleration methods. We therefore believe SpeeDiT can significantly reduce the cost of diffusion training, allowing more people to benefit from this exciting technological advancement!

TODO list sorted by priority

If you encounter any inconvenience with the code or have suggestions for improvements, please feel free to contact us via email at [email protected] and [email protected].

Releasing SpeeDiT-XL/2 400K, 1000K, ..., 7000K checkpoints and publishing the technical report.
Upgrading the components of SpeeDiT
Applying SpeeDiT to text2image
    [Stable diffusion]
    [Latent Diffusion]
    [Imagen]
Applying SpeeDiT to text2video
    [Open-Sora]
    [Latte]
SpeeDiT + MDT
More tasks (Image inpainting, 3D Generation)

😮 Highlights

Our method is easy to integrate with existing code and accelerates the training of diffusion models.

✒️ Motivation

Our method is inspired by the uphill and downhill diffusion processes in physics. The following GIF illustrates the commonalities between image diffusion and electron diffusion. The left figure, showing electron diffusion, is simulated with PhET/diffusion. The right figure is downloaded from the OpenAI website.

Visualization of different phases of the reverse process and uphill diffusion. For easy understanding, we assume that the electron velocity has only two possible directions: ⬅️ and ➡️.

🔆 Method

We achieve the acceleration with sampling and weighting strategies that are simple and easy to integrate. The core code is in SpeeDiT/speedit/diffusion/iddpm/speed.py:

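import numpy as np
import torch
import torch.nn.functional as F

# SpacedDiffusion is the base class defined elsewhere in the repository.


class SpeeDiffusion(SpacedDiffusion):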
    def __init__(self, faster, **kwargs):
        super().__init__(**kwargs)
        self.faster = faster
        if faster:
            grad = np.gradient(self.sqrt_one_minus_alphas_cumprod)

            # number of meaningful steps: the first index where the gradient of
            # sqrt(1 - alpha_bar) drops below 1e-4, plus one (more important in inference)
            self.meaningful_steps = np.argmax(grad < 1e-4) + 1

            # p2 weighting from: Perception Prioritized Training of Diffusion Models
            self.p2_gamma = 1
            self.p2_k = 1
            self.snr = 1.0 / (1 - self.alphas_cumprod) - 1
            sqrt_one_minus_alphas_bar = torch.from_numpy(self.sqrt_one_minus_alphas_cumprod)
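            # sample meaningful steps more often: tanh acts as a near step function, so steps whose
            # gradient exceeds 1e-4 get mass 2.5 and the rest 0.5 before L1 normalization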
            p = torch.tanh(1e6 * (torch.gradient(sqrt_one_minus_alphas_bar)[0] - 1e-4)) + 1.5
            self.p = F.normalize(p, p=1, dim=0)
            self.weights = self._weights()
        else:
            self.meaningful_steps = self.num_timesteps

    def _weights(self):
        # P2 weighting: noisier steps (low SNR) receive larger weights in training
        # the weights act on the mse loss
        weights = 1 / (self.p2_k + self.snr) ** self.p2_gamma
        return weights

    # sample the training timesteps t and return them with the loss weights
    def t_sample(self, n, device):
        if self.faster:
            t = torch.multinomial(self.p, n // 2 + 1, replacement=True).to(device)
            # dual sampling: pair each t with its distance from meaningful_steps,
            # which balances the per-step tasks during training
            dual_t = torch.where(t < self.meaningful_steps, self.meaningful_steps - t, t - self.meaningful_steps)
            t = torch.cat([t, dual_t], dim=0)[:n]
            weights = self.weights
        else:
            # baseline: sample timesteps uniformly, with no loss weighting
            t = torch.randint(0, self.num_timesteps, (n,), device=device)
            weights = None

        return t, weights
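
As a reading aid, the weighting computed in `_weights` can be written out explicitly. Here \bar\alpha_t stands for `alphas_cumprod`, k for `p2_k`, and \gamma for `p2_gamma`; the symbols are our notation for the variables in the code above, not notation taken from the technical report.

```latex
\mathrm{SNR}(t) = \frac{1}{1-\bar\alpha_t} - 1 = \frac{\bar\alpha_t}{1-\bar\alpha_t},
\qquad
w_t = \frac{1}{\bigl(k + \mathrm{SNR}(t)\bigr)^{\gamma}}
\;\xrightarrow{\;k=\gamma=1\;}\;
w_t = 1 - \bar\alpha_t .
```

With the values set in `__init__` (k = \gamma = 1), the weight reduces to 1 - \bar\alpha_t, so it grows as the step becomes noisier.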

You can enable our acceleration module with diffusion.faster=True.

# config file
diffusion:
    timestep_respacing: '250'
    faster: true  # enable the module for training acceleration
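
To make the sampling logic easier to experiment with, here is a minimal standalone sketch that mirrors the probability construction and dual sampling of the core code using only the Python standard library. It is not the repository's implementation: the linear beta schedule (1000 steps, 1e-4 to 0.02) and all helper names (`linear_alphas_cumprod`, `gradient`, `build_sampler`, `t_sample`) are assumptions made for illustration.

```python
import math
import random


def linear_alphas_cumprod(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    out, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out


def gradient(ys):
    """Finite differences: central in the interior, one-sided at both ends."""
    grad = [0.0] * len(ys)
    grad[0] = ys[1] - ys[0]
    grad[-1] = ys[-1] - ys[-2]
    for i in range(1, len(ys) - 1):
        grad[i] = (ys[i + 1] - ys[i - 1]) / 2.0
    return grad


def build_sampler(alphas_cumprod, threshold=1e-4):
    """Return (sampling probabilities, meaningful_steps)."""
    grad = gradient([math.sqrt(1.0 - a) for a in alphas_cumprod])
    # First index whose gradient falls below the threshold, plus one
    # (falls back to index 0 if no gradient is below the threshold).
    meaningful_steps = next((i for i, g in enumerate(grad) if g < threshold), 0) + 1
    # Near step function: about 2.5 above the threshold, about 0.5 below it.
    raw = [math.tanh(1e6 * (g - threshold)) + 1.5 for g in grad]
    total = sum(raw)
    return [r / total for r in raw], meaningful_steps


def t_sample(probs, meaningful_steps, n, rng):
    """Draw n // 2 + 1 timesteps, append their duals, and keep the first n."""
    t = rng.choices(range(len(probs)), weights=probs, k=n // 2 + 1)
    dual = [meaningful_steps - s if s < meaningful_steps else s - meaningful_steps for s in t]
    return (t + dual)[:n]


probs, meaningful_steps = build_sampler(linear_alphas_cumprod())
timesteps = t_sample(probs, meaningful_steps, n=8, rng=random.Random(0))
print(meaningful_steps, timesteps)
```

Under this assumed schedule, the early timesteps (where the gradient exceeds the threshold) end up with five times the sampling probability of the late ones.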

🛠️ Requirements and Installation

This code base does not rely on hardware acceleration technology, so the experimental environment is simple to set up.

You can create a new conda environment:

conda env create -f environment.yml
conda activate speedit

or install the necessary packages with:

pip install -r requirements.txt

If necessary, we will provide more methods (e.g., docker) to facilitate the configuration of the experimental environment.

🗝️ Implementation

We provide a complete pipeline for generation tasks, including training, inference, and testing. The current code is only compatible with class-conditional image generation tasks. We will add support for more diffusion-based generation tasks in the future.

We refactored the facebookresearch/DiT code and load the configs using OmegaConf. Config files are loaded recursively to make arguments easier to modify. Simply put, the file later in the path overrides the earlier settings from base.yaml.

You can modify the experiment settings through the config file and the command line. More details on how configs are read are in configs/README.md.
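
The override rule can be illustrated with a small plain-Python sketch. This is not the repository's loader (which uses OmegaConf, and whose exact lookup order is described in configs/README.md); `merge_configs` is a hypothetical helper, and the `guidance_scale` value of 4.0 is made up for the example.

```python
import copy


def merge_configs(base, override):
    """Return a new dict where `override` values replace `base` values, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge key by key instead of replacing wholesale.
            merged[key] = merge_configs(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical contents of a base config and a later, more specific config.
base = {"diffusion": {"timestep_respacing": "250", "faster": False}, "guidance_scale": 4.0}
experiment = {"diffusion": {"faster": True}}

config = merge_configs(base, experiment)
print(config)
```

Only the keys present in the later file change; everything else is inherited from the base settings.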

For each experiment, you must provide two arguments on the command line:

-c: config path;
-p: phase including ['train', 'inference', 'sample'].
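
For readers who want to see how such an interface behaves, the two required arguments can be sketched with `argparse`. This is an illustrative stand-in, not the parser in main.py: the long option names `--config` and `--phase` and the `build_parser` helper are assumptions.

```python
import argparse

PHASES = ["train", "inference", "sample"]


def build_parser():
    """Build a parser with the two required arguments described above."""
    parser = argparse.ArgumentParser(description="Hypothetical launcher sketch")
    parser.add_argument("-c", "--config", required=True, help="path to the config file")
    parser.add_argument("-p", "--phase", required=True, choices=PHASES, help="phase to run")
    return parser


args = build_parser().parse_args(["-c", "configs/image/imagenet_256/base.yaml", "-p", "train"])
print(args.config, args.phase)
```

Restricting `-p` with `choices` makes an unknown phase fail at parse time rather than later in the run.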

Train & inference

For example, the following commands run the class-conditional image generation task on the 256x256 ImageNet dataset with DiT-XL/2 models.

# Training: training diffusion and saving checkpoints
torchrun --nproc_per_node=8 main.py -c configs/image/imagenet_256/base.yaml -p train
# inference: generating samples for testing
torchrun --nproc_per_node=8 main.py -c configs/image/imagenet_256/base.yaml -p inference
# sample: sample some images for visualization
python main.py -c configs/image/imagenet_256/base.yaml -p sample

How to do ablation?

You can modify the experiment settings through the config file and the command line. More details on how configs are read are in configs/README.md.

For example, change the classifier-free guidance scale in sampling by command line:

python main.py -c configs/image/imagenet_256/base.yaml -p sample guidance_scale=1.5
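
A `key=value` override like the one above can be thought of as an assignment into the nested config. The sketch below shows one way to apply such an assignment to a plain dict. It is an illustration under assumptions, not the repository's code: `apply_override` is a hypothetical helper, the starting values are made up, and the actual parsing is handled by OmegaConf.

```python
import ast


def apply_override(config, assignment):
    """Apply one `dotted.key=value` assignment to a nested dict in place."""
    dotted_key, raw_value = assignment.split("=", 1)
    try:
        # Interpret Python literals such as 1.5 or True.
        value = ast.literal_eval(raw_value)
    except (ValueError, SyntaxError):
        # Anything else (for example a path or lowercase "true") stays a string.
        value = raw_value
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        node = node.setdefault(name, {})
    node[leaf] = value
    return config


# Hypothetical starting config.
config = {"guidance_scale": 4.0, "diffusion": {"faster": False}}
apply_override(config, "guidance_scale=1.5")
apply_override(config, "diffusion.faster=True")
print(config)
```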

Test

Testing a generation task requires the results of inference. More details about testing are in evaluations.

👍 Acknowledgement

We are grateful for the following exceptional work and generous contribution to open source.

DiT: Scalable Diffusion Models with Transformers.
Open-Sora: Open-Sora: Democratizing Efficient Video Production for All.
OpenDiT: An acceleration for DiT training. We adopt valuable acceleration strategies for training progress from OpenDiT.

🔒 License

The majority of this project is released under the Apache 2.0 license as found in the LICENSE file.

✏️ Citation

If you find our code useful in your research, please consider giving a star ⭐ and citation 📝.

@software{speedit,
  author = {Yukun Zhou and Kai Wang and Hanwang Zhang and Yang You and Xiaojiang Peng},
  title = {SpeeDiT: Accelerating DiTs and General Diffusion Models via Principle Timestep Adjustment Training},
  month = {March},
  year = {2024},
  url = {https://github.com/1zeryu/SpeeD}
}

@article{speed,
  author = {Kai Wang and Yukun Zhou and Mingjia Shi and Zhihang Yuan and Yuzhang Shang and Xiaojiang Peng and Hanwang Zhang and Yang You},
  title = {A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training},
  year = {2024},
}
