Stable-Diffusion implemented with the ncnn framework in C++
Zhihu: https://zhuanlan.zhihu.com/p/582552276
Video: https://www.bilibili.com/video/BV15g411x7Hc
Performance (time per iteration and RAM)
| per-it | i7-12700 (512x512) | i7-12700 (256x256) | Snapdragon 865 (256x256) |
| --- | --- | --- | --- |
| slow | 4.85s / 5.24G (7.07G) | 1.05s / 3.58G (4.02G) | 1.6s / 2.2G (2.6G) |
| fast | 2.85s / 9.47G (11.29G) | 0.65s / 5.76G (6.20G) | |
- 2023-01-19: speed-up & lower RAM usage on x86, dynamic shape on x86
- 2023-01-12: update to the latest ncnn code and use the optimized model, update Android, add a memory monitor
- 2023-01-05: add the 256x256 model to the x86 project
- 2023-01-04: merge and finish the MHA op on x86, enable fast GELU
You can download all models and the exe file from Baidu Netdisk (百度网盘) or Google Drive.
- enter the exe folder
- download the three bin files: AutoencoderKL-fp16.bin, FrozenCLIPEmbedder-fp16.bin, UNetModel-MHA-fp16.bin, and put them into the assets folder
- set up your config in magic.txt (an example follows this list); its lines are, in order:
  - resolution (only 256 and 512 are supported)
  - speed mode (0 for slow but low RAM, 1 for fast but high RAM)
  - step number (15 is not bad)
  - seed number (set 0 for a random seed)
  - positive prompt
  - negative prompt
- run stable-diffusion.exe
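For reference, a magic.txt following the line order above might look like this (the numbers and prompts are only illustrative):

```
512
1
15
42
a lovely cat sitting on the grass, best quality, highly detailed
lowres, bad anatomy, bad hands
```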
- build and install ncnn (a build sketch follows this list)
- build the demo with CMake:

  ```bash
  cd x86/linux
  mkdir -p build && cd build
  cmake ..
  make -j$(nproc)
  ```

- download the three bin files: AutoencoderKL-fp16.bin, FrozenCLIPEmbedder-fp16.bin, UNetModel-MHA-fp16.bin, and put them into the build/assets folder
- run the demo:

  ```bash
  ./stable-diffusion-ncnn
  ```
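For the first step above, a common way to build and install ncnn from source on Linux is the following (a plain CPU build; this is only a sketch, adjust the CMake options such as Vulkan support to your setup):

```bash
git clone https://github.com/Tencent/ncnn.git
cd ncnn
git submodule update --init
mkdir -p build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DNCNN_VULKAN=OFF ..
make -j$(nproc)
sudo make install
```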
- download and install the apk from the link
- at the top, the first field is the step count and the second is the seed
- at the bottom, the upper field is the positive prompt and the lower one is the negative prompt (leave them empty to use the default prompts)
- note: the apk needs about 7 GB of RAM, and it runs very slowly with high power consumption
Note: Please comply with the requirements of the SD model and do not use it for illegal purposes
- Three main steps of Stable-Diffusion:
  - CLIP: text embedding
  - iterative sampling with a sampler
  - decoding the sampler result to obtain the output image
- Model details:
  - Weights: Naifu (you know where to find them)
  - Sampler: Euler ancestral (k-diffusion version)
  - Resolution: dynamic shape, but it must be a multiple of 128, with a minimum of 256
  - Denoiser: CFGDenoiser, CompVisDenoiser
  - Prompt: positive & negative, both supported :)
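For the sampling stage, a minimal sketch of one Euler-ancestral step in the k-diffusion formulation is shown below (PyTorch-style; `denoised` is the denoiser output for the current latent `x`, the sigmas are plain floats, and the names are illustrative rather than this repo's C++ API):

```python
import torch

def euler_ancestral_step(x, denoised, sigma, sigma_next, eta=1.0):
    # split the step into a deterministic part (down to sigma_down) and fresh noise (sigma_up)
    sigma_up = min(sigma_next,
                   eta * (sigma_next ** 2 * (sigma ** 2 - sigma_next ** 2) / sigma ** 2) ** 0.5)
    sigma_down = (sigma_next ** 2 - sigma_up ** 2) ** 0.5

    d = (x - denoised) / sigma              # direction pointing from x towards the denoised latent
    x = x + d * (sigma_down - sigma)        # Euler step from sigma down to sigma_down
    if sigma_up > 0:
        x = x + torch.randn_like(x) * sigma_up  # re-inject noise: the "ancestral" part
    return x
```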
- download the three bin files: AutoencoderKL-fp16.bin, FrozenCLIPEmbedder-fp16.bin, UNetModel-MHA-fp16.bin, and put them into the assets folder
- open the vs2019 project and compile it in Release & x64
- build and install ncnn (see the build sketch above)
- build the demo with CMake:

  ```bash
  cd x86/linux
  mkdir -p build && cd build
  cmake ..
  make -j$(nproc)
  ```

- download the three bin files: AutoencoderKL-fp16.bin, FrozenCLIPEmbedder-fp16.bin, UNetModel-MHA-fp16.bin, and put them into the build/assets folder
- run the demo:

  ```bash
  ./stable-diffusion-ncnn
  ```
- download the three bin files: AutoencoderKL-fp16.bin, FrozenCLIPEmbedder-fp16.bin, UNetModel-MHA-fp16.bin, and put them into the assets folder
- open Android Studio and run the project
I've uploaded the three onnx models used by Stable-Diffusion so that you can do some interesting work with them.
You can find them at the links above.
- Please abide by the license of the Stable Diffusion model and DO NOT use it for illegal purposes!
- If you use these onnx models in open-source projects, please let me know; I'll follow them and look forward to your next great work :)
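If you want to experiment with the onnx models, a quick way to inspect their input/output signatures is onnxruntime; the file name below is only a placeholder, use whatever name the downloaded model actually has:

```python
import onnxruntime as ort

# placeholder file name for the exported CLIP text encoder
sess = ort.InferenceSession("FrozenCLIPEmbedder-fp16.onnx", providers=["CPUExecutionProvider"])

for i in sess.get_inputs():
    print("input: ", i.name, i.shape, i.type)
for o in sess.get_outputs():
    print("output:", o.name, o.shape, o.type)
```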
- FrozenCLIPEmbedder
ncnn (input & output): token, multiplier, cond, conds
onnx (input & output): onnx::Reshape_0, 2271
z = onnx(onnx::Reshape_0=token)           # run the onnx text encoder on the token ids
origin_mean = z.mean()
z *= multiplier                           # apply the per-token emphasis weights
new_mean = z.mean()
z *= origin_mean / new_mean               # rescale so the overall mean stays unchanged
conds = torch.concat([cond, z], dim=-2)   # concatenate with the previously accumulated cond along the token dimension
- UNetModel
ncnn (input & output): in0, in1, in2, c_in, c_out, outout
onnx (input & output): x, t, cc, out
outout = in0 + onnx(x=in0 * c_in, t=in1, cc=in2) * c_out
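The c_in and c_out scalars come from the denoiser wrappers listed above (CompVisDenoiser in the k-diffusion sense). A hedged sketch of the usual eps-prediction scalings, and how they reproduce the formula above, looks like this (names are illustrative):

```python
def denoise(unet, x, t, sigma, cond):
    # k-diffusion style scalings for an eps-prediction model (CompVisDenoiser)
    c_in = 1.0 / (sigma ** 2 + 1.0) ** 0.5   # scale applied to the noisy latent
    c_out = -sigma                           # scale applied to the predicted noise
    eps = unet(x * c_in, t, cond)            # the onnx UNet call: onnx(x=in0*c_in, t=in1, cc=in2)
    return x + eps * c_out                   # matches: outout = in0 + onnx(...) * c_out
```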