Chapter 03: Data analysis

Usually involves model comparison.

v2a/img/pony1_unet_vg.png

Extra: Visualizing UNET

  • Tired of diagrams drawn by hand? How about diagrams generated purely by a program? See view_unet.ipynb; a minimal sketch of the idea follows below.
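
For a flavor of what "generated by program" can mean, here is a minimal sketch, assuming torchinfo is used for the layer summary (the notebook's exact tooling, shapes and options may differ):

```python
# Hypothetical sketch: summarize the SD1.x UNet block-by-block with torchinfo.
# Model id, tensor shapes and torchinfo options are illustrative assumptions.
import torch
from diffusers import UNet2DConditionModel
from torchinfo import summary

unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-v1-5",  # model id as listed in this chapter
    subfolder="unet",
)

# SD1.x works in latent space: 4 channels, 64x64 latents for a 512x512 image,
# conditioned on 77 CLIP tokens of width 768.
inputs = {
    "sample": torch.randn(1, 4, 64, 64),
    "timestep": torch.tensor([999]),
    "encoder_hidden_states": torch.randn(1, 77, 768),
}

summary(unet, input_data=inputs, depth=3, col_names=("output_size", "num_params"))
```

The same pattern applies to the other models below; only the component class, the latent shape and the conditioning shape change.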

SD1: stabilityai/stable-diffusion-v1-5

sd1_unet_marked.png

SD2: stabilityai/stable-diffusion-2-1

sd2_unet_marked.png

SDXL: stabilityai/stable-diffusion-xl-base-1.0

sdxl_unet_marked.png

SD3: stabilityai/stable-diffusion-3-medium-diffusers

sd3_mmdit_marked.png

SD3.5: stabilityai/stable-diffusion-3.5-large

sd35_mmdit_marked.png

Hunyuan-DiT: Tencent-Hunyuan/HunyuanDiT-Diffusers

hy_mmdit_marked.png

AuraFlow: fal/AuraFlow-v0.2

af_mmdit_marked.png

Flux: black-forest-labs/FLUX.1-dev (CPU with 80 GB RAM only; took 31 minutes)

flux_mmdit_marked.png

Extra: Model comparison in size and "MBW" layers

  • Enjoy the comparison. The actual VRAM requirement is different; a rough estimate is Total Size (GB) × 0.5 × (image size / 1024) + TEs (e.g. SDXL at 1024px: 13.80 × 0.5 × 1 ≈ 6.9 GB plus the text encoders).

  • Also, no image size is implied: the "height / width" inside the model is already counted in latent space. Since this causes so much confusion, I intercept the input passed from the diffusers pipeline to the actual model component (see the hook sketch after this list), which should match the public documents from the model authors.

  • "MBW layers" is an unit of "funcional layers", according to the concept of "MBW merge" which was the meta of merging SD1.5 models obviously not working since then. However it is still useful to have a feeling of how the model works, from UNET to DiT.

Inconsistent parameter counts observed between the running instance and the official claim

  • Since the results for ["sdxl", "sd1", "sd2"] were inconsistent (overestimated by 2.37x, while the others are within 0.1%), I also implemented a diffusers / torch-native counting approach based on this Stackoverflow post. Issues #262, #303 and #312 reported against torchinfo made me a bit panicked. Hopefully this extra approach will prove justified when future inconsistencies show up.
  • Refer to diffusers' num_parameters and its code, nn.Parameter and torch.numel for how parameters are counted; a cross-check sketch follows after the table. Other tools will very likely MISMATCH (e.g. torchvision and torchinfo here, referred to as "model summary").
  • The "2.6b", "860M" and "865M" counts are matching the official claim.
  • Meanwhile, an RTX 3090 is barely capable of running the torchinfo summary for Flux in FP16.
| Model | MBW Layers | Params (B, torchinfo) | Params (B, diffusers) | Forward/backward pass size (MB, FP16) | Estimated Total Size (GB, FP16) |
|---|---|---|---|---|---|
| SD1 | 25 | 2.0 | 0.860 | 1265 | 2.91 |
| SD2 | 25 | 2.1 | 0.865 | 2837 | 4.46 |
| SDXL | 19 | 5.3 | 2.6 | 8993 | 13.80 |
| SD3 | 24 | 2.0 | 2.0 | 11127 | 18.79 |
| SD3.5 | 38 | 8.0 | 8.0 | 17010 | 32.35 |
| Hunyuan-DiT | 40 | 1.5 | 1.5 | 17595 | 20.12 |
| AuraFlow | 36 | 6.8 | 6.8 | 39974 | 52.38 |
| Flux | 57 | 11.91 | 11.91 | 31557 | 54.06 |
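
Below is a minimal sketch of the diffusers / torch-native cross-check mentioned above, assuming the SD1.x UNet as the component (the actual notebook code and the Stackoverflow-based variant may differ):

```python
# Hypothetical sketch: count parameters two ways and compare with torchinfo.
from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-v1-5",  # model id as listed in this chapter
    subfolder="unet",
)

# diffusers' own helper (ModelMixin.num_parameters)
print(unet.num_parameters())                      # ~0.860B for SD1.x

# torch-native: sum numel() over every nn.Parameter
print(sum(p.numel() for p in unet.parameters()))  # should match exactly
```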

Extra: PCA over CLIP embeddings

  • As a mini side request: view_clip.ipynb; a rough sketch of the idea follows below.

  • This is more about validating my thoughts than making a discovery.
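
Here is a minimal sketch of what "PCA over CLIP embeddings" can look like, assuming the pooled output of SD1.x's CLIP text encoder and scikit-learn's PCA (the prompts and the choice of pooled vs. per-token embeddings are assumptions; the notebook may differ):

```python
# Hypothetical sketch: embed a few prompts with the CLIP text encoder and
# project the pooled embeddings down to 2D with PCA.
import torch
from transformers import CLIPTextModel, CLIPTokenizer
from sklearn.decomposition import PCA

repo = "stabilityai/stable-diffusion-v1-5"  # model id as listed in this chapter
tokenizer = CLIPTokenizer.from_pretrained(repo, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(repo, subfolder="text_encoder")

prompts = ["a photo of a cat", "a photo of a dog", "an oil painting of a castle"]
tokens = tokenizer(prompts, padding="max_length", return_tensors="pt")

with torch.no_grad():
    pooled = text_encoder(**tokens).pooler_output  # (n_prompts, 768)

coords = PCA(n_components=2).fit_transform(pooled.numpy())
for prompt, (x, y) in zip(prompts, coords):
    print(f"{prompt!r}: ({x:.2f}, {y:.2f})")
```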

result_24042103.jpg