- **vision**: fla-zoo currently supports vision encoders. Simple documentation is available here.
- **video**: fla-zoo currently supports video understanding models. Documentation is in progress.
Requirements:
- All the dependencies shown here
- torchvision
- diffusers
For example, you can install all the dependencies using the following commands:

```shell
conda create -n flazoo python=3.12
conda activate flazoo
pip install torch torchvision accelerate diffusers timm
pip install transformers datasets evaluate causal_conv1d einops scikit-learn wandb
pip install flash-attn --no-build-isolation
pip install -U "huggingface_hub[cli]"
```
Now we can start cooking! 🚀
Note that, as an actively developed repo, fla-zoo currently has no released packages. Use `pip install -e .` to install the package in development mode.
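As a quick sanity check after installation, a small script like the following (an illustrative sketch, not part of fla-zoo itself) can report which of the core dependencies are still missing from the environment:

```python
import importlib.util

def missing_packages(names):
    """Return the subset of top-level package names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Core dependencies taken from the install commands above.
deps = ["torch", "torchvision", "accelerate", "diffusers", "timm",
        "transformers", "einops"]
print(missing_packages(deps))  # an empty list means everything is installed
```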
- [2025-03-02] A pilot version of Native Sparse Attention (NSA) has been added. More experiments are needed to assess its performance.
- [2025-02-23] Added LightNet for classification, along with a pilot SFT training script for vision models; check it out here.
- [2025-02-20] Experiments evaluating the performance of vision models are in progress. Stay tuned!
- [2025-01-25] This repo was created with some vision encoders.
- Write documentation for video models.
- Release training scripts for vision models.
- Add diffusion models to support image/video generation.