Xiamen University
- https://shaojieli.github.io
Stars
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
⚡ InstaFlow! One-Step Stable Diffusion with Rectified Flow (ICLR 2024)
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
Official inference repo for FLUX.1 models
InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥
[NeurIPS 2024] Official code for PuLID: Pure and Lightning ID Customization via Contrastive Alignment
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images from an image prompt.
High-Resolution Image Synthesis with Latent Diffusion Models
The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
FaceChain is a deep-learning toolchain for generating your Digital-Twin.
Nightly release of ControlNet 1.1
Generative Models by Stability AI
Controllable video and image Generation, SVD, Animate Anyone, ControlNet, ControlNeXt, LoRA
A latent text-to-image diffusion model
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
Speed up Stable Diffusion with this one simple trick!
Chinese LLaMA-2 & Alpaca-2 LLMs (phase-two project) with 64K long-context models
GPT4V-level open-source multi-modal model based on Llama3-8B
This repository collects research papers of large Vision Language Models in Autonomous driving and Intelligent Transportation System. The repository will be continuously updated to track the lates…
Collection of papers on state-space models
✨✨Latest Advances on Multimodal Large Language Models
A curated list of awesome LLM for Autonomous Driving resources (continually updated)
A trend that started with "Chain of Thought Prompting Elicits Reasoning in Large Language Models".
[T-ITS] Driving Behavior Modeling using Naturalistic Human Driving Data with Inverse Reinforcement Learning
A 3D computer vision development toolkit based on PaddlePaddle. It supports point-cloud object detection, segmentation, and monocular 3D object detection models.