Stars
Foundation 3D ViT model for volumetric head CT
Memory-optimized training scripts for video models based on Diffusers
Data collection and evaluation framework for on-device agents
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
A unified 3D Transformer Pipeline for visual synthesis
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
Official JAX implementation of MAGVIT: Masked Generative Video Transformer
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
Diffusion Models in Medical Imaging (Published in Medical Image Analysis Journal)
A collection of resources and papers on Diffusion Models
Meditron is a suite of open-source medical Large Language Models (LLMs).
PyTorch implementation of VQGAN (Taming Transformers for High-Resolution Image Synthesis) (https://arxiv.org/pdf/2012.09841.pdf)
(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
The code repository for examples in the O'Reilly book 'Generative Deep Learning' using PyTorch
Implementation of Nougat: Neural Optical Understanding for Academic Documents
Official implementation of Würstchen: Efficient Pretraining of Text-to-Image Models
A comprehensive paper list of Vision Transformer/Attention, including papers, code, and related websites
My implementation of the original transformer model (Vaswani et al.). I've additionally included the playground.py file for visualizing otherwise seemingly hard concepts. Currently included IWSLT p…
Python Data Science Handbook: full text in Jupyter Notebooks
GastroVision: A Multi-class Endoscopy Image Dataset for Computer Aided Gastrointestinal Disease Detection https://drive.google.com/drive/folders/1T35gqO7jIKNxC-gVA2YVOMdsL7PSqeAa?usp=sharing
An open source implementation of CLIP.