Video Generation, Physical Commonsense, Semantic Adherence, VideoCon-Physics
A suite of image and video neural tokenizers
MoVQGAN: a model for image encoding and reconstruction
SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning
moojink / rlds_dataset_mod
Forked from kpertsch/rlds_dataset_mod
Efficiently apply modification functions to RLDS/TFDS datasets.
Paper list in the survey paper: Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis
ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation
DROID Policy Learning and Evaluation
Official PyTorch implementation of TATS: A Long Video Generation Framework with Time-Agnostic VQGAN and Time-Sensitive Transformer (ECCV 2022)
Blend Between Multiple Images in JupyterLab.
A sample html to compare two videos with slider animation using
Official inference repo for FLUX.1 models
Controllable video and image generation, SVD, Animate Anyone, ControlNet, ControlNeXt, LoRA
This repo contains the code for 1D tokenizer and generator
Official repository for "AM-RADIO: Reduce All Domains Into One"
The official implementation of Autoregressive Image Generation using Residual Quantization (CVPR '22)
A PyTorch native library for large model training
openvla / openvla
Forked from TRI-ML/prismatic-vlms
OpenVLA: An open-source vision-language-action model for robotic manipulation.
Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20h on one machine.
Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in PyTorch
A minimal, but effective implementation of CLIP (Contrastive Language-Image Pretraining) in PyTorch
Democratization of RT-2 "RT-2: New model translates vision and language into action"
[ICRA 2023] A Joint Modeling of Vision-Language-Action for Target-oriented Grasping in Clutter
My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"
MeshNet: Mesh Neural Network for 3D Shape Representation (AAAI 2019)
Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model