Stars
DelinQu / SimplerEnv-OpenVLA
Forked from simpler-env/SimplerEnv. Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo, and OpenVLA) in simulation under common setups (e.g., Google Robot, WidowX+Bridge)
A Telegram bot to recommend arXiv papers
SEED-Voken: A Series of Powerful Visual Tokenizers
Open Source Implementation of Dual Modality MAGVIT2 Tokenizer
A family of versatile and state-of-the-art video tokenizers.
This repo contains the code for the 1D tokenizer and generator
“FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching.” FlowAR employs a simple scale design and is compatible with any VAE.
Code for FLIP: Flow-Centric Generative Planning for General-Purpose Manipulation Tasks
Stanford-ILIAD / openvla-mini
Forked from openvla/openvla. OpenVLA: An open-source vision-language-action model for robotic manipulation.
An official code repository for the paper "Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation"
Train transformer language models with reinforcement learning.
A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation
[NeurIPS 2024] CV-VAE: A Compatible Video VAE for Latent Generative Video Models
An elegant PyTorch deep reinforcement learning library.
A collection of papers on world models for autonomous driving.
Latent Motion Token as the Bridging Language for Robot Manipulation
A simple testbed for robotics manipulation policies
This is the official implementation of our ICML 2024 paper "MultiMax: Sparse and Multi-Modal Attention Learning"
[NeurIPS'24 Oral] HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning
Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more. (A minimal Grad-CAM usage sketch follows this list.)
[NeurIPS 2024] Code for the paper "Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models"
openvla / openvla
Forked from TRI-ML/prismatic-vlms. OpenVLA: An open-source vision-language-action model for robotic manipulation.
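
The explainability entry above refers to the pytorch-grad-cam project. Below is a minimal sketch of how a Grad-CAM heatmap is typically computed with that package; the ResNet-50 backbone, the chosen target layer, and ImageNet class index 281 are illustrative assumptions, not details taken from the list above.

# Minimal Grad-CAM sketch (assumes the PyPI package `grad-cam` is installed).
# Model, target layer, and class index are illustrative assumptions.
import torch
from torchvision.models import resnet50, ResNet50_Weights
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget

model = resnet50(weights=ResNet50_Weights.DEFAULT).eval()
target_layers = [model.layer4[-1]]          # last residual block: a common CAM target
input_tensor = torch.randn(1, 3, 224, 224)  # placeholder input; use a normalized image in practice

cam = GradCAM(model=model, target_layers=target_layers)
# Heatmap for ImageNet class 281 ("tabby cat"); the result has shape (batch, H, W)
grayscale_cam = cam(input_tensor=input_tensor, targets=[ClassifierOutputTarget(281)])
print(grayscale_cam.shape)  # (1, 224, 224)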