The AI Institute
Boston
https://www3.cs.stonybrook.edu/~jishang/
Starred repositories
Motion-Controllable Video Diffusion via Warped Noise
🔥[ICLR'25] LLaRA: Supercharging Robot Learning Data for Vision-Language Policy
Theia: Distilling Diverse Vision Foundation Models for Robot Learning
ASCII generator (image to text, image to image, video to video)
Adaptive Caching for Faster Video Generation with Diffusion Transformers
[ICLR 2025] SPA: 3D Spatial-Awareness Enables Effective Embodied Representation
[ECCV 2022] [T-PAMI] StARformer: Transformer with State-Action-Reward Representations
Code for NeurIPS 2023 paper "Active Vision Reinforcement Learning with Limited Visual Observability"
Environments for Active Vision Reinforcement Learning
PyTorch code and pretrained weights for the UNIC models.
Parallel t-SNE implementation with Python and Torch wrappers.
Assessing Sample Quality via the Latent Space of Generative Models (ECCV 2024)
Official repository for "AM-RADIO: Reduce All Domains Into One"
Inceptive Visual Representation Learning with Diverse Attention Across Heads, applied to image classification, action recognition, and robot learning
Language Repository for Long Video Understanding
[ICRA'24] Crossway Diffusion: Improving Diffusion-based Visuomotor Policy via Self-supervised Learning
An unofficial PyTorch dataloader for the Open X-Embodiment datasets https://github.com/google-deepmind/open_x_embodiment
CoTracker is a model for tracking any point (pixel) on a video.
A simple and highly efficient RTS-game-inspired environment for reinforcement learning (formerly Gym-MicroRTS)
Official Code for PathLDM: Text conditioned Latent Diffusion Model for Histopathology (WACV 2024)