- Zhejiang University
- Hangzhou, China
- @LinzhanMou
Stars
📹 A more flexible CogVideoX that can generate videos at any resolution and create videos from images.
Official implementation of TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models
No fortress, purely open ground. OpenManus is Coming.
HunyuanVideo-I2V: A Customizable Image-to-Video Model based on HunyuanVideo
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
The Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems
3D Gaussian Splatting (3DGS) extension for Omniverse
Wan: Open and Advanced Large-Scale Video Generative Models
[CVPR 2024] MemFlow: Optical Flow Estimation and Prediction with Memory
[CVPR 2025] Video Depth Anything: Consistent Depth Estimation for Super-Long Videos
Official implementation of ICCV2023 VideoFlow: Exploiting Temporal Cues for Multi-frame Optical Flow Estimation
An ML research template with good documentation by Boyuan Chen, an MIT PhD student
Solve Visual Understanding with Reinforced VLMs
MoBA: Mixture of Block Attention for Long-Context LLMs
A debugging and profiling tool that can trace and visualize python code execution
https://huyenchip.com/ml-interviews-book/
[CVPR 2025] Official repository for “MagicArticulate: Make Your 3D Models Articulation-Ready”
Video Generation Foundation Models: https://saiyan-world.github.io/goku/
Stereo4D data processing pipeline
[CVPR 2024] Memory-based Adapters for Online 3D Scene Perception
Fillerbuster: Multi-View Scene Completion for Casual Captures
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
DELTA: Dense Efficient Long-range 3D Tracking for Any video (ICLR 2025)