Stars
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
[ICML 2024 Spotlight] FiT: Flexible Vision Transformer for Diffusion Model
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
Official implementation of FIFO-Diffusion: Generating Infinite Videos from Text without Training (NeurIPS 2024)
Official repository for "FiGVCL: Fine-Grained Benchmark and Method for Video Copy Localization"
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
[NeurIPS 2024] Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
🙃 A delightful community-driven (with 2,400+ contributors) framework for managing your zsh configuration. Includes 300+ optional plugins (rails, git, macOS, hub, docker, homebrew, node, php, python…
Open source implementation of "A Self-Supervised Descriptor for Image Copy Detection" (SSCD).
🔥🔥🔥 Latest Papers, Codes and Datasets on Vid-LLMs.
🦜🔗 Build context-aware reasoning applications
FranzKafkaYu / x-ui
Forked from vaxilu/x-ui
Lightweight Xray panel with multi-protocol and multi-user support on the same port. Supports English and a Telegram bot. Easy to use and easy to manage.
OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
One minute of voice data is enough to train a good TTS model! (few-shot voice cloning)
A web GUI client of Project V which supports VMess, VLESS, SS, SSR, Trojan, Tuic and Juicity protocols.
Empowering everyone to build reliable and efficient software.
A hands-on introduction to video technology: image, video, codec (av1, vp9, h265) and more (ffmpeg encoding). Translations: 🇺🇸 🇨🇳 🇯🇵 🇮🇹 🇰🇷 🇷🇺 🇧🇷 🇪🇸
Authors' official PyTorch implementation of "Self-Supervised Video Similarity Learning" [CVPRW 2023]
Development and compilation setup for the book versions of MINIX (2.0.0 and 3.1.0) on QEMU
Original Minix 1 sources from the book "Operating Systems: Design and Implementation" 1st ed.