Stars
Python tool for converting files and office documents to Markdown.
Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"
A hands-on introduction to video technology: image, video, codec (av1, vp9, h265) and more (ffmpeg encoding). Translations: 🇺🇸 🇨🇳 🇯🇵 🇮🇹 🇰🇷 🇷🇺 🇧🇷 🇪🇸
Official inference repo for FLUX.1 models
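PDF scientific paper translation with preserved formats - AI-based full-text bilingual translation of PDF documents that fully preserves the layout; supports services such as Google/DeepL/Ollama/OpenAI and provides CLI/GUI/Docker
"A Guide to Using Open-Source Large Models": a tutorial tailored for Chinese beginners on quickly fine-tuning (full-parameter/LoRA) and deploying Chinese and international open-source large language models (LLM) and multimodal large models (MLLM) in a Linux environment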
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
Get up and running with Llama 3.3, Mistral, Gemma 2, and other large language models.
3D Point Cloud Annotation Platform for Autonomous Driving
[NeurIPS 2023] Official code of "One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization"
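Repository for pretraining from scratch plus SFT of a small-parameter Chinese LLaMa2; a single 24G GPU is enough to run it and obtain a chat-llama2 with basic Chinese question-answering ability.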
Use PEFT or Full-parameter to finetune 400+ LLMs (Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, ...) or 100+ MLLMs (Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, Inter…
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
OpenMMLab Multimodal Advanced, Generative, and Intelligent Creation Toolbox. Unlock the magic 🪄: Generative-AI (AIGC), easy-to-use APIs, awesome model zoo, diffusion models, for text-to-image genera…
fast-stable-diffusion + DreamBooth
Implementation of Dreambooth (https://arxiv.org/abs/2208.12242) with Stable Diffusion
[NeurIPS 2024] Tune your restoration model with one 3090 GPU!
[ECCV 2024] InstructIR: High-Quality Image Restoration Following Human Instructions https://huggingface.co/spaces/marcosv/InstructIR
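This project aims to share the technical principles behind large models along with hands-on experience (large-model engineering and putting large-model applications into production)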
OpenMMLab's next-generation platform for general 3D object detection.
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
OpenStereo: A Comprehensive Benchmark for Stereo Matching and Strong Baseline