Starred repositories
Make websites accessible for AI agents
Flame is an open-source multimodal AI system designed to translate UI design mockups into high-quality React code. It leverages vision-language modeling, automated data synthesis, and structured tr…
Autonomous coding agent right in your IDE, capable of creating/editing files, executing commands, using the browser, and more with your permission every step of the way.
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
A Notebook with Flexible Customization and Easy Integration.
🤖 GPT Vision, Open Source Vision components for GPTs, generative AI, and LLM projects. Not only UI Components.
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
A lightweight library for portable low-level GPU computation using WebGPU.
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen2.5, Llama4, InternLM3, GLM4, Mistral, Yi1.5, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3…
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
Fine-tune SAM (Segment Anything Model) for computer vision tasks such as semantic segmentation, matting, detection, and more in specific scenarios
The production-scale datacenter profiler (C/C++, Go, Rust, Python, Java, NodeJS, .NET, PHP, Ruby, Perl, ...)
This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects.
Using Low-rank adaptation to quickly fine-tune diffusion models.
Pure JS implementation of the HTML Canvas 2D drawing API
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Stable Diffusion and Flux in pure C/C++
ML records from the 1110 Lab at BUPT. More details are available at: https://mathpretty.com/10388.html