Starred repositories
Finetune Llama 3.3, DeepSeek-R1 & Reasoning LLMs 2x faster with 70% less memory! 🦥
Qwen2.5-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.
A simple screen-parsing tool toward a purely vision-based GUI agent
Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 2, and other large language models.
Use PEFT or full-parameter training to finetune 450+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek-R1, ...) and 150+ MLLMs (Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, I…
A high-throughput and memory-efficient inference and serving engine for LLMs
Foundational Models for State-of-the-Art Speech and Text Translation
Talk to any LLM with hands-free voice interaction, voice interruption, and a Live2D talking face, running locally across platforms
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
This is a speech interaction system built on open-source models, chaining ASR, LLM, and TTS in sequence. The ASR model is SenseVoice, the LLM models are Qwen2.5-0.5B/1.5B, and there are three …
End-to-end stack for WebRTC. SFU media server and SDKs.
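Call center, intelligent outbound calling, LLM inbound call bot, LLM outbound call bot, customer service system, ticketing system, open-source call center system, telephony system, intelligent outbound calling system, intelligent phone outbound calling, call center system, LLM customer service, phone outbound calling, customer service center, online customer service, LLM call center, inbound call bot, LLM bot, LLM callcenter, contactcenter, Call, IPCC, Customer Service, V…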
A latent text-to-image diffusion model
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
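"A User's Guide to Open-Source Large Models": a tutorial tailored for Chinese beginners on quickly fine-tuning (full-parameter/LoRA) and deploying domestic and international open-source large language models (LLMs) and multimodal large models (MLLMs) in a Linux environment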
Repository containing all the code needed to get started on the SoccerNet Action Spotting challenge. It also includes several benchmark methods.
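This project aims to share the technical principles behind large models along with hands-on experience (LLM engineering and bringing LLM applications into production).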
Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search
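OCR software: open-source, free, and offline. Supports screenshots and batch image import, PDF document recognition, excluding watermarks and headers/footers, and scanning/generating QR codes. Includes built-in language packs for multiple languages.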
🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with LlamaIndex, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
LlamaIndex is the leading framework for building LLM-powered agents over your data.
LightRAG: Simple and Fast Retrieval-Augmented Generation
A multilingual large voice generation model with full-stack capabilities for inference, training, and deployment.
An open-source multimodal large language model that can hear and talk while thinking. It features real-time, end-to-end speech input and streaming audio output for conversation.