Stars
Fast and memory-efficient exact attention
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
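Translate a video from one language to another and add dubbing. Also supports speech recognition transcription, speech synthesis, and subtitle translation.
A fully automated (audio) video translation project. It uses Whisper to recognize speech, translates the subtitles with a large AI model, then merges the subtitles with the video to produce the translated video.
Batch-generate subtitles for video or audio files, and batch-translate the subtitles into other languages. This is a client tool with cross-platform support for macOS and Windows, and it supports multiple translation services including Baidu, Volcengine, deeplx, openai, deepseek, and ollama.
Batch-generate subtitle files for local videos, and translate the subtitle files into other languages. Cross-platform support for Windows and macOS.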
Faster Whisper transcription with CTranslate2
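MultiBot Chat is a Streamlit-based multi-bot chat application that supports multiple large language model (LLM) APIs, including OpenAI, AzureOpenAI, ChatGLM, CoZe, Qwen, Ollama, XingHuo, DeepSeek, Moonshot, Yi, and Groq. It lets users talk with multiple AI chatbots at the same time, compare the answers from different models, and hold group-chat-style discussions.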
💬 OpenAI Assistants API chat UI 🛠️ It works easily by setting the ASSISTANT ID 📁 Supports file upload and file download 🏃 Supports Streaming API 🪟 Supports Azure OpenAI
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
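Chinese LLM capability evaluation leaderboard: currently covers 164 LLMs, including commercial models such as chatgpt, gpt-4o, Google gemini, Claude3.5, Baidu ERNIE Bot, Qwen, Baichuan, iFlytek Spark, SenseTime senseChat, and minimax, as well as open-source LLMs such as deepseek-v3, qwen2.5, llama3.3, phi-4, glm4, and InternLM2.5. It provides not only a capability score leaderboard but also the raw outputs of all models.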
Use PEFT or Full-parameter to finetune 450+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek-R1, ...) and 150+ MLLMs (Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, I…
Firefly: an LLM training tool that supports training Qwen2.5, Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya2, Vicuna, Bloom, and other LLMs
alibaba / Megatron-LLaMA
Forked from NVIDIA/Megatron-LM
Best practice for training LLaMA models in Megatron-LM
A recipe for online RLHF and online iterative DPO.
Recipes to train reward model for RLHF.
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Netflix-level subtitle cutting, translation, alignment, and even dubbing: a one-click, fully automated AI video subtitle team
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
Image forgery recognition algorithm
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Official inference repo for FLUX.1 models
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.
Research Code for Multimodal-Cognition Team in Ant Group