Stars
🚀 [Large Models] Train a 27M-parameter vision-language model (VLM) from scratch in just 3 hours! 🌏
PDF scientific paper translation with preserved formats: AI-powered full-text bilingual translation of PDF documents that fully preserves the original layout, supporting Google/DeepL/Ollama/OpenAI and other services, with CLI/GUI/Docker/Zotero interfaces
🚀🚀 [Large Models] Train a 26M-parameter small GPT completely from scratch in just 50 minutes! 🌏
🎉 Elegant and powerful theme for Hexo.
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
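Batch-generate subtitles for video or audio files and batch-translate them into other languages. A cross-platform desktop client for macOS and Windows that supports multiple translation services, including Baidu, Volcano Engine, DeepLX, OpenAI, DeepSeek, and Ollama.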
🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022
Sealos is a production-ready Kubernetes distribution. You can create development environments for any programming language and framework, create high-availability databases such as MySQL/PostgreSQL/Redis/MongoDB, and…
A tool for translating datasets into multiple languages via command-line scripts.
SUPIR aims at developing Practical Algorithms for Photo-Realistic Image Restoration In the Wild. Our new online demo has also been released at suppixel.ai.
OCR, layout analysis, reading order, table recognition in 90+ languages
Fast and memory-efficient exact attention
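The "Open-Source LLM User Guide" (《开源大模型食用指南》): a tutorial tailored for Chinese users on quickly fine-tuning (full-parameter/LoRA) and deploying Chinese and international open-source large language models (LLMs) and multimodal large language models (MLLMs) in a Linux environment
LiberSonora, meaning "The Voice of Freedom," is a powerful, AI-powered, open-source audiobook toolkit. Its features include intelligent subtitle extraction, AI title generation, and multilingual translation, and it supports GPU acceleration and batch offline processing.
pycorrector is a toolkit for text error correction. It applies models such as Kenlm, T5, MacBERT, ChatGLM3, and Qwen2.5 to error correction and works out of the box.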
A list of Free Software network services and web applications which can be hosted on your own servers
OpenFace – a state-of-the-art tool intended for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation.
Vim-fork focused on extensibility and usability
Extract content from PDFs, based on RapidOCR.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Applications of large language models and multimodal models, mainly covering RAG, small models, agents, cross-modal search, OCR, and more
Layout analysis for Chinese and English documents
📣 A commercial-grade, open-source automatic speech recognition (ASR) library that works out of the box, runs on all platforms, and recognizes mixed Chinese-English speech. It is a cross-platform implementation of ASR inference based on ONNXRuntime and FunASR, providing a set of simpler APIs for calling ASR models.
Detect and extract table regions from images in various scenarios, and correct perspective and rotation distortions.
A collection of the best currently available open-source table recognition models, with improved pre-processing and post-processing and the models converted to ONNX.