Lists (32)
Avatar
🗣️Speech
OCR
Graph
LLM Tools
LLM Apps
LLM Agents
🗃️LLM Memory
🧑‍🏫 Guide Book
🦍Large Models
📝NLP
🤖GPT
Diffusion
🗄Server
📖PaperCode
📶Router
🌏FQ
📚Study
Alfred
💻Mac
👨🏻‍💻CodingTools
🔭ScientificTools
🤖MachineLearning
📷CV
Zotero
📱 Phone
🛠MiscTools
AnomalyDetection
📋List
🖥Windows
🔬ResearchTool
🕸DeepLearning
Starred repositories
ChatTTS 2K Speaker Stability Score 🥇 & Categorized by Gender and Age 👧 & Audio Preview 🔈
🍦 Speech-AI-Forge is a project developed around TTS generation models, implementing an API Server and a Gradio-based WebUI.
Finetune Llama 3.3, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 70% less memory
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
[CVPR 2023] SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
One minute of voice data can be enough to train a good TTS model! (few-shot voice cloning)
A generative speech model for daily dialogue.
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Detect and extract tables to markdown and csv
File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs.
Label Studio is a multi-type data labeling and annotation tool with standardized output format
Unofficial code for augment-XY-CUT in XYLayoutLM
🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library
A faster LayoutReader model based on LayoutLMv3 that sorts OCR bboxes into reading order.
Wiseflow is an agile information mining tool that extracts concise messages from various sources such as websites, WeChat official accounts, social platforms, etc. It automatically categorizes and …
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic.
OCR, layout analysis, reading order, table recognition in 90+ languages
A Comprehensive Toolkit for High-Quality PDF Content Extraction
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and…
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON formats.
Empowering RAG with a memory-based data interface for all-purpose applications!
Collecting awesome papers of RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".
The home of the Jupyter notebook graph visualization widget powered by yFiles for HTML