
Starred repositories
Allow LLMs to control a browser with Browserbase and Stagehand
MegaDetector is an AI model that helps conservation folks spend less time doing boring things with camera trap images.
Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.
🤖 Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using RAG 🔄.
A simple screen parsing tool towards pure vision based GUI agent
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS …
[CVPR 2024] Official RT-DETR (RTDETR paddle pytorch), Real-Time DEtection TRansformer, DETRs Beat YOLOs on Real-time Object Detection. 🔥 🔥 🔥
High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.
Generate high-definition short videos with one click using large AI models.
Automate the process of making money online.
Make websites accessible for AI agents
Python tool for converting files and office documents to Markdown.
Generic automation framework for acceptance testing and RPA
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
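Super cheat sheets - cheat sheets for programming languages, frameworks, and development tools, with everything you need to know in a single file ⚡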
ML-powered speech recognition directly in your browser
All Algorithms implemented in Python
⚡ Workflow Automation Platform. Orchestrate & Schedule code in any language, run anywhere, 500+ plugins. Alternative to Zapier, Rundeck, Camunda, Airflow...
Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting yo…
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)