Starred repositories
[ECCV 2024 Oral] DriveLM: Driving with Graph Visual Question Answering
[IEEE T-PAMI 2024] All you need for End-to-end Autonomous Driving
[ICCV 2023] VAD: Vectorized Scene Representation for Efficient Autonomous Driving
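A curated collection of open-source Chinese large language models, focusing on smaller models that can be privately deployed and are inexpensive to train, including base models, domain-specific fine-tunes and applications, datasets, and tutorials.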
YOLOv10: Real-Time End-to-End Object Detection [NeurIPS 2024]
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
[CVPR 2023 Best Paper Award] Planning-oriented Autonomous Driving
[ICRA'23] BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation
Deformable DETR: Deformable Transformers for End-to-End Object Detection.
🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming
Java SDK for the iFlytek Spark (讯飞星火) large language model: easy to develop with and more flexible. Xunfei SparkDesk Java SDK. SparkDesk. xfyun SDK. xinghuo.
Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting yo…
Simple, minimal implementation of the Mamba SSM in one file of PyTorch.
The official Python client for the Hugging Face Hub.
Code examples and resources for DBRX, a large language model developed by Databricks
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translatio…
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
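Continues training on the Biaobei (标贝) dataset and improves the original FastSpeech2 model by adding prosody representation and prosody prediction modules, making Chinese pronunciation more vivid and rhythmic.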
An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"
YouTube Shorts videos generator using ChatGPT, Bing, GTTS, OpenAI's Whisper, MoviePy and Python
Python library and CLI tool to interface with Google Translate's text-to-speech API
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Offline Text To Speech synthesis for Python
Speech to text (PocketSphinx, iFlytek API, Baidu API) and text to speech (pyttsx3)
Voice Recognition to Text Tool: a local tool that runs offline and converts audio and video to subtitles, with output in JSON, SRT subtitle, or plain-text format
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
A cross-platform framework using Vue.js
DeepSeek Coder: Let the Code Write Itself