Stars
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
A Python package to build AI-powered real-time audio applications
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Robust Speech Recognition via Large-Scale Weak Supervision
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.
We write your reusable computer vision tools. 💜
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
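[Three Years of Interviews, Five Years of Mock Exams] An interview handbook for AI algorithm engineers. Covers interview and written-test experience and practical knowledge from across the AI industry, including AIGC, traditional deep learning, autonomous driving, machine learning, computer vision, natural language processing, reinforcement learning, embodied intelligence, the metaverse, and AGI.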
CVNets: A library for training computer vision networks
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and…
Official inference repo for FLUX.1 models
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
A modern GUI client based on Tauri, designed to run on Windows, macOS and Linux for a tailored proxy experience
High-Resolution Image Synthesis with Latent Diffusion Models
OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
Open-Sora: Democratizing Efficient Video Production for All
Unsupervised text tokenizer for Neural Network-based text generation.
PyTorch implementation of Diffusion Models (https://arxiv.org/pdf/2006.11239.pdf)
[CSUR] A Survey on Video Diffusion Models
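⛽️ "Algorithm Clearance Handbook": a highly detailed introductory tutorial on algorithms and data structures that starts from zero, with detailed solutions to 850+ LeetCode problems and 200 popular interview questions from major tech companies.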