AI
Robust Speech Recognition via Large-Scale Weak Supervision
A Stable Diffusion desktop frontend with inpainting, img2img and more!
Stable Diffusion web UI
A simple notebook demonstrating prompt-based music generation via Mubert API
Rembg is a tool to remove image backgrounds
Port of OpenAI's Whisper model in C/C++
OpenAI Whisper ASR Webservice API
Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper.
Your personal, fully customizable, Linux Voice Control Assistant.
Invoke is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The …
Easiest 1-click way to create beautiful artwork on your PC using AI, with no tech knowledge. Provides a browser UI for generating images from text prompts and images. Just enter your text prompt, a…
Stable Diffusion for real-time music generation (web app)
Stable Diffusion built into Blender
A multi-voice TTS system trained with an emphasis on quality
Real-time face swap for PC streaming or video calls
The no-code platform for building custom LLM Agents
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
Handwriting Synthesis with RNNs ✏️
Segment Anything in High Quality [NeurIPS 2023]
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
Foundational Models for State-of-the-Art Speech and Text Translation
CoTracker is a model for tracking any point (pixel) on a video.
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
Inference and training library for high-quality TTS models.