A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Python 9,815 986 Updated Apr 14, 2025

jhtonyKoo / music_mixing_style_transfer

Python 165 18 Updated Oct 24, 2023

mt-upc / ZeroSwot

Pushing the Limits of Zero-shot End-to-End Speech Translation

Python 25 3 Updated Dec 12, 2024

markovka17 / dla

Deep learning for audio processing

Jupyter Notebook 640 111 Updated Dec 27, 2024

CandleLabAI / CoFormer-WACV-2024

Source code of "Textual Alchemy: CoFormer for Scene Text Understanding", published in WACV 2024

Python 1 1 Updated Dec 27, 2023

open-mmlab / PIA

[CVPR 2024] PIA, your Personalized Image Animator. Animate your images by text prompt, combining with Dreambooth, achieving stunning videos.

Python 958 75 Updated Aug 5, 2024

williamyang1991 / Rerender_A_Video

[SIGGRAPH Asia 2023] Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation

Jupyter Notebook 2,978 200 Updated Mar 9, 2024

prasunroy / stefann

🔥 [CVPR 2020] STEFANN: Scene Text Editor using Font Adaptive Neural Network (official code).

Python 270 41 Updated Apr 30, 2024

PRIS-CV / DemoFusion

Let us democratise high-resolution generation! (CVPR 2024)

Jupyter Notebook 2,011 226 Updated Apr 15, 2024

chongzhou96 / EdgeSAM

Official PyTorch implementation of "EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM"

Jupyter Notebook 997 46 Updated Aug 12, 2024

UCSB-NLP-Chang / DiffSTE

Python 92 7 Updated Aug 1, 2024

qqqyd / MOSTEL

Python 56 4 Updated Jul 25, 2023

nihaomiao / CVPR23_LFDM

The pytorch implementation of our CVPR 2023 paper "Conditional Image-to-Video Generation with Latent Flow Diffusion Models"

Python 460 42 Updated Jun 18, 2024

NVlabs / genvs

639 10 Updated Apr 6, 2023

haofanwang / inswapper

One-click Face Swapper and Restoration powered by insightface 🔥

Python 605 99 Updated Apr 16, 2024

facefusion / facefusion

Industry leading face manipulation platform

Python 22,436 3,417 Updated Apr 16, 2025

Audio-AGI / WavJourney

WavJourney: Compositional Audio Creation with LLMs

Python 535 43 Updated Sep 28, 2023

JonathanFly / generative-disco

Forked from hellovivian/generative-disco

🎼 text-to-video system for music visualization

Python 1 Updated Jun 25, 2023

JonathanFly / audiocraft-webui

Forked from facebookresearch/audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…

Python 1 Updated Jun 18, 2023

JonathanFly / faster-whisper-livestream-translator

faster-whisper livestream translation, OBS noise reduction, dual language subtitles

Python 78 7 Updated Apr 26, 2023

Dima enotdima

Stars

THUDM / CogVideo

niedev / RTranslator

RVC-Project / Retrieval-based-Voice-Conversion-WebUI

ototadana / sd-face-editor

clovaai / ClovaCall

kaiidams / Kokoro-Speech-Dataset

YuelangX / Gaussian-Head-Avatar

hpcaitech / Open-Sora

PKU-YuanGroup / Open-Sora-Plan

maum-ai / sane-tts

modelscope / FunASR