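A GUI tool for extracting hard-coded subtitles (hardsubs) from videos and generating srt files. No third-party API is required; text recognition runs locally. A deep-learning-based video subtitle extraction framework, including subtitle region detection and subtitle content extraction.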

Python 6,564 705 Updated Dec 24, 2024

microsoft / LLMLingua

[EMNLP'23, ACL'24] To speed up LLMs' inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, which achieves up to 20x compression with minimal performance loss.

Python 4,850 271 Updated Jan 26, 2025

tango4j / llm_speaker_tagging

SLT 2024 Challenge: Post-ASR-Speaker-Tagging

Python 14 1 Updated Jun 16, 2024

wq2012 / SpectralCluster

Python re-implementation of the (constrained) spectral clustering algorithms used in Google's speaker diarization papers.

Python 523 72 Updated Sep 25, 2024

microsoft / PLC-Challenge

This repo contains required files for the INTERSPEECH 2022 Audio Deep Packet Loss Concealment (PLC) Challenge.

Python 81 11 Updated Oct 31, 2024

MiqG / leiden_clustering

Cluster your data matrix with the Leiden algorithm.

Python 4 2 Updated Aug 6, 2021

pzelasko / kaldialign

Python wrappers for Kaldi's Levenshtein distance and alignment code.

CMake 62 11 Updated Mar 17, 2024

fgnt / meeteval

MeetEval - A meeting transcription evaluation toolkit

Python 85 14 Updated Feb 6, 2025

CLUEbenchmark / CLUECorpus2020

Large-scale Pre-training Corpus for Chinese (100 GB)

935 81 Updated Oct 17, 2022

ming024 / FastSpeech2

An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"

Python 1,923 553 Updated Oct 27, 2023

netease-youdao / EmotiVoice

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

Python 7,639 650 Updated Aug 13, 2024

ufal / whisper_streaming

Whisper realtime streaming for long speech-to-text transcription and translation

Python 2,414 299 Updated Jan 7, 2025

SungFeng-Huang / Meta-TTS

Official repository of https://doi.org/10.1109/TASLP.2022.3167258. More up-to-date code is in "refactor" branch.

Python 188 37 Updated Jun 8, 2023

speechio / chinese_text_normalization

Chinese text normalization for speech processing

Python 646 146 Updated Mar 18, 2023

R1ckShi / SeACo-Paraformer

[ICASSP2023] Source code, model links and open test sets for paper SeACo-Paraformer.

28 1 Updated Mar 15, 2024

Liangzheng-ZL / BEdit-TTS

Speech samples and code of BEdit-TTS

Python 32 2 Updated Oct 8, 2023

thuhcsi / Contextual-Biasing-Dataset

Open-source Mandarin biased word dataset

11 Updated Sep 21, 2023

speechio / BigCiDian

Pronunciation lexicon covering both English and Chinese languages for Automatic Speech Recognition.

Python 255 55 Updated Oct 11, 2019

nl8590687 / ASRT_SpeechRecognition

A Deep-Learning-Based Chinese Speech Recognition System

Python 7,961 1,903 Updated Sep 26, 2024

HillZhang1999 / MuCGEC

Open-source release of the MuCGEC Chinese error-correction dataset and SOTA text-correction models; Code & Data for our NAACL 2022 Paper "MuCGEC: a Multi-Reference Multi-Source Evaluation Dataset for Chinese Grammatical Error Correction"

Python 518 65 Updated Jun 9, 2023

njzheng

Stars

OpenBMB / VisRAG

HeimingX / TAG

patrickmineault / vid2slides

sakshishukla1996 / SpeechTopSeg

showlab / ShowUI

itsnamgyu / block-transformer

AugmendTech / treeseg

cfournie / segmentation.evaluation

tencent-ailab / FRA-RIR

FunAudioLLM / SenseVoice

YaoFANGUK / video-subtitle-extractor