Stars
Official code for Attention-driven GUI Grounding, AAAI 2025
Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
Hierarchical topic segmentation of meeting transcripts using embeddings and divisive clustering.
SegEval Segmentation Evaluation Package
Multilingual Voice Understanding Model
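A GUI tool for extracting hard-coded subtitles (hardsub) from videos and generating srt files. No third-party API is required; text recognition runs locally. A deep-learning-based video subtitle extraction framework that covers subtitle region detection and subtitle content extraction.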
[EMNLP'23, ACL'24] Compresses the prompt and KV-Cache to speed up LLM inference and enhance LLMs' perception of key information, achieving up to 20x compression with minimal performance loss.
SLT 2024 Challenge: Post-ASR-Speaker-Tagging
Python re-implementation of the (constrained) spectral clustering algorithms used in Google's speaker diarization papers.
This repo contains required files for the INTERSPEECH 2022 Audio Deep Packet Loss Concealment (PLC) Challenge.
Cluster your data matrix with the Leiden algorithm.
Python wrappers for Kaldi's Levenshtein distance and alignment code.
MeetEval - A meeting transcription evaluation toolkit
Large-scale Pre-training Corpus for Chinese (100G)
An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
Whisper realtime streaming for long speech-to-text transcription and translation
Official repository of https://doi.org/10.1109/TASLP.2022.3167258. More up-to-date code is in "refactor" branch.
Chinese text normalization for speech processing
[ICASSP2023] Source code, model links and open test sets for paper SeACo-Paraformer.
Open-source Mandarin biased word dataset
Pronunciation lexicon covering both English and Chinese languages for Automatic Speech Recognition.
A Deep-Learning-Based Chinese Speech Recognition System
Open-source MuCGEC Chinese error correction dataset and SOTA text correction models; Code & Data for our NAACL 2022 Paper "MuCGEC: a Multi-Reference Multi-Source Evaluation Dataset for Chinese Grammatical Error Correction"