Parsing-free RAG supported by VLMs

Python 577 46 Updated Jan 20, 2025

Official code for Attention-driven GUI Grounding, AAAI2025

Python 4 Updated Dec 17, 2024

Extract slides from video

Python 24 5 Updated Nov 23, 2020
Python 2 Updated Aug 23, 2024

Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.

Jupyter Notebook 920 52 Updated Feb 8, 2025

Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)

Python 149 7 Updated Dec 18, 2024

Hierarchical topic segmentation of meeting transcripts using embeddings and divisive clustering.

Python 51 3 Updated Jul 30, 2024

SegEval Segmentation Evaluation Package

Python 55 13 Updated Jun 13, 2023
Python 185 27 Updated Dec 4, 2023

Multilingual Voice Understanding Model

Python 4,303 376 Updated Jan 8, 2025

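Extracts hard-coded subtitles from videos and generates srt files. No third-party API is required; text recognition runs locally. A deep-learning-based video subtitle extraction framework, including subtitle region detection and subtitle content extraction. A GUI tool for extracting hard-coded subtitles (hardsub) from videos and generating srt files.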

Python 6,564 705 Updated Dec 24, 2024

[EMNLP'23, ACL'24] To speed up LLM inference and enhance LLMs' perception of key information, compresses the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.

Python 4,850 271 Updated Jan 26, 2025

SLT 2024 Challenge: Post-ASR-Speaker-Tagging

Python 14 1 Updated Jun 16, 2024

Python re-implementation of the (constrained) spectral clustering algorithms used in Google's speaker diarization papers.

Python 523 72 Updated Sep 25, 2024

This repo contains required files for the INTERSPEECH 2022 Audio Deep Packet Loss Concealment (PLC) Challenge.

Python 81 11 Updated Oct 31, 2024

Cluster your data matrix with the Leiden algorithm.

Python 4 2 Updated Aug 6, 2021

Python wrappers for Kaldi's Levenshtein distance and alignment code.

CMake 62 11 Updated Mar 17, 2024

MeetEval - A meeting transcription evaluation toolkit

Python 85 14 Updated Feb 6, 2025

Large-scale Pre-training Corpus for Chinese (100 GB Chinese pre-training corpus)

935 81 Updated Oct 17, 2022

An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"

Python 1,923 553 Updated Oct 27, 2023

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

Python 7,639 650 Updated Aug 13, 2024

Whisper realtime streaming for long speech-to-text transcription and translation

Python 2,414 299 Updated Jan 7, 2025

Official repository of https://doi.org/10.1109/TASLP.2022.3167258. More up-to-date code is in "refactor" branch.

Python 188 37 Updated Jun 8, 2023

Chinese text normalization for speech processing

Python 646 146 Updated Mar 18, 2023

[ICASSP2023] Source code, model links and open test sets for paper SeACo-Paraformer.

28 1 Updated Mar 15, 2024

Speech samples and code of BEdit-TTS

Python 32 2 Updated Oct 8, 2023

Open-source Mandarin biased word dataset

11 Updated Sep 21, 2023

Pronunciation lexicon covering both English and Chinese languages for Automatic Speech Recognition.

Python 255 55 Updated Oct 11, 2019

A Deep-Learning-Based Chinese Speech Recognition System

Python 7,961 1,903 Updated Sep 26, 2024

Open-source release of the MuCGEC Chinese error-correction dataset and SOTA text-correction models; Code & Data for our NAACL 2022 Paper "MuCGEC: a Multi-Reference Multi-Source Evaluation Dataset for Chinese Grammatical Error Correction"

Python 518 65 Updated Jun 9, 2023