Stars
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
Easily train a good VC model with voice data <= 10 mins!
Writing AI Conference Papers: A Handbook for Beginners
Repository for EMNLP 2022 Paper: Towards a Unified Multi-Dimensional Evaluator for Text Generation
A blog for understanding graph neural networks
Multi-task learning using uncertainty to weigh losses for scene geometry and semantics, Auxiliary Tasks in Multi-task Learning
[ICCV2023] "Vision HGNN: An Image is More than a Graph of Nodes" by Yan Han, Peihao Wang, Souvik Kundu, Ying Ding, and Zhangyang Wang
Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.
Healthcare data standard in China
https://survivesjtu.github.io/SJTU-Application/#/
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
👀 Eye tracking library that is easy to integrate into your projects
State-of-the-Art Text Embeddings
An open source implementation of CLIP.
⏰ Collaboratively track deadlines of conferences recommended by CCF (Website, Python CLI, WeChat Applet) / If you find it useful, please star this project, thanks~
On explainable attention-based deep neural networks trained on radiographic data augmented with diffusion models
Is synthetic data from generative models ready for image recognition?
LeetCode Solutions: A Record of My Problem-Solving Journey.
Demonstrates all the questions on LeetCode in the form of animation. (Presents the reasoning behind solving LeetCode problems as animations.)
Open Source Image and Video Restoration Toolbox for Super-resolution, Denoise, Deblurring, etc. Currently, it includes EDSR, RCAN, SRResNet, SRGAN, ESRGAN, EDVR, BasicVSR, SwinIR, ECBSR, etc. Also …
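Provides a practical interactive interface for LLMs such as GPT/GLM, specially optimized for reading, polishing, and writing papers. Modular design with support for custom shortcut buttons and function plugins; project analysis and self-explanation for Python, C++, and other projects; PDF/LaTeX paper translation and summarization; parallel querying of multiple LLM models; support for local models such as chatglm3. Integrates Tongyi Qianwen (通义千问), deepseekcoder, iFlytek Spark (讯飞星火), ERNIE Bot (文心一言), llama2, rwkv, claude2, m…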