Stars
A real-time implementation of Voice Activity Projection (VAP), aimed at controlling behaviors of spoken dialogue systems, such as turn-taking.
A Rust implementation of OpenAI's Whisper model using the burn framework
Voice Activity Projection Models: Self-supervised learning of Turn-taking Events
A Lightweight Face Recognition and Facial Attribute Analysis (Age, Gender, Emotion and Race) Library for Python
A wide variety of research projects developed by the SpokenNLP team of Speech Lab, Alibaba Group.
Build smaller, faster, and more secure desktop and mobile applications with a web frontend.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
A self-supervised learning framework for audio-visual speech
MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversation
Convert PDF to markdown + JSON quickly with high accuracy
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio a…
✨✨Latest Advances on Multimodal Large Language Models
Code for fine-tuning Platypus family LLMs using LoRA
日本語LLMまとめ - Overview of Japanese LLMs
Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
Easily create large video datasets from video URLs
ditoec / openface2_ros
Forked from interaction-lab/openface_ros
ROS bindings for OpenFace 2.1.0
Faster Whisper transcription with CTranslate2
Implementation for our WACV 2021 paper "Multi-Loss Weighting with Coefficient of Variations"
A playbook for systematically maximizing the performance of deep learning models.
Dataframes powered by a multithreaded, vectorized query engine, written in Rust
Materials for the Hugging Face Diffusion Models Course
Materials for ACL-2022 tutorial: Knowledge-Augmented Methods for Natural Language Processing
A Python package to build AI-powered real-time audio applications