Stars
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
Doctor Dignity is an LLM that can pass the US Medical Licensing Exam. It works offline, is cross-platform, and keeps your health data private.
Painter & SegGPT Series: Vision Foundation Models from BAAI
The official GitHub page for the survey paper "A Survey of Large Language Models".
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
LangChain & LangGraph AI PDF chatbot agent
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
Pre-trained models and language resources for Natural Language Processing in Polish
A curated list of resources dedicated to Natural Language Processing (NLP) in Polish. Models, tools, datasets.
Long audio alignment using Kaldi
Implementations of various Vision Transformer Models and Training Strategies
Visual speech recognition with face inputs: code and models for F&G 2020 paper "Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition"
Speech Recognition using Recurrent Neural Network Transducer
Torch code for using Residual Networks with LSTMs for Lipreading
A self-supervised learning framework for audio-visual speech
ICASSP'22 Training Strategies for Improved Lip-Reading; ICASSP'21 Towards Practical Lipreading with Distilled and Efficient Models; ICASSP'20 Lipreading using Temporal Convolutional Networks
PyTorch code for End-to-End Audiovisual Speech Recognition
The state-of-the-art PyTorch implementation of the method described in the paper "LipNet: End-to-End Sentence-level Lipreading" (https://arxiv.org/abs/1611.01599)
Visual Speech Recognition for Multiple Languages
Convert English text from written expressions into spoken forms
A collection of datasets you may need or want to experiment with.
Neural network based similarity scoring for diarization (PyTorch implementation of "LSTM based Similarity Measurement with Spectral Clustering for Speaker Diarization")
3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition
StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion
PerformanceNet: Score-to-Audio Music Generation with Multi-Band Convolutional Residual Network