SpeechOceanTech

Stars

facebookresearch / sam2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 14,419 1,487 Updated Dec 25, 2024

llSourcell / Doctor-Dignity

Doctor Dignity is an LLM that can pass the US Medical Licensing Exam. It works offline, it's cross-platform, & your health data stays private.

Python 3,862 413 Updated Sep 21, 2023

mindoc-org / mindoc

An online API documentation management system implemented in Golang on the beego framework

Go 7,471 1,937 Updated Dec 27, 2024

baaivision / Painter

Painter & SegGPT Series: Vision Foundation Models from BAAI

Python 2,555 177 Updated Dec 6, 2024

RUCAIBox / LLMSurvey

The official GitHub page for the survey paper "A Survey of Large Language Models".

Python 11,139 871 Updated Aug 20, 2024

EleutherAI / gpt-neox

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries

Python 7,124 1,047 Updated Mar 8, 2025

mayooear / ai-pdf-chatbot-langchain

LangChain & LangGraph AI PDF chatbot agent

TypeScript 15,175 3,031 Updated Feb 20, 2025

haoheliu / voicefixer

General Speech Restoration

Python 1,090 134 Updated Feb 17, 2025

Open-Speech-EkStep / ULCA-asr-dataset-corpus

44 16 Updated Nov 23, 2022

modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Python 8,674 894 Updated Mar 7, 2025

coqui-ai / open-speech-corpora

💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies

1,316 142 Updated Jun 6, 2024

sdadas / polish-nlp-resources

Pre-trained models and language resources for Natural Language Processing in Polish

335 29 Updated Jun 5, 2024

ksopyla / awesome-nlp-polish

A curated list of resources dedicated to Natural Language Processing (NLP) in Polish. Models, tools, datasets.

297 33 Updated Aug 8, 2021

srinivr / kaldi-long-audio-alignment

Long audio alignment using Kaldi

Shell 24 10 Updated Apr 22, 2021

Oguzhanercan / Vision-Transformers

Implementations of various Vision Transformer Models and Training Strategies

Jupyter Notebook 3 Updated Oct 22, 2022

VIPL-Audio-Visual-Speech-Understanding / deep-face-speechreading

Visual speech recognition with face inputs: code and models for F&G 2020 paper "Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition"

Python 17 5 Updated Apr 12, 2021

around-star / Speech-Recognition

Speech Recognition using Recurrent Neural Network Transducer

Jupyter Notebook 2 Updated Feb 13, 2021

tstafylakis / Lipreading-ResNet

Torch code for using Residual Networks with LSTMs for Lipreading

Lua 99 13 Updated Oct 8, 2018

facebookresearch / av_hubert

A self-supervised learning framework for audio-visual speech

Python 881 138 Updated Dec 7, 2023

mpc001 / Lipreading_using_Temporal_Convolutional_Networks

ICASSP'22 Training Strategies for Improved Lip-Reading; ICASSP'21 Towards Practical Lipreading with Distilled and Efficient Models; ICASSP'20 Lipreading using Temporal Convolutional Networks

Python 408 104 Updated May 18, 2023

mpc001 / end-to-end-lipreading

PyTorch code for End-to-End Audiovisual Speech Recognition

Python 174 50 Updated Nov 18, 2022

VIPL-Audio-Visual-Speech-Understanding / LipNet-PyTorch

A state-of-the-art PyTorch implementation of the method described in the paper "LipNet: End-to-End Sentence-level Lipreading" (https://arxiv.org/abs/1611.01599)

Python 217 52 Updated Sep 21, 2022

mpc001 / Visual_Speech_Recognition_for_Multiple_Languages

Visual Speech Recognition for Multiple Languages

Python 388 60 Updated Aug 17, 2023

mli / autocut

Cut videos with a text editor

Python 7,031 723 Updated Oct 5, 2024

tomaarsen / TTSTextNormalization

Convert English text from written expressions into spoken forms

Python 24 3 Updated Jun 22, 2022

Strange-AI / datasets

Collections of many datasets you may need and play with.

Shell 32 6 Updated Apr 9, 2019

cvqluu / nn-similarity-diarization

Neural-network-based similarity scoring for diarization (PyTorch implementation of "LSTM based Similarity Measurement with Spectral Clustering for Speaker Diarization")

Python 44 12 Updated Oct 21, 2020

tencent-ailab / 3m-asr

3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition

Python 118 17 Updated Jun 22, 2022

yl4579 / StarGANv2-VC

StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion

Python 497 111 Updated Jan 13, 2025

bwang514 / PerformanceNet

PerformanceNet: Score-to-Audio Music Generation with Multi-Band Convolutional Residual Network

Python 110 12 Updated Jul 6, 2023
