The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 14,419 1,487 Updated Dec 25, 2024

Doctor Dignity is an LLM that can pass the US Medical Licensing Exam. It works offline, it's cross-platform, & your health data stays private.

Python 3,862 413 Updated Sep 21, 2023

An online API documentation management system implemented in Golang, based on the beego framework

Go 7,471 1,937 Updated Dec 27, 2024

Painter & SegGPT Series: Vision Foundation Models from BAAI

Python 2,555 177 Updated Dec 6, 2024

The official GitHub page for the survey paper "A Survey of Large Language Models".

Python 11,139 871 Updated Aug 20, 2024

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries

Python 7,124 1,047 Updated Mar 8, 2025

LangChain & LangGraph AI PDF chatbot agent

TypeScript 15,175 3,031 Updated Feb 20, 2025

General Speech Restoration

Python 1,090 134 Updated Feb 17, 2025

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Python 8,674 894 Updated Mar 7, 2025

💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies

1,316 142 Updated Jun 6, 2024

Pre-trained models and language resources for Natural Language Processing in Polish

335 29 Updated Jun 5, 2024

A curated list of resources dedicated to Natural Language Processing (NLP) in Polish. Models, tools, datasets.

297 33 Updated Aug 8, 2021

Long audio alignment using Kaldi

Shell 24 10 Updated Apr 22, 2021

Implementations of various Vision Transformer Models and Training Strategies

Jupyter Notebook 3 Updated Oct 22, 2022

Visual speech recognition with face inputs: code and models for F&G 2020 paper "Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition"

Python 17 5 Updated Apr 12, 2021

Speech Recognition using Recurrent Neural Network Transducer

Jupyter Notebook 2 Updated Feb 13, 2021

Torch code for using Residual Networks with LSTMs for Lipreading

Lua 99 13 Updated Oct 8, 2018

A self-supervised learning framework for audio-visual speech

Python 881 138 Updated Dec 7, 2023

ICASSP'22 Training Strategies for Improved Lip-Reading; ICASSP'21 Towards Practical Lipreading with Distilled and Efficient Models; ICASSP'20 Lipreading using Temporal Convolutional Networks

Python 408 104 Updated May 18, 2023

PyTorch code for End-to-End Audiovisual Speech Recognition

Python 174 50 Updated Nov 18, 2022

The state-of-the-art PyTorch implementation of the method described in the paper "LipNet: End-to-End Sentence-level Lipreading" (https://arxiv.org/abs/1611.01599)

Python 217 52 Updated Sep 21, 2022

Visual Speech Recognition for Multiple Languages

Python 388 60 Updated Aug 17, 2023

Edit videos with a text editor

Python 7,031 723 Updated Oct 5, 2024

Convert English text from written expressions into spoken forms

Python 24 3 Updated Jun 22, 2022

A collection of datasets you may need or want to play with.

Shell 32 6 Updated Apr 9, 2019

Neural-network-based similarity scoring for diarization (PyTorch implementation of "LSTM based Similarity Measurement with Spectral Clustering for Speaker Diarization")

Python 44 12 Updated Oct 21, 2020

3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition

Python 118 17 Updated Jun 22, 2022

StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion

Python 497 111 Updated Jan 13, 2025

PerformanceNet: Score-to-Audio Music Generation with Multi-Band Convolutional Residual Network

Python 110 12 Updated Jul 6, 2023