A real-time implementation of Voice Activity Projection (VAP), aimed at controlling behaviors of spoken dialogue systems such as turn-taking.

Python 47 6 Updated Jan 6, 2025

Minimalist ML framework for Rust

Rust 16,356 1,005 Updated Jan 22, 2025

A Rust implementation of OpenAI's Whisper model using the burn framework

Rust 285 36 Updated May 6, 2024

Voice Activity Projection Models: Self-supervised learning of Turn-taking Events

Python 45 13 Updated May 29, 2024

A Lightweight Face Recognition and Facial Attribute Analysis (Age, Gender, Emotion and Race) Library for Python

Python 17,234 2,397 Updated Jan 20, 2025

A wide variety of research projects developed by the SpokenNLP team of Speech Lab, Alibaba Group.

Python 112 11 Updated Dec 20, 2024

Build smaller, faster, and more secure desktop and mobile applications with a web frontend.

Rust 88,668 2,701 Updated Jan 23, 2025

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 12,943 2,641 Updated Jan 23, 2025

A self-supervised learning framework for audio-visual speech

Python 865 138 Updated Dec 7, 2023

MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversation

Python 853 209 Updated Mar 10, 2024

Convert PDF to markdown + JSON quickly with high accuracy

Python 19,492 1,163 Updated Jan 22, 2025

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Python 2,892 265 Updated Jun 4, 2024

AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio a…

601 45 Updated Jan 15, 2025

✨✨Latest Advances on Multimodal Large Language Models

13,629 872 Updated Jan 17, 2025

Code for fine-tuning Platypus fam LLMs using LoRA

Python 627 60 Updated Feb 4, 2024

Japanese LLM Summary - Overview of Japanese LLMs

TypeScript 1,076 31 Updated Jan 22, 2025

Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch

Python 2,484 271 Updated Jan 12, 2025

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

Python 25,277 3,224 Updated Sep 24, 2024

Easily create large video datasets from video URLs

Python 561 67 Updated Jul 30, 2024

ROS bindings for OpenFace 2.1.0

C++ 7 7 Updated Aug 21, 2019

Faster Whisper transcription with CTranslate2

Python 13,622 1,147 Updated Jan 1, 2025

Implementation for our WACV 2021 paper "Multi-Loss Weighting with Coefficient of Variations"

Python 50 10 Updated Jan 11, 2021

A playbook for systematically maximizing the performance of deep learning models.

27,874 2,298 Updated Jun 18, 2024

Dataframes powered by a multithreaded, vectorized query engine, written in Rust

Rust 31,473 2,055 Updated Jan 22, 2025

Materials for the Hugging Face Diffusion Models Course

Jupyter Notebook 3,826 415 Updated Aug 19, 2024

Materials for ACL-2022 tutorial: Knowledge-Augmented Methods for Natural Language Processing

288 24 Updated Aug 8, 2022

LaTeX style file for the Information Course graduation research report proceedings

TeX 1 3 Updated Jul 5, 2024

A python package to build AI-powered real-time audio applications

Python 1,146 90 Updated Jul 8, 2024