Stars
Triton implementation of FlashAttention2 that adds Custom Masks.
Conan - The open-source C and C++ package manager
Development repository for the Triton language and compiler
Hackable and optimized Transformers building blocks, supporting a composable construction.
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
Fully open reproduction of DeepSeek-R1
Run any open-source LLM, such as DeepSeek or Llama, as an OpenAI-compatible API endpoint in the cloud.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
A high-throughput and memory-efficient inference and serving engine for LLMs
Evaluate your speech-to-text system with similarity measures such as word error rate (WER)
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection
A parallel implementation of gzip for modern multi-processor, multi-core machines.
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
Large, modern dataset for speech recognition
This package aims at simplifying the download of the AudioSet dataset.
Inference code for PaSST, using the HEAR API.
Efficient Training of Audio Transformers with Patchout
Scenic: A Jax Library for Computer Vision Research and Beyond
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
khanhi2r / ast
Forked from YuanGongND/ast
Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
A free audio dataset of spoken digits. An audio version of MNIST.
Torch implementation of a ViT-based classifier for audio classification