Stars
Code for paper "Network Bending of Diffusion Models for Audio-Visual Generation" at DAFx 2024
Reference-aware automatic speech evaluation toolkit
A Representation Evaluation Framework for Music Information Retrieval tasks
Solve puzzles. Improve your PyTorch.
Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986
Source code for "Modulation Extraction for LFO-driven Audio Effects".
Seamless analysis of your PyTorch models (RAM usage, FLOPs, MACs, receptive field, etc.)
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
Autocorrelation-based O(N log N) pitch detection
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
A developer reference project for creating Retrieval Augmented Generation (RAG) chatbots on Windows using TensorRT-LLM
Repository for training models for music source separation.
Implementation of Band Split Roformer, SOTA Attention network for music source separation out of ByteDance AI Labs
Collection of audio-focused loss functions in PyTorch
Official code and models for the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation"
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
Self-supervised learning for fast pitch estimation
An example plugin using RTNeural with a SIMD architecture determined at run-time
2-2000x faster ML algos, 50% less memory usage, works on all hardware - new and old.
Generate new latent codes for RAVE with Denoising Diffusion models.
[CVPR 2024 Highlight] PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Differentiable audio signal processors in PyTorch