Stars
Modeling, training, eval, and inference code for OLMo
A Conversational Speech Generation Model
YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open
SciGraphQA: Large-Scale Synthetic Multi-Turn Question-Answering Dataset for Scientific Graphs
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
🧑‍🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), ga…
A community dedicated to supporting tools for technical and scientific communication and interactive computing
Embedded driver for the SCD4x sensor family.
Sensirion SCD4x sensor library for the ESP32 microcontroller family. It enables developers to communicate with the SCD4x sensor on the ESP32 platform using the I2C communication channel.
Text and image to video generation: Kandinsky 4.0 (2024)
A mini-framework for evaluating LLM performance on the Bulls and Cows number guessing game, supporting multiple LLM providers.
LSLM implements full duplex modeling in interactive speech language models, based on research by Ma et al. (2024). This project advances human-computer interaction through real-time spoken dialogue…
Adaptive Length Image Tokenization via Recurrent Allocation | How many tokens is an image worth?
MOS score prediction using a fine-tuned wav2vec 2.0 model
Inference and training library for high-quality TTS models.
Implementation of Acoustic BPE (Shen et al., 2024), extended for RVQ-based Neural Audio Codecs
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
A minimal PyTorch implementation of probabilistic diffusion models for 2D datasets.
A Python library and service for automatic speech recognition and transcription in Russian and English
Deep Learning Audio Course, 2024
Algorithms and Data Structures course at ITMO University
PyTorch implementation for DDPM & DDIM