Jayson Jayson236

Stars

facebookresearch / audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…

Jupyter Notebook 21,805 2,303 Updated Mar 13, 2025

zhenye234 / xcodec

AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Python 200 12 Updated Mar 29, 2025

multimodal-art-projection / YuE

YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open

Python 4,820 523 Updated Apr 7, 2025

haoheliu / AudioLDM2

Text-to-Audio/Music Generation

Python 2,406 188 Updated Sep 29, 2024

FoundationVision / LlamaGen

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Python 1,708 75 Updated Aug 15, 2024

deepseek-ai / Janus

Janus-Series: Unified Multimodal Understanding and Generation Models

Python 17,133 2,230 Updated Feb 1, 2025

BradyFU / Awesome-Multimodal-Large-Language-Models

✨✨Latest Advances on Multimodal Large Language Models

14,761 943 Updated Apr 15, 2025

GeWu-Lab / awesome-balanced-multimodal-learning

A curated list of balanced multimodal learning methods.

60 3 Updated Apr 15, 2025

schowdhury671 / aurelia

Codebase and dataset for Aurelia

4 Updated Mar 29, 2025

qiufengqijun / mini_qwen

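This project trains a large language model from scratch, covering pre-training, fine-tuning, and direct preference optimization. The model has 1B parameters and supports Chinese and English.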

Python 358 48 Updated Feb 18, 2025

showlab / Awesome-Unified-Multimodal-Models

📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.

505 24 Updated Apr 9, 2025

modelscope / ms-swift

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen2.5, Llama4, InternLM3, GLM4, Mistral, Yi1.5, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3…

Python 7,019 599 Updated Apr 17, 2025

QwenLM / Qwen2.5-Omni

Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.

Jupyter Notebook 2,586 191 Updated Apr 17, 2025