Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…

Jupyter Notebook 21,805 2,303 Updated Mar 13, 2025

AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Python 200 12 Updated Mar 29, 2025

YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open

Python 4,820 523 Updated Apr 7, 2025

Text-to-Audio/Music Generation

Python 2,406 188 Updated Sep 29, 2024

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Python 1,708 75 Updated Aug 15, 2024

Janus-Series: Unified Multimodal Understanding and Generation Models

Python 17,133 2,230 Updated Feb 1, 2025

✨✨Latest Advances on Multimodal Large Language Models

14,761 943 Updated Apr 15, 2025

A curated list of balanced multimodal learning methods.

60 3 Updated Apr 15, 2025

Codebase and dataset for Aurelia

4 Updated Mar 29, 2025

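A project for training a large language model from scratch, covering pre-training, fine-tuning, and direct preference optimization (DPO). The model has 1B parameters and supports Chinese and English.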

Python 358 48 Updated Feb 18, 2025

📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.

505 24 Updated Apr 9, 2025

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen2.5, Llama4, InternLM3, GLM4, Mistral, Yi1.5, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3…

Python 7,019 599 Updated Apr 17, 2025

Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.

Jupyter Notebook 2,586 191 Updated Apr 17, 2025
JavaScript 26 1 Updated Mar 29, 2025

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

464 9 Updated Apr 6, 2025

Fully open reproduction of DeepSeek-R1

Python 24,007 2,195 Updated Apr 18, 2025
Python 13 Updated Mar 21, 2025

Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]

Python 444 23 Updated Apr 10, 2025

Solve Visual Understanding with Reinforced VLMs

Python 4,690 291 Updated Apr 18, 2025

Official implementation of the paper "OED: Towards One-stage End-to-End Dynamic Scene Graph Generation".

Python 19 1 Updated Mar 26, 2024

A video database bridging human actions and human-object relationships

Python 142 18 Updated Jun 30, 2020

This is my attempt to create a Self-Correcting LLM based on the paper "Training Language Models to Self-Correct via Reinforcement Learning" by Google

Python 33 8 Updated Apr 4, 2025

Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"

Python 161 14 Updated Apr 17, 2025
Python 517 60 Updated Jan 2, 2025

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Python 1,144 77 Updated Jan 23, 2025

PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI

Python 1,058 46 Updated Mar 18, 2025
Python 27 4 Updated Oct 10, 2024

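Dive into Deep Learning (《动手学深度学习》): written for Chinese readers, runnable, and open to discussion. The Chinese and English editions are used for teaching at more than 500 universities in over 70 countries.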

Python 68,516 11,551 Updated Jul 30, 2024

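2024 notices on computer science graduate admission by recommendation (exam-exempt) and pre-recommendation admission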

738 45 Updated Nov 5, 2024