This repository is a curated collection of research papers on the development, implementation, and evaluation of audio large language models (AudioLLMs). Our goal is to give researchers and practitioners a comprehensive resource for exploring the latest advances in AudioLLMs. Contributions and suggestions for new papers are highly encouraged!
Date | Model | Key Affiliations | Paper | Link |
---|---|---|---|---|
2024-07 | FunAudioLLM | Alibaba | FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs | Paper / Code / Demo |
2024-05 | SpeechVerse | AWS | SpeechVerse: A Large-scale Generalizable Audio Language Model | Paper |
2024-04 | SALMONN | Tsinghua | SALMONN: Towards Generic Hearing Abilities for Large Language Models | Paper / Code / Demo |
2024-03 | WavLLM | CUHK | WavLLM: Towards Robust and Adaptive Speech Large Language Model | Paper / Code |
2024-01 | Pengi | Microsoft | Pengi: An Audio Language Model for Audio Tasks | Paper / Code |
2023-12 | Qwen-Audio | Alibaba | Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models | Paper / Code / Demo |
2023-12 | LTU-AS | MIT | Joint Audio and Speech Understanding | Paper / Code / Demo |
2023-10 | UniAudio | CUHK | UniAudio: An Audio Foundation Model Toward Universal Audio Generation | Paper / Code / Demo |
2023-09 | LLaSM | LinkSoul.AI | LLaSM: Large Language and Speech Model | Paper / Code |
2023-06 | AudioPaLM | Google | AudioPaLM: A Large Language Model that Can Speak and Listen | Paper / Demo |
2023-05 | VioLA | Microsoft | VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation | Paper |
2023-05 | SpeechGPT | Fudan | SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities | Paper / Code / Demo |
2023-04 | AudioGPT | Zhejiang University | AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head | Paper / Code |