Building on the foundations of language modeling in natural language processing, Next Token Prediction (NTP) has evolved into a versatile training objective for machine learning tasks across various modalities, achieving considerable success in both understanding and generation tasks. This repo collects the papers and code repositories covered in the survey "Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey".
Authors: Liang Chen¹, Zekun Wang², Shuhuai Ren¹, Lei Li³, Haozhe Zhao¹, Yunshui Li⁴, Zefan Cai¹, Hongcheng Guo², Lei Zhang⁴, Yizhe Xiong⁵, Yichi Zhang¹, Ruoyu Wu¹, Qingxiu Dong¹, Ge Zhang⁶, Jian Yang⁸, Lingwei Meng⁷, Shujie Hu⁷, Yulong Chen⁹, Junyang Lin⁸, Shuai Bai⁸, Andreas Vlachos⁹, Xu Tan¹⁰, Minjia Zhang¹¹, Wen Xiao¹⁰, Aaron Yee¹²,¹³, Tianyu Liu⁸, Baobao Chang¹
¹Peking University; ²Beihang University; ³University of Hong Kong; ⁴Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences; ⁵Tsinghua University; ⁶M-A-P; ⁷The Chinese University of Hong Kong; ⁸Alibaba Group; ⁹University of Cambridge; ¹⁰Microsoft Research; ¹¹UIUC; ¹²Humanify Inc.; ¹³Zhejiang University
- 2024.12.30: We released the survey on arXiv and this repo on GitHub! Feel free to open pull requests to add recent work to the seasonal updates of the survey.
- Awesome Multimodal Tokenizers
- Awesome MMNTP Models
- Awesome Multimodal Prompt Engineering
  - 3.1 Multimodal ICL
  - 3.2 Multimodal CoT
- Citation
If you find our work helpful, please cite the paper :)
@misc{chen2024tokenpredictionmultimodalintelligence,
title={Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey},
author={Liang Chen and Zekun Wang and Shuhuai Ren and Lei Li and Haozhe Zhao and Yunshui Li and Zefan Cai and Hongcheng Guo and Lei Zhang and Yizhe Xiong and Yichi Zhang and Ruoyu Wu and Qingxiu Dong and Ge Zhang and Jian Yang and Lingwei Meng and Shujie Hu and Yulong Chen and Junyang Lin and Shuai Bai and Andreas Vlachos and Xu Tan and Minjia Zhang and Wen Xiao and Aaron Yee and Tianyu Liu and Baobao Chang},
year={2024},
eprint={2412.18619},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2412.18619},
}