Stars
2 stars written in Python
【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".