- [2024/11/4] We have added the Gradio chat demo; see the instructions below.
- [2024/10/28] All code and weights are now available! Watch this repository for the latest updates.
- PPLLaVA is an effective and efficient video large language model that incorporates three components:
- (1) Fine-grained vision-prompt alignment.
- (2) Visual token compression guided by the user instruction, via convolution-style pooling (see the sketch below).
- (3) CLIP context extension.
- PPLLaVA establishes new state-of-the-art results on VideoMME, MVBench, VideoChatGPT Bench, and VideoQA Bench, using only 1024 visual tokens and delivering 8x higher throughput.
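To make the instruction-guided token compression concrete, here is a minimal sketch, assuming CLIP patch features arranged as (frames, height, width, dim): each patch is weighted by its similarity to the instruction embedding, then convolution-style (adaptive average) pooling reduces the grid to 1024 tokens. All names, shapes, and the exact weighting scheme are illustrative assumptions, not the repository's implementation.

```python
# A minimal sketch of prompt-guided, convolution-style visual token compression.
# Shapes, the weighting scheme, and all names are illustrative assumptions,
# not the exact PPLLaVA implementation.
import torch
import torch.nn.functional as F

def compress_visual_tokens(frame_tokens, prompt_embed, out_t=4, out_h=16, out_w=16):
    """Compress per-frame CLIP patch features into a small, fixed token budget.

    frame_tokens: (T, H, W, D) patch features for T frames (assumed layout).
    prompt_embed: (D,) embedding of the user instruction in the same space.
    Returns: (out_t * out_h * out_w, D) tokens, e.g. 4 * 16 * 16 = 1024.
    """
    T, H, W, D = frame_tokens.shape

    # Soft relevance of every patch to the prompt (cosine similarity -> softmax).
    patches = F.normalize(frame_tokens, dim=-1)
    prompt = F.normalize(prompt_embed, dim=-1)
    scores = torch.einsum("thwd,d->thw", patches, prompt)
    weights = scores.flatten().softmax(dim=0).view(T, H, W)

    # Emphasize prompt-relevant patches, then pool locally (convolution-style)
    # over time and space down to the target grid.
    weighted = frame_tokens * weights.unsqueeze(-1)   # (T, H, W, D)
    pooled = F.adaptive_avg_pool3d(
        weighted.permute(3, 0, 1, 2),                 # (D, T, H, W)
        output_size=(out_t, out_h, out_w),
    )                                                 # (D, out_t, out_h, out_w)
    return pooled.flatten(1).transpose(0, 1)          # (out_t*out_h*out_w, D)

# Toy usage: 32 frames of 24x24 patches with 1024-dim features.
video_feats = torch.randn(32, 24, 24, 1024)
instruction = torch.randn(1024)
tokens = compress_visual_tokens(video_feats, instruction)
print(tokens.shape)  # torch.Size([1024, 1024])
```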
Method | Image Pretrain | LLM | VideoMME | VCGBench | MVBench | ActivityNetQA |
---|---|---|---|---|---|---|
VideoLLaMA | BLIP-2 | Vicuna-7B | - | 1.96 | 34.1 | 12.4 |
LLaMA-Adapter | - | Vicuna-7B | - | 2.03 | 31.7 | 34.2 |
VideoChat | BLIP-2 | Vicuna-7B | - | 2.23 | 35.5 | 26.5 |
VideoChatGPT | LLaVA-1.0 | Vicuna-7B | - | 2.38 | 32.7 | 35.2 |
BT-Adapter | LLaVA-1.0 | Vicuna-7B | - | 2.69 | - | 45.7 |
LLaMA-VID | InstructBLIP | Vicuna-13B | - | 2.89 | - | 47.4 |
VideoChat2 | - | Vicuna-7B | - | 2.98 | 51.1 | 49.1 |
Chat-UniVi | LLaVA-1.5 | Vicuna-7B | 45.9 | 2.99 | - | 47.2 |
STLLM | InstructBLIP | Vicuna-7B | 42.3 | 3.15 | - | 50.9 |
PLLaVA | LLaVA-Next | Vicuna-7B | - | 3.12 | 46.6 | 56.3 |
VLM-RLAIF | LLaVA-1.5 | Vicuna-7B | - | 3.49 | - | 57.3 |
LLaVA-Next-Video | LLaVA-Next | Vicuna-7B | 45.0 | 3.66 | - | 60.2 |
PPLLaVA | LLaVA-Next | Vicuna-7B | 53.6 | 3.73 | 59.2 | 60.7 |
Please download the conversation weights from here and follow the installation instructions first. Then, run the Gradio demo:
CUDA_VISIBLE_DEVICES=0 python3 demo.py --ckpt-path /path/to/PPLLaVA_conversation_weight
- Video Dense Caption: PPLLaVA effectively balances the content, state, and motion of both the foreground and the background while maintaining detail and accuracy.
- Multi-turn dialogue and reasoning: PPLLaVA can engage in smooth Q&A interactions and provide reasonable inferences.
Clone our repository, then create and activate a Python environment with the following commands:
git clone https://github.com/farewellthree/PPLLaVA.git
cd PPLLaVA
conda create --name ppllava python=3.9
conda activate ppllava
pip install -r requirement.txt
Instructions for data preparation, training, and evaluation can be found in trainval.md.
If you find the code and paper useful for your research, please consider starring this repo and citing our paper:
@inproceedings{liu2025st,
title={ST-LLM: Large language models are effective temporal learners},
author={Liu, Ruyang and Li, Chen and Tang, Haoran and Ge, Yixiao and Shan, Ying and Li, Ge},
booktitle={European Conference on Computer Vision},
pages={1--18},
year={2025},
organization={Springer}
}
@article{liu2024ppllava,
title={PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance},
author={Liu, Ruyang and Tang, Haoran and Liu, Haibo and Ge, Yixiao and Shan, Ying and Li, Chen and Yang, Jiankun},
journal={arXiv preprint arXiv:2411.02327},
year={2024}
}