forked from showlab/VLog

Transform Video as a Document with ChatGPT, CLIP, BLIP2, GRIT, Whisper, LangChain.

🎞 VLog: Video as a Long Document

News

  • 20/April/2023: We released our project on GitHub and Hugging Face!

To Do List

Done

  • Hugging Face Space
  • LLM Reasoner: ChatGPT (multilingual) + LangChain
  • Vision Captioner: BLIP2 + GRIT
  • ASR Translator: Whisper (multilingual)
  • Video Segmenter: KTS

Doing

There is a lot of room for improvement, and we are working on it:

  • Improve Vision Models: MiniGPT-4, LLaVA, Family of Segment-anything
  • Replace ChatGPT with our own trained LLM
  • Improve ASR Translator

🧸 Examples

🔨 Preparation

Please find installation instructions in install.md.

🌟 Start here

Run from the command line

python main.py --video_path "examples/demo.mp4"

The generated vlog is saved to examples/demo.log.
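For scripting, the command above can be wrapped in Python. This is only a minimal sketch: the helper names build_command, expected_log_path and run_vlog are hypothetical and not part of the codebase. It assumes the script is run from the repository root and that the log is written next to the video with a .log suffix, as in the demo (examples/demo.mp4 → examples/demo.log). Only the --video_path flag shown above is used.

```python
import subprocess
import sys
from pathlib import Path


def build_command(video_path: str) -> list[str]:
    """Build the argv list for the command shown above."""
    return [sys.executable, "main.py", "--video_path", video_path]


def expected_log_path(video_path: str) -> Path:
    """Assumption: the log sits next to the video with a .log suffix,
    mirroring examples/demo.mp4 -> examples/demo.log."""
    return Path(video_path).with_suffix(".log")


def run_vlog(video_path: str) -> str:
    """Run main.py on one video and return the text of the generated log.

    Raises subprocess.CalledProcessError if main.py exits with an error.
    """
    subprocess.run(build_command(video_path), check=True)
    return expected_log_path(video_path).read_text(encoding="utf-8")


# Example call (needs the repository, its dependencies, and model weights):
# text = run_vlog("examples/demo.mp4")
```

Passing the arguments as a list rather than a shell string avoids quoting problems when a video path contains spaces.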

Run in Gradio

python main_gradio.py

🙋 Suggestion

The project is under active development, so stay tuned 🔥

If you have suggestions or would like a feature implemented in this codebase, feel free to email us at kevin.qh.lin@gmail or [email protected], or open an issue.

😊 Acknowledgment

This work is based on ChatGPT, BLIP2, GRIT, KTS, Whisper, LangChain, Image2Paragraph.

See other wonderful Video + LLM projects: Ask-anything, Socratic Models, Vid2Seq, LaViLa.
