Starred repositories
Semantic Propositional Image Caption Evaluation
Learning to Evaluate Image Captioning. CVPR 2018
A curated list of awesome open source workflow engines
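⚡ A high-performance asynchronous API for automatic speech recognition (ASR) and translation. It needs no paid Whisper API subscription: it runs inference on locally hosted Whisper models, supports multi-GPU concurrency, and is designed for distributed deployment. It also includes built-in crawlers for social media platforms such as TikTok and Douyin, enabling seamless processing of media from multiple social platforms and offering a powerful, scalable solution for automated media content processing.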
A list of awesome papers and resources on recommender systems built on large language models (LLMs).
Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper.
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
A curated list of recent and past chart understanding work based on our survey paper: From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models.
List of projects for 3D reconstruction
The fast, Pythonic way to build Model Context Protocol servers 🚀
Comprehensive Gradio WebUI for audio processing, powered by Whisper engines (Whisper, Faster-Whisper, Whisper-Timestamped). Features Voice Changer, zero-shot Voice Cloning (E2, F5-TTS), YouTube dow…
REBEL is a seq2seq model that simplifies Relation Extraction (EMNLP 2021).
proxychains - a tool that forces any TCP connection made by any given application to go through a proxy such as TOR or any other SOCKS4, SOCKS5 or HTTP(S) proxy. Supported auth-types: "user/pass" fo…
A curated list of awesome resources on LLMs for autonomous driving (continually updated)
[WACV 2024 Survey Paper] Multimodal Large Language Models for Autonomous Driving
Automate browser-based workflows with LLMs and Computer Vision
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
Audio Large Language Models
Industry leading face manipulation platform