Stars
This project is a downstream fork of chatgpt-on-wechat that additionally integrates the LLMOps platform Dify: it supports Dify's agent mode, tool and knowledge-base invocation, and Dify workflows.
A list of VLMs tailored for medical report generation (RG) and VQA, plus a list of medical vision-language datasets
Official implementation for MedCLIP-SAM (MICCAI 2024)
A Survey on CLIP in Medical Imaging
EMNLP'22 | MedCLIP: Contrastive Learning from Unpaired Medical Images and Texts
Fine-tuning CLIP on the ROCO dataset, which contains image-caption pairs from PubMed articles.
[ICANN 2024 (Oral)] MISS: A Generative Pre-training and Fine-tuning Approach for Med-VQA
The official codes for "PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents"
PMC-VQA is a large-scale medical visual question-answering dataset containing 227k VQA pairs over 149k images, covering various modalities and diseases.
BESTMVQA: A Benchmark Evaluation System for Medical Visual Question Answering
ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering
An easy way to apply LoRA to CLIP. Implementation of the paper "Low-Rank Few-Shot Adaptation of Vision-Language Models" (CLIP-LoRA) [CVPRW 2024].
Image to prompt with BLIP and CLIP
Official repository of the paper "InterCLIP-MEP: Interactive CLIP and Memory-Enhanced Predictor for Multi-modal Sarcasm Detection"
Official implementation for MedCLIP-SAMv2
Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
An open source implementation of CLIP.
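Many of the repos above (OpenCLIP, MedCLIP, PMC-CLIP, CLIP-LoRA) build on the same core idea: scoring image-text pairs by cosine similarity in a shared embedding space. A minimal NumPy sketch with random stand-in embeddings; this is an illustration of the scoring step, not any repo's actual API:

```python
import numpy as np

# Random stand-ins for real image/text encoder outputs.
rng = np.random.default_rng(0)
image_embeds = rng.normal(size=(2, 512))  # 2 images, 512-d embeddings
text_embeds = rng.normal(size=(3, 512))   # 3 candidate captions

def normalize(x):
    """L2-normalize along the last axis so dot products become cosines."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Cosine-similarity matrix (images x texts), scaled by a temperature
# analogous to CLIP's learned logit scale (100.0 is a typical value).
logits = 100.0 * normalize(image_embeds) @ normalize(text_embeds).T

# Softmax over captions: each image gets a probability over the texts.
probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs /= probs.sum(axis=-1, keepdims=True)
print(probs.shape)  # (2, 3); each row sums to 1
```

At training time, CLIP-style models optimize a symmetric cross-entropy over such a logits matrix so matching pairs score highest; at inference, the same similarity drives zero-shot classification and retrieval.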
✂️ Modern copy to clipboard. No Flash. Just 3kb gzipped 📋
Candidate-Heuristic In-Context Learning: A New Framework for Enhancing MedVQA with Large Language Models
The largest-scale Chinese medical QA dataset, with 26,000,000 question-answer pairs.
A course for getting into Large Language Models (LLMs), with roadmaps and Colab notebooks.
🧑‍🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans, and more
An open-source framework for training large multimodal models.
This is the official repository for the LENS (Large Language Models Enhanced to See) system.
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
LAVIS - A One-stop Library for Language-Vision Intelligence