Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model
[CVPR'25] Official Implementations for Paper - MagicQuill: An Intelligent Interactive Image Editing System
Reasoning in LLMs: Papers and Resources, including Chain-of-Thought, OpenAI o1, and DeepSeek-R1 🍓
SpatialLM: Large Language Model for Spatial Understanding
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Agent S: an open agentic framework that uses computers like a human
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
🚀🚀🚀 A collection of awesome public YOLO object detection projects and related object detection datasets.
🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
🚀🚀🚀 A collection of awesome public projects about Large Language Models (LLM), Vision Language Models (VLM), Vision Language Action (VLA), AI Generated Content (AIGC), and the related datasets and applications.
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models
OpenEMMA: a permissively licensed, open-source "reproduction" of Waymo's EMMA model.
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization