Gaotianhong

Starred repositories

155 starred source (non-fork) repositories

MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

Python · 17,048 stars · 1,218 forks · Updated Jan 20, 2025

Use PEFT or Full-parameter to finetune 400+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek3, ...) and 150+ MLLMs (Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, Inter…

Python · 5,100 stars · 444 forks · Updated Jan 20, 2025

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks

Python · 1,696 stars · 242 forks · Updated Jan 20, 2025

Curated list of datasets and tools for post-training.

2,523 stars · 218 forks · Updated Jan 13, 2025

An Open Large Reasoning Model for Real-World Solutions

Python · 1,392 stars · 72 forks · Updated Nov 28, 2024

Official implementation of OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion

Python · 284 stars · 16 forks · Updated Jan 17, 2025

A modular graph-based Retrieval-Augmented Generation (RAG) system

Python · 21,789 stars · 2,150 forks · Updated Jan 20, 2025

🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT)

JavaScript · 5,742 stars · 590 forks · Updated Jan 8, 2025

Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Python · 127 stars · 4 forks · Updated Dec 17, 2024

✨✨Latest Advances on Multimodal Large Language Models

13,596 stars · 869 forks · Updated Jan 17, 2025

Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent

Python · 207 stars · 13 forks · Updated Jan 13, 2025

[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…

Jupyter Notebook · 6,429 stars · 420 forks · Updated Jan 12, 2025

HunyuanVideo: A Systematic Framework For Large Video Generation Model

Python · 7,605 stars · 592 forks · Updated Jan 17, 2025

(No description.)

Python · 51 stars · Updated Dec 13, 2024

A Self-Training Framework for Vision-Language Reasoning

Python · 60 stars · 1 fork · Updated Nov 13, 2024

(No description.)

HTML · 76 stars · 8 forks · Updated May 10, 2024

ACL 2024: LoRA-Flow: Dynamic LoRA Fusion for Large Language Models in Generative Tasks

Python · 12 stars · Updated Oct 9, 2024

WWW2025 Multimodal Intent Recognition for Dialogue Systems Challenge

Python · 114 stars · 12 forks · Updated Nov 11, 2024

GLM-4-Voice | End-to-end Chinese-English spoken dialogue model

Python · 2,577 stars · 208 forks · Updated Dec 5, 2024

A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.

Python · 595 stars · 33 forks · Updated Nov 26, 2024

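Curated learning materials for AI fields such as machine learning, NLP, image recognition, and deep learning, plus technical materials on the architecture and algorithms of search, recommendation, and advertising systems. A collection of notes from top algorithm experts.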

3,252 stars · 495 forks · Updated Apr 15, 2024

(No description.)

Python · 186 stars · Updated Sep 11, 2024

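Dedicated to strategies for landing internships, campus-recruitment offers, and experienced-hire roles at major tech companies; computer science fundamentals; learning roadmaps for C++, Java, and algorithms. Focused on programming know-how!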

1,265 stars · 77 forks · Updated Aug 15, 2021

[NeurIPS 24 Spotlight] MaskLLM: Learnable Semi-structured Sparsity for Large Language Models

Python · 146 stars · 12 forks · Updated Jan 1, 2025

CV Homework

Python · 1 star · Updated Nov 22, 2023

MICCAI 2024 - Loose Lesion Location Self-supervision Enhanced Colorectal Cancer Diagnosis

Python · 2 stars · Updated Oct 10, 2024

GPT4V-level open-source multi-modal model based on Llama3-8B

Python · 2,218 stars · 150 forks · Updated Sep 3, 2024

MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.

Jupyter Notebook · 7,120 stars · 468 forks · Updated Nov 6, 2024

Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.

Python · 4,811 stars · 480 forks · Updated Aug 6, 2024

Competition knowledge, code, and approaches for data mining, computer vision, natural language processing, and recommender systems

Jupyter Notebook · 4,350 stars · 1,069 forks · Updated Oct 8, 2024