FDInSky
Starred repositories

190 stars written in Python

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

Python 25,251 3,221 Updated Sep 24, 2024

Python sample code for robotics algorithms.

Python 24,057 6,639 Updated Jan 20, 2025

A generative world for general-purpose robotics & embodied AI learning.

Python 23,085 1,908 Updated Jan 20, 2025

MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

Python 17,033 1,217 Updated Jan 20, 2025

An open-source tool-augmented conversational language model from Fudan University

Python 12,019 1,147 Updated Jul 13, 2024

This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.

Python 11,853 1,043 Updated Jan 20, 2025

An open source implementation of CLIP.

Python 10,828 1,022 Updated Jan 4, 2025

Text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Python 10,357 964 Updated Jan 20, 2025

A collaboration friendly studio for NeRFs

Python 9,794 1,345 Updated Jan 21, 2025

A collection of libraries to optimise AI model performance

Python 8,373 636 Updated Jul 22, 2024

Large World Model -- Modeling Text and Video with Millions Context

Python 7,209 554 Updated Oct 19, 2024

Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…

Python 7,127 431 Updated Jan 9, 2025

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance

Python 6,834 524 Updated Dec 25, 2024

Enjoy the magic of Diffusion models!

Python 6,751 629 Updated Jan 15, 2025

[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters

Python 5,802 377 Updated Mar 14, 2024

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Python 5,329 404 Updated Aug 7, 2024

Use PEFT or Full-parameter to finetune 400+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek3, ...) and 150+ MLLMs (Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, Inter…

Python 5,100 444 Updated Jan 20, 2025

g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains

Python 4,157 375 Updated Dec 6, 2024

Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

Python 3,438 197 Updated Jan 20, 2025

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

Python 3,236 282 Updated May 4, 2024

GLM (General Language Model)

Python 3,220 326 Updated Nov 3, 2023

OpenDILab Decision AI Engine. The Most Comprehensive Reinforcement Learning Framework B.P.

Python 3,199 388 Updated Jan 13, 2025

FFCV: Fast Forward Computer Vision (and other ML workloads!)

Python 2,882 179 Updated Jun 16, 2024

VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.

Python 2,800 224 Updated Jan 11, 2025

Code of Pyramidal Flow Matching for Efficient Video Generative Modeling

Python 2,710 266 Updated Dec 21, 2024

Lumina-T2X is a unified framework for Text to Any Modality Generation

Python 2,127 88 Updated Aug 6, 2024

PyTorch pre-trained model for real-time interest point detection, description, and sparse tracking (https://arxiv.org/abs/1712.07629)

Python 1,969 397 Updated Jul 24, 2022

✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Python 1,969 140 Updated Jan 17, 2025

VideoSys: An easy and efficient system for video generation

Python 1,884 128 Updated Jan 1, 2025

LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning

Python 1,754 64 Updated Jan 8, 2025