Stars
Qwen2-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
✨✨Latest Advances on Multimodal Large Language Models
A collection of AIGC-interview/CV-interview/LLMs-interview questions and answers, together with new ideas, new problems, new resources, and new projects from work and research.
This repository compiles a list of papers related to the application of video technology in the field of robotics! Star⭐ the repo and follow me if you like what you see🤩.
A large-scale benchmark and learning environment.
ALFRED - A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
Simple and easily configurable grid world environments for reinforcement learning
[ICML 2024] Official code repository for 3D embodied generalist agent LEO
Code for the Paper M4U: Evaluating Multilingual Understanding and Reasoning for Large Multimodal Models.
AI2-THOR Data Collection Tool Based On Keyboard Interaction
[NeurIPS 2023] We use large language models as a commonsense world model and heuristic policy within Monte-Carlo Tree Search, enabling better-reasoned decision-making for daily task planning problems.
A curated list for vision-and-language navigation. ACL 2022 paper "Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions"
A generative world for general-purpose robotics & embodied AI learning.
Code for MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World
Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering"
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
This repo contains source code for Glance and Focus: Memory Prompting for Multi-Event Video Question Answering (accepted at NeurIPS 2023)
A comprehensive list of papers using large language/multi-modal models for Robotics/RL, including papers, code, and related websites
Recent LLM-based CV and related works. Welcome to comment/contribute!
calflops is designed to calculate FLOPs, MACs, and parameters for a wide variety of neural networks, such as Linear, CNN, RNN, GCN, and Transformer models (BERT, LLaMA, and other large language models)
Reference implementations of several LangChain agents as Streamlit apps
[CVPR 2023] Official code for "Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations"
ChatGLM2-6B: An Open Bilingual Chat LLM
GUI for ChatGPT API and many LLMs. Supports agents, file-based QA, GPT finetuning and query with web search. All with a neat UI.