Lists (22)
agent paper list
🐵 AIGC
APP-Agent
CFI
🔏differential-privacy
edge comput
File compression
📁FPGA
🔆IC
⚡ Inspiration
JAP
JSP
lecture
🎇LLM
LLM4OP
✅MCM
MLLM
🚀 My resp
offload
💬 Others
📙 research experience
RL
Starred repositories
LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects
VisionTasker introduces a novel two-stage framework combining vision-based UI understanding and LLM task planning for mobile task automation in a step-by-step manner.
🎬 ScreenToGif allows you to record a selected area of your screen, edit and save it as a gif or video.
Building a comprehensive and handy list of papers for GUI agents
Official style files for papers submitted to venues of the Association for Computational Linguistics
GitHub page for "Large Language Model-Brained GUI Agents: A Survey"
Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
An annotated implementation of the Transformer paper.
Boost LaTeX typesetting efficiency with preview, compile, autocomplete, colorize, and more.
⏰ Collaboratively track deadlines of conferences recommended by CCF (website, Python CLI, WeChat applet). If you find it useful, please star this project, thanks~
GPT-4V in Wonderland: LMMs as Smartphone Agents
Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of multi-modal AI agents.
UGround: Universal GUI Visual Grounding for GUI Agents
A hands-on, step-by-step Huggingface Transformers course; the course videos are updated simultaneously on Bilibili and YouTube
The code of the paper "NExT-Chat: An LMM for Chat, Detection and Segmentation".
A list of awesome papers and resources on recommender systems with large language models (LLMs).
📖A curated list of Awesome LLM/VLM Inference Papers with codes, such as FlashAttention, PagedAttention, Parallelism, etc. 🎉🎉
An awesome & curated list of best LLMOps tools for developers
[Embodied-AI-Survey-2024] Paper list and projects for Embodied AI
🚀🚀 [Large models] 🌏 Train a small 26M-parameter GPT completely from scratch in just 3 hours!
[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision).
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.)
Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)
GUI Odyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUI Odyssey consists of 7,735 episodes from 6 mobile devices, spanning 6 types of cross-app tasks, 20…
WONDERBREAD benchmark + dataset for BPM tasks