forked from microsoft/UFO

UFO: A UI-Focused Agent for Windows OS Interaction

License: MIT

UFO is a UI-Focused dual-agent framework that fulfills user requests on Windows OS by navigating and operating within a single application or across multiple applications.

🕌 Framework

UFO operates as a dual-agent framework, encompassing:

  • AppAgent 🤖, tasked with choosing an application for fulfilling user requests. This agent may also switch to a different application when a request spans multiple applications, and the task is partially completed in the preceding application.
  • ActAgent 👾, responsible for iteratively executing actions on the selected applications until the task is successfully concluded within a specific application.

Both agents leverage the multi-modal capabilities of GPT-Vision to comprehend the application UI and fulfill the user's request. For more details, please consult our technical report.
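
The division of labor between the two agents can be sketched as a nested loop. This is a minimal illustration, not UFO's actual implementation: `Step`, `run_request`, `select_app`, `next_action`, and the demo stubs are all hypothetical names, and the stubs stand in for the GPT-Vision calls that the real agents make.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    """One executed action (hypothetical record type, not part of UFO)."""
    app: str
    action: str


# AppAgent role: returns the next application, or None when the request is done.
SelectApp = Callable[[str, List[Step]], Optional[str]]
# ActAgent role: returns (action, finished_in_this_app).
NextAction = Callable[[str, str, List[Step]], Tuple[str, bool]]


def run_request(request: str, select_app: SelectApp,
                next_action: NextAction, max_steps: int = 20) -> List[Step]:
    """Alternate between choosing an application and acting inside it."""
    history: List[Step] = []
    while len(history) < max_steps:
        app = select_app(request, history)
        if app is None:  # nothing left to do in any application
            break
        finished = False
        while not finished and len(history) < max_steps:
            action, finished = next_action(request, app, history)
            history.append(Step(app, action))
    return history


# Stub agents for illustration only: visit two apps, two actions in each.
def demo_select_app(request: str, history: List[Step]) -> Optional[str]:
    used = {step.app for step in history}
    for app in ("Word", "Outlook"):
        if app not in used:
            return app
    return None


def demo_next_action(request: str, app: str,
                     history: List[Step]) -> Tuple[str, bool]:
    done_here = sum(1 for step in history if step.app == app)
    return f"{app}-action-{done_here + 1}", done_here + 1 >= 2


history = run_request("compose an email", demo_select_app, demo_next_action)
for step in history:
    print(step.app, step.action)
```

The `max_steps` cap is an assumption added so the sketch always terminates, even if a stub never reports completion.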

🆕 News

  • 📅 2024-02-08 UFO is released on GitHub🎈.

💥 Highlights

  • First Windows Agent - UFO represents the first agent framework that can translate user requests in natural language into grounded operations on Windows OS.
  • Interactive Mode - UFO allows multiple sub-requests from the user in the same session to complete complex tasks.
  • Action Safeguard - UFO supports a safeguard that prompts for user confirmation when an action is sensitive.
  • Easy Extension - UFO is easy to extend to accomplish more complex tasks with different operations.
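
As an illustration of the Action Safeguard idea, the sketch below asks for confirmation before running an action it considers sensitive. It rests on assumptions: the keyword list, `is_sensitive`, and `execute_with_safeguard` are hypothetical, and simple keyword matching is only a stand-in for however UFO actually decides that an action is sensitive.

```python
from typing import Callable, Sequence

# Hypothetical keywords; UFO's real sensitivity criteria are not described here.
SENSITIVE_KEYWORDS = ("delete", "send", "submit")


def is_sensitive(action: str,
                 keywords: Sequence[str] = SENSITIVE_KEYWORDS) -> bool:
    """Return True if the action description mentions a sensitive keyword."""
    lowered = action.lower()
    return any(keyword in lowered for keyword in keywords)


def execute_with_safeguard(action: str,
                           execute: Callable[[str], object],
                           confirm: Callable[[str], str] = input) -> bool:
    """Run `execute(action)`, asking the user first if the action is sensitive.

    Returns True if the action was executed and False if the user declined.
    """
    if is_sensitive(action):
        answer = confirm(f"Action '{action}' looks sensitive. Proceed? [y/N] ")
        if answer.strip().lower() not in ("y", "yes"):
            return False
    execute(action)
    return True
```

Passing `confirm` as a parameter keeps the prompt replaceable, so the same function can be driven from the command line or from a test without blocking on `input`.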

✨ Getting Started

🛠️ Step 1: Installation

UFO requires Python >= 3.10 running on Windows OS >= 10. It can be installed by running the following commands:

# [optional to create conda environment]
# conda create -n ufo python=3.10
# conda activate ufo

# clone the repository
git clone https://github.com/microsoft/UFO.git
cd UFO
# install the requirements
pip install -r requirements.txt

🖊️ Step 2: Configure the LLMs

Before running UFO, you need to provide your LLM configuration. Taking OpenAI as an example, you can configure the ufo/config/config.yaml file as follows.

OpenAI

OPENAI_API_BASE: Your OpenAI Endpoint # The base URL for the OpenAI API
OPENAI_API_KEY: Your OpenAI Key  # Set the value to the openai key for the llm model
OPENAI_API_MODEL: GPT Model Name  # The only OpenAI model by now that accepts visual input

Azure OpenAI

OPENAI_API_BASE: Your OpenAI Endpoint # The base URL for the OpenAI API
OPENAI_API_KEY: Your OpenAI Key  # Set the value to the openai key for the llm model
OPENAI_API_MODEL: GPT Model Name  # The only OpenAI model by now that accepts visual input
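
Before starting UFO, it may help to check that the placeholder values in the templates above have been replaced. The following is a minimal sketch under stated assumptions: `parse_flat_config` and `unfilled_keys` are hypothetical helpers that are not part of UFO, and the parser only handles flat `KEY: value  # comment` lines like the ones shown. A real YAML library is the better choice for anything more complex.

```python
from typing import Dict, List

REQUIRED_KEYS = ("OPENAI_API_BASE", "OPENAI_API_KEY", "OPENAI_API_MODEL")
# Placeholder strings copied from the configuration templates above.
PLACEHOLDERS = {"Your OpenAI Endpoint", "Your OpenAI Key", "GPT Model Name"}


def parse_flat_config(text: str) -> Dict[str, str]:
    """Parse flat `KEY: value  # comment` lines (not a general YAML parser)."""
    config: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split(" #", 1)[0].strip()  # drop a trailing comment
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)  # split on the first colon only
        config[key.strip()] = value.strip().strip("\"'")
    return config


def unfilled_keys(config: Dict[str, str]) -> List[str]:
    """List required keys that are missing, empty, or still placeholders."""
    return [key for key in REQUIRED_KEYS
            if not config.get(key) or config[key] in PLACEHOLDERS]


template = """\
OPENAI_API_BASE: Your OpenAI Endpoint # The base URL for the OpenAI API
OPENAI_API_KEY: Your OpenAI Key  # Set the value to the openai key for the llm model
OPENAI_API_MODEL: GPT Model Name  # The only OpenAI model by now that accepts visual input
"""
print(unfilled_keys(parse_flat_config(template)))
```

Run against the untouched template, every required key is reported as unfilled. An empty list means all three values have been set.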

🚩 Step 3: Start UFO

⌨️ Command Line (CLI)

# assume you are in the cloned UFO folder
python -m ufo --task <your_task_name>

This will start the UFO process and you can interact with it through the command line interface. If everything goes well, you will see the following message:

Welcome to use UFO🛸, A UI-focused Agent for Windows OS Interaction. 
 _   _  _____   ___
| | | ||  ___| / _ \
| | | || |_   | | | |
| |_| ||  _|  | |_| |
 \___/ |_|     \___/
Please enter your request to be completed🛸:

Reminder❗: Before UFO executes your request, please make sure the target applications are active on the system.

🎥 Step 4: Execution Logs

You can find the screenshots taken and the request and response logs in the following folder:

./ufo/logs/<your_task_name>/

You may use them to debug, replay, or analyze the agent output.
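
To get a quick overview of what a run produced, the log folder can be listed and grouped by file type. This is a small sketch: `summarize_logs` is a hypothetical helper, and it makes no assumptions about the file names UFO writes beyond the folder layout shown above.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def summarize_logs(task_name: str,
                   root: str = "./ufo/logs") -> Dict[str, List[str]]:
    """Group the files under <root>/<task_name>/ by file extension."""
    log_dir = Path(root) / task_name
    if not log_dir.is_dir():
        raise FileNotFoundError(f"No log folder found at {log_dir}")
    by_suffix: Dict[str, List[str]] = defaultdict(list)
    for path in sorted(log_dir.rglob("*")):  # sorted for a stable order
        if path.is_file():
            key = path.suffix.lower() or "(no extension)"
            by_suffix[key].append(path.name)
    return dict(by_suffix)
```

For example, `summarize_logs("<your_task_name>")` returns a dictionary mapping each extension to the file names found, which makes it easy to see how many screenshots and log files a session left behind.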

❓Get help


🎬 Demo Examples

We present two demo videos in which UFO completes user requests on Windows OS. For more cases, please consult our technical report.

1️⃣🗑️ Example 1: Deleting all notes on a PowerPoint presentation.

In this example, we will show you how to use UFO to delete all notes on a PowerPoint presentation in just a few simple steps. Explore it to work smarter, not harder!

ufo_delete_note.mp4

2️⃣📧 Example 2: Composing an email using text from multiple sources.

In this example, we will show you how to use UFO to extract text from Word documents, describe an image, and then compose and send an email. Enjoy your cross-application experiment with UFO!

ufo_meeting_note_crossed_app_demo_new.mp4

📊 Evaluation

To evaluate UFO, please refer to WindowsBench in Section A of the Appendix of our technical report. Some tips for completing your request:

📚 Citation

Our paper can be found here. If you use UFO in your research, please cite our paper:

@article{ufo,
  title={UFO: A UI-Focused Agent for Windows OS Interaction},
  author={Chaoyun Zhang and Liqun Li and Shilin He and Xu Zhang and Bo Qiao and Si Qin and Minghua Ma and Yu Kang and Qingwei Lin and Saravan Rajmohan and Dongmei Zhang and Qi Zhang},
  journal={arXiv preprint arXiv:2402.07939},
  year={2024}
}

Disclaimer

By choosing to run the provided code, you acknowledge and agree to the terms and conditions regarding functionality and data handling practices described in DISCLAIMER.md.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.
