UFO is a UI-focused dual-agent framework that fulfills user requests on Windows OS by navigating and operating within a single application or across multiple applications.
- 📅 2024-02-10 UFO is released on GitHub🎈.
- First Windows Agent Framework - UFO is the first agent framework that can translate natural-language user requests into grounded operations on Windows OS.
- Interactive Mode - UFO accepts multiple sub-requests from the user within the same session, so complex tasks can be completed step by step.
- Action Safeguard - UFO supports a safeguard that prompts for user confirmation before a sensitive action is executed.
- Easy Extension - UFO is easy to extend to accomplish more complex tasks with different operations.
UFO requires Python >= 3.10 running on Windows OS >= 10. It can be installed by running the following commands:
# [optional to create conda environment]
# conda create -n ufo python=3.10
# conda activate ufo
# clone the repository
git clone https://github.com/microsoft/UFO.git
cd UFO
# install the requirements
pip install -r requirements.txt
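The version requirements above can be checked before installing. The following is a minimal sketch, not part of UFO itself: the function name `environment_problems` is hypothetical, and it relies only on the standard-library `sys` and `platform` modules.

```python
import platform
import sys


def environment_problems(python_version=None, system=None, release=None):
    """Return a list of problems; an empty list means the requirements look met.

    All arguments are optional overrides (useful for testing); by default the
    values of the running interpreter and operating system are used.
    """
    python_version = python_version or sys.version_info[:2]
    system = system or platform.system()
    release = release or platform.release()

    problems = []
    if tuple(python_version) < (3, 10):
        found = ".".join(str(part) for part in python_version)
        problems.append(f"Python >= 3.10 is required, found {found}")

    if system != "Windows":
        problems.append(f"Windows OS is required, found {system}")
    else:
        # On Windows, platform.release() returns strings such as "10" or "8.1".
        # Releases whose leading part is not a number are reported as unconfirmed.
        try:
            major = int(release.split(".")[0])
        except ValueError:
            major = None
        if major is None:
            problems.append(f"Could not confirm that Windows {release} is >= 10")
        elif major < 10:
            problems.append(f"Windows >= 10 is required, found {release}")
    return problems
```

Calling `environment_problems()` with no arguments inspects the current machine; printing the returned list shows what, if anything, needs to be fixed first.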
Before running UFO, you need to provide your LLM configuration. Taking OpenAI as an example, you can configure the ufo/config/config.yaml file as follows:
OPENAI_API_BASE: Your OpenAI Endpoint # The base URL for the OpenAI API
OPENAI_API_KEY: Your OpenAI Key # Set the value to sk-xxx if you host the openai interface for open llm model
OPENAI_API_MODEL: GPT Model Name # Must be an OpenAI model that accepts visual input
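A common mistake is to leave one of the three values unset or still holding its placeholder text. The sketch below shows one way to catch that before starting UFO. It is an illustration only: the function `config_problems` and the placeholder list are assumptions based on the template shown above, not part of UFO's code.

```python
# The three keys shown in the configuration template above.
REQUIRED_KEYS = ("OPENAI_API_BASE", "OPENAI_API_KEY", "OPENAI_API_MODEL")

# Placeholder texts taken from the template; they must be replaced by real values.
PLACEHOLDERS = {"Your OpenAI Endpoint", "Your OpenAI Key", "GPT Model Name"}


def config_problems(config):
    """Return a list of problems found in an already-loaded config mapping."""
    problems = []
    for key in REQUIRED_KEYS:
        value = config.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{key} is missing or empty")
        elif value.strip() in PLACEHOLDERS:
            problems.append(f"{key} still holds the placeholder value")
    return problems
```

The mapping passed in can come from any YAML loader. Assuming PyYAML is installed, `yaml.safe_load` applied to the opened ufo/config/config.yaml file returns a suitable dictionary.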
# assume you are in the cloned UFO folder
python -m ufo --task <your_task_name>
This will start the UFO process and you can interact with it through the command line interface. If everything goes well, you will see the following message:
Welcome to use UFO🛸, A UI-focused Multimodal Agent for Windows OS.
 _   _  _____   ___
| | | ||  ___| / _ \
| | | || |_   | | | |
| |_| ||  _|  | |_| |
 \___/ |_|     \___/
Please enter your request to be completed🛸:
<Reminder: Before inputting your request, please make sure the targeted applications are active on the system.>
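The launch command shown earlier can also be started from a Python script, for example to reuse the same task name across runs. This is a minimal sketch under the assumption that the script runs with the interpreter in which the requirements were installed. The helper names `build_ufo_command` and `run_ufo` are hypothetical; only the `python -m ufo --task <your_task_name>` invocation comes from this README.

```python
import subprocess
import sys


def build_ufo_command(task_name):
    """Build the argument list equivalent to: python -m ufo --task <task_name>."""
    if not task_name or not task_name.strip():
        raise ValueError("task_name must be a non-empty string")
    # sys.executable is the interpreter currently running this script.
    return [sys.executable, "-m", "ufo", "--task", task_name]


def run_ufo(task_name, repo_dir="."):
    """Start UFO from the cloned repository folder and return its exit code.

    The child process inherits the terminal, so the interactive prompt
    still reads requests from the command line.
    """
    completed = subprocess.run(build_ufo_command(task_name), cwd=repo_dir)
    return completed.returncode
```

For instance, `run_ufo("my_task", repo_dir="UFO")` would start a session from the cloned folder, where `my_task` is a made-up task name.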
You can find the screenshots taken, along with the request and response logs, in the following folder:
ufo/logs/<your_task_name>/
You may use them to debug, replay, or analyze the agent output.
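As a starting point for such analysis, the sketch below lists what a task's log folder contains. The file names and formats inside the folder are not specified here, so the code only groups files by extension. The function name `summarize_logs` is hypothetical, and the default `log_root` assumes the script runs from the cloned UFO folder.

```python
from collections import Counter
from pathlib import Path


def summarize_logs(task_name, log_root="ufo/logs"):
    """Return (sorted list of files, count of files per extension) for one task."""
    task_dir = Path(log_root) / task_name
    if not task_dir.is_dir():
        raise FileNotFoundError(f"No log folder found at {task_dir}")

    # Walk the folder recursively in case logs are stored in subfolders.
    files = sorted(path for path in task_dir.rglob("*") if path.is_file())
    per_extension = Counter(path.suffix.lower() or "(no extension)" for path in files)
    return files, dict(per_extension)
```

The returned file list can then be opened one by one, for example to view screenshots in order or to read the logged requests and responses.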
- ❔GitHub Issues (preferred)
- For other communications, please contact [email protected]
We present two demo videos in which UFO completes user requests on Windows OS.
In this example, we will show you how to use UFO to delete all notes from a PowerPoint presentation with just a few simple steps. Explore it to work smarter, not harder!
In this example, we will show you how to use UFO to extract text from Word documents, obtain a description of an image, and then compose and send an email. Enjoy your cross-application experiment with UFO!
Our paper can be found here. If you use UFO in your research, please cite our paper:
@article{ufo,
title={UFO: A UI-Focused Agent for Windows OS Interaction},
author={Chaoyun Zhang and Liqun Li and Shilin He and Xu Zhang and Bo Qiao and Si Qin and Minghua Ma and Yu Kang and Qingwei Lin and Saravan Rajmohan and Dongmei Zhang and Qi Zhang},
journal={arXiv preprint arXiv:2402.07939},
year={2024}
}
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.