init commit
vyokky committed Feb 6, 2024
1 parent 7e08311 commit bb935bc
Showing 26 changed files with 2,012 additions and 432 deletions.
402 changes: 6 additions & 396 deletions .gitignore

Large diffs are not rendered by default.

14 changes: 14 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,14 @@
# Contributing

This project welcomes contributions and suggestions. Most contributions require you to
agree to a Contributor License Agreement (CLA) declaring that you have the right to,
and actually do, grant us the rights to use your contribution. For details, visit
https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need
to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the
instructions provided by the bot. You will only need to do this once across all repositories using our CLA.

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/)
or contact [[email protected]](mailto:[email protected]) with any additional questions or comments.
35 changes: 35 additions & 0 deletions DISCLAIMER.md
@@ -0,0 +1,35 @@
# Disclaimer: Code Execution and Data Handling Notice

By choosing to run the provided code, you acknowledge and agree to the following terms and conditions regarding the functionality and data handling practices:

## 1. Code Functionality:
The code you are about to execute has the capability to capture screenshots of your working desktop environment and active applications. These screenshots will be processed and sent to the GPT model for inference.

## 2. Data Transmission:
Upon execution, the captured screenshots will be transmitted to external servers hosting the GPT model. This transmission is necessary for the inference process to generate relevant outputs based on the visual information provided.

## 3. Data Privacy and Storage:
It is crucial to note that Microsoft, the provider of this code, explicitly states that it does not collect or save any of the transmitted data. The captured screenshots are processed in real-time for the purpose of inference, and no permanent storage or record of this data is retained by Microsoft.

## 4. User Responsibility:
By running the code, you understand and accept the responsibility for the content and nature of the data present on your desktop during the execution period. It is your responsibility to ensure that no sensitive or confidential information is visible or captured during this process.

## 5. Security Measures:
Microsoft has implemented security measures to safeguard the data transmission process. However, it is recommended that you run the code in a secure and controlled environment to minimize potential risks. Ensure that you are running the latest security updates on your system.

## 6. Consent for Inference:
You explicitly provide consent for the GPT model to analyze the captured screenshots for the purpose of generating relevant outputs. This consent is inherent in the act of executing the code.

## 7. No Guarantee of Accuracy:
The outputs generated by the GPT model are based on patterns learned during training and may not always be accurate or contextually relevant. Microsoft does not guarantee the accuracy or suitability of the inferences made by the model.

## 8. Indemnification:
Users agree to defend, indemnify, and hold Microsoft harmless from and against all damages, costs, and attorneys' fees in connection with any claims arising from the use of this Repo.

## 9. Reporting Infringements:
If anyone believes that this Repo infringes on their rights, please notify the project owner via the provided project owner email. Microsoft will investigate and take appropriate actions as necessary.

## 10. Modifications to the Disclaimer:
Microsoft reserves the right to update or modify this disclaimer at any time without prior notice. It is your responsibility to review the disclaimer periodically for any changes.

By proceeding to execute the code, you acknowledge that you have read, understood, and agreed to the terms outlined in this disclaimer. If you do not agree with these terms, refrain from running the provided code.
34 changes: 17 additions & 17 deletions LICENSE
@@ -1,21 +1,21 @@
MIT License
Copyright (c) Microsoft Corporation.

Copyright (c) Microsoft Corporation.
MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE
THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
135 changes: 116 additions & 19 deletions README.md
@@ -1,33 +1,130 @@
# Project
<!-- <h1 align="center">
UFO<img src="./assets/ufo.png" width="40px"/> :A <strong>U</strong>I-<strong>F</strong>ocused Multimodal Agent for Windows <strong>O</strong>S
</h1> -->

> This repo has been populated by an initial template to help get you started. Please
> make sure to update the content to build a great experience for community-building.
# **UFO** ![ufo](./assets/ufo_blue.png =x24): A **U**I-**F**ocused Agent for Windows **O**S Interaction

As the maintainer of this project, please make a few updates:
<div align="center">

- Improving this README.MD file to provide a great experience
- Updating SUPPORT.MD with content about this project's support experience
- Understanding the security reporting process in SECURITY.MD
- Remove this section from the README
![Python Version](https://img.shields.io/badge/Python-3776AB?&logo=python&logoColor=white-blue&label=3.10%20%7C%203.11)&ensp;
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)&ensp;
![Welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)

## Contributing
</div>

This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
**UFO** is a **UI-Focused** dual-agent framework that fulfills user requests on **Windows OS** by navigating and operating within a single application or across multiple applications.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide
a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
provided by the bot. You will only need to do this once across all repos using our CLA.
<h1 align="center">
<img src="./assets/overview.png"/>
</h1>


## 🆕 News
- 📅 2024-02-30 UFO is released on GitHub🎈.


## 💥 Highlights

- [x] **First Windows Agent Framework** - UFO is the first agent framework that can translate natural-language user requests into grounded operations on Windows OS.
- [x] **Interactive Mode** - UFO accepts multiple sub-requests from the user within the same session to complete complex tasks.
- [x] **Action Safeguard** - UFO can prompt for user confirmation before executing a sensitive action.
- [x] **Easy Extension** - UFO is easy to extend to accomplish more complex tasks with different operations.


## ✨ Getting Started


### 🛠️ Step 1: Installation
UFO requires **Python >= 3.10** running on **Windows OS >= 10**. It can be installed by running the following commands:
```bash
# [optional to create conda environment]
# conda create -n ufo python=3.10
# conda activate ufo

# clone the repository
git clone https://github.com/microsoft/UFO.git
cd UFO
# install the requirements
pip install -r requirements.txt
```

### 🖊️ Step 2: Configure the LLMs
Before running UFO, you need to provide your LLM configuration. Taking OpenAI as an example, configure the `ufo/config/config.yaml` file as follows:

#### OpenAI
```
OPENAI_API_BASE: Your OpenAI Endpoint # The base URL for the OpenAI API
OPENAI_API_KEY: Your OpenAI Key # Set the value to sk-xxx if you host the openai interface for open llm model
OPENAI_API_MODEL: GPT Model Name # The only OpenAI model by now that accepts visual input
```
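
As a minimal sketch, assuming only the three key names shown above, the following check reports which LLM settings are still missing or empty before UFO is started. The helper name `missing_llm_settings` and the placeholder values are hypothetical and not part of UFO.

```python
# The three settings named in the configuration snippet above.
REQUIRED_KEYS = ("OPENAI_API_BASE", "OPENAI_API_KEY", "OPENAI_API_MODEL")


def missing_llm_settings(configs):
    """Return the required LLM keys that are absent or empty in `configs`."""
    return [key for key in REQUIRED_KEYS if not configs.get(key)]


# Placeholder values for illustration only; the API key is left empty on purpose.
example = {
    "OPENAI_API_BASE": "https://example.invalid/chat/completions",
    "OPENAI_API_KEY": "",
    "OPENAI_API_MODEL": "your-model-name",
}
print(missing_llm_settings(example))
```

Running such a check first gives a clearer message than a failed API request later in the session.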

### 🚩 Step 3: Start UFO

#### ⌨️ Command Line (CLI)

```bash
# assume you are in the cloned UFO folder
python -m ufo --task <your_task_name>
```

This will start the UFO process and you can interact with it through the command line interface.
If everything goes well, you will see the following message:

```bash
Welcome to use UFO🛸, A UI-focused Multimodal Agent for Windows OS.
 _   _  _____   ___
| | | ||  ___| / _ \
| | | || |_   | | | |
| |_| ||  _|  | |_| |
 \___/ |_|     \___/
Please enter your request to be completed🛸:
```
#### <**Reminder: Before inputting your request, please make sure the target applications are active on the system.**>


### 🎥 Step 4: Execution Logs

You can find the captured screenshots and the request and response logs in the following folder:
```
ufo/logs/<your_task_name>/
```
You may use them to debug, replay, or analyze the agent output.
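
As a minimal sketch, assuming only that the logs land in `ufo/logs/<your_task_name>/` as stated above, the following helper lists the files of one task folder so they can be inspected in order. The function name `list_task_logs` and its `log_root` parameter are hypothetical and not part of UFO, and no particular file naming is assumed.

```python
from pathlib import Path


def list_task_logs(task_name, log_root="ufo/logs"):
    """Return the sorted file names found in the log folder of `task_name`."""
    task_dir = Path(log_root) / task_name
    if not task_dir.is_dir():
        # No run has been logged under this task name yet.
        return []
    return sorted(p.name for p in task_dir.iterdir() if p.is_file())


# "my_task" is a placeholder for the value passed to --task.
print(list_task_logs("my_task"))
```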


## ❓Get help
* ❔GitHub Issues (preferred)
* For other communications, please contact [email protected]
---

## 🎬 Demo Examples

We present two demo videos in which UFO completes user requests on Windows OS.

#### 1️⃣🗑️ Example 1: Deleting all notes on a PowerPoint presentation.
In this example, we show how to use UFO to delete all notes in a PowerPoint presentation in just a few simple steps. Explore it to work smarter, not harder!


#### 2️⃣📧 Example 2: Composing an email using text from multiple sources.
In this example, we show how to use UFO to extract text from Word documents and the description of an image, then compose and send an email. Enjoy your cross-application experiment with UFO!


## 📚 Citation
Our paper can be found [here](http://export.arxiv.org/abs/2311.17541).
If you use UFO in your research, please cite our paper:
```
@article{ufo,
title={UFO: A UI-Focused Agent for Windows OS Interaction},
author={Chaoyun Zhang, Liqun Li, Shilin He, Xu Zhang, Bo Qiao, Si Qin, Minghua Ma, Yu Kang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang},
journal={arXiv preprint arXiv:2311.17541},
year={2024}
}
```

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [[email protected]](mailto:[email protected]) with any additional questions or comments.

## Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft
trademarks or logos is subject to and must follow
[Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).
Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
Any use of third-party trademarks or logos are subject to those third-party's policies.
Any use of third-party trademarks or logos are subject to those third-party's policies.
Binary file added assets/overview.png
Binary file added assets/ufo_blue.png
Binary file added assets/ufo_rv.png
8 changes: 8 additions & 0 deletions requirements.txt
@@ -0,0 +1,8 @@
art==6.1
colorama==0.4.6
msal==1.25.0
openai==1.11.1
Pillow==10.2.0
pywinauto==0.6.8
PyYAML==6.0.1
Requests==2.31.0
Empty file added ufo/__init__.py
Empty file.
5 changes: 5 additions & 0 deletions ufo/__main__.py
@@ -0,0 +1,5 @@
from . import ufo

if __name__ == "__main__":
    # Execute the main script
    ufo.main()
Empty file added ufo/config/__init__.py
Empty file.
24 changes: 24 additions & 0 deletions ufo/config/config.py
@@ -0,0 +1,24 @@
import os
import yaml


def load_config(config_path="ufo/config/config.yaml"):
    """
    Load the configuration from a YAML file and environment variables.
    :param config_path: The path to the YAML config file. Defaults to "ufo/config/config.yaml".
    :return: Merged configuration; YAML values override environment variables with the same name.
    """
    # Copy environment variables to avoid modifying them directly
    configs = dict(os.environ)

    try:
        with open(config_path, "r") as file:
            yaml_data = yaml.safe_load(file)
            # Update configs with YAML data
            if yaml_data:
                configs.update(yaml_data)
    except FileNotFoundError:
        print(f"Warning: Config file not found at {config_path}. Using only environment variables.")

    return configs
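
The merge order in `load_config` decides which source wins when a key appears in both places. The sketch below reproduces only that merge step with plain dictionaries, so it runs without a YAML file; `merge_configs` and the sample values are hypothetical and used for illustration.

```python
def merge_configs(env, yaml_data):
    """Mirror the merge in load_config: start from env, then apply YAML on top."""
    configs = dict(env)
    if yaml_data:
        configs.update(yaml_data)
    return configs


# Sample values for illustration only.
env = {"OPENAI_API_KEY": "from-env", "HOME": "/home/user"}
yaml_data = {"OPENAI_API_KEY": "from-yaml", "MAX_STEP": 30}

merged = merge_configs(env, yaml_data)
print(merged["OPENAI_API_KEY"])  # the YAML value replaces the environment value
print(merged["HOME"])            # keys present only in the environment are kept
```

One consequence of this order is that an empty `OPENAI_API_KEY: ""` entry in the YAML file replaces an environment variable of the same name. Note also that environment values are always strings, while YAML values keep their parsed types.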
40 changes: 40 additions & 0 deletions ufo/config/config.yaml
@@ -0,0 +1,40 @@
version: 0.1

OPENAI_API_BASE: "https://cloudgpt-swc.openai.azure.com/openai/deployments/gpt-4-visual-preview/chat/completions?api-version=2023-12-01-preview" # The base URL for the OpenAI API
OPENAI_API_KEY: "" # Set the value to sk-xxx if you host the openai interface for open llm model
OPENAI_API_MODEL: "gpt-4-visual-preview" # The only OpenAI model by now that accepts visual input
CONTROL_BACKEND: "uia" # The backend for control action
MAX_TOKENS: 2000 # The max token limit for the response completion
MAX_RETRY: 3 # The max retry limit for the response completion
MAX_STEP: 30 # The max step limit for completing the user request
SLEEP_TIME: 5 # The sleep time between each step to wait for the window to be ready
TEMPERATURE: 0.0 # The temperature of the model: the lower the value, the more consistent the output of the model
TOP_P: 0.0 # The top_p of the model: the lower the value, the more conservative the output of the model
SAFE_GUARD: True # Whether to use the safe guard to prevent the model from doing sensitive operations.
CONTROL_TYPE_LIST: ["Button", "Edit", "TabItem", "Document", "ListItem", "MenuItem", "ScrollBar", "TreeItem", "Hyperlink", "ComboBox", "RadioButton"] # The list of control types that are allowed to be selected
HISTORY_KEYS: ["Step", "Thought", "ControlText", "Action", "Comment", "Results"] # The keys of the action history for the next step.
ANNOTATION_COLORS: {
    "Button": "#FFF68F",
    "Edit": "#A5F0B5",
    "TabItem": "#A5E7F0",
    "Document": "#FFD18A",
    "ListItem": "#D9C3FE",
    "MenuItem": "#E7FEC3",
    "ScrollBar": "#FEC3F8",
    "TreeItem": "#D6D6D6",
    "Hyperlink": "#91FFEB",
    "ComboBox": "#D8B6D4"
}

PRINT_LOG: FALSE # Whether to print the log
CONCAT_SCREENSHOT: True # Whether to concat the screenshot for the control item
LOG_LEVEL: "DEBUG" # The log level
INCLUDE_LAST_SCREENSHOT: True # Whether to include the last screenshot in the observation
REQUEST_TIMEOUT: 250 # The call timeout for the GPT-V model
APP_SELECTION_PROMPT: "ufo/prompts/base/app_selection.yaml" # The prompt for the app selection
ACTION_SELECTION_PROMPT: "ufo/prompts/base/action_selection.yaml" # The prompt for the action selection
INPUT_TEXT_API: "type_keys" # The input text API




Empty file added ufo/llm/__init__.py
Empty file.
59 changes: 59 additions & 0 deletions ufo/llm/llm_call.py
@@ -0,0 +1,59 @@
import requests
import time
from ..config.config import load_config
from ..utils import print_with_color

configs = load_config()


def get_gptv_completion(messages, headers):
    """
    Get GPT-V completion from messages.
    The endpoint, model, sampling parameters, and retry limit are read from the loaded config.
    messages: The messages to be sent to GPT-V.
    headers: The headers of the request.
    return: A (response_json, cost) tuple, or None if every attempt fails.
    """

    payload = {
        "messages": messages,
        "temperature": configs["TEMPERATURE"],
        "max_tokens": configs["MAX_TOKENS"],
        "top_p": configs["TOP_P"],
        "model": configs["OPENAI_API_MODEL"]
    }

    for _ in range(configs["MAX_RETRY"]):
        response = None  # Reset so the error handler never sees a stale response
        try:
            response = requests.post(configs["OPENAI_API_BASE"], headers=headers, json=payload)
            response.raise_for_status()  # Raises an HTTPError for an unsuccessful status code
            response_json = response.json()

            if "choices" not in response_json:
                print_with_color("GPT Error: No Reply", "red")
                continue

            if "error" not in response_json:
                usage = response_json.get("usage", {})
                prompt_tokens = usage.get("prompt_tokens", 0)
                completion_tokens = usage.get("completion_tokens", 0)

                cost = prompt_tokens / 1000 * 0.01 + completion_tokens / 1000 * 0.03

                return response_json, cost
        except requests.RequestException as e:
            print_with_color(f"Error making API request: {e}", "red")
            # Print the response body, if one was received, to help debugging
            if response is not None:
                print_with_color(response.text, "red")
            time.sleep(3)
            continue

    # Every attempt failed
    return None
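
The cost estimate inside the retry loop uses two hard-coded rates: 0.01 per 1,000 prompt tokens and 0.03 per 1,000 completion tokens. The sketch below isolates that arithmetic so it can be checked on its own. The helper name `estimate_cost`, the constant names, and the sample token counts are hypothetical, and reading the result as US dollars is an assumption, since the source does not state a currency.

```python
# Rates copied from the cost expression in get_gptv_completion, per 1,000 tokens.
# The currency is assumed to be USD; the source does not say.
PROMPT_RATE_PER_1K = 0.01
COMPLETION_RATE_PER_1K = 0.03


def estimate_cost(usage):
    """Estimate the cost of one call from the `usage` dictionary of a response."""
    prompt_tokens = usage.get("prompt_tokens", 0)
    completion_tokens = usage.get("completion_tokens", 0)
    return (prompt_tokens / 1000 * PROMPT_RATE_PER_1K
            + completion_tokens / 1000 * COMPLETION_RATE_PER_1K)


# Hypothetical token counts: 1.5 * 0.01 + 0.5 * 0.03
usage = {"prompt_tokens": 1500, "completion_tokens": 500}
print(f"{estimate_cost(usage):.3f}")
```

Because the rates are constants in the code rather than config values, the estimate only holds for a model priced at exactly these rates.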
