Merge pull request microsoft#106 from microsoft/pre-release

Documentation updated
Hitomi-Hoshi · Jul 4, 2024 · 278d973
2 parents d8214c3 + 2e90102
Showing 81 changed files with 2,443 additions and 134 deletions.
diff --git a/README.md b/README.md
@@ -28,7 +28,7 @@
 - <b>AppAgent 👾</b>, responsible for iteratively executing actions on the selected applications until the task is successfully concluded within a specific application. 
 - <b>Control Interaction 🎮</b>, is tasked with translating actions from HostAgent and AppAgent into interactions with the application and its UI controls. It's essential that the targeted controls are compatible with the Windows **UI Automation** or **Win32** API.
 
-Both agents leverage the multi-modal capabilities of GPT-Vision to comprehend the application UI and fulfill the user's request. For more details, please consult our [technical report](https://arxiv.org/abs/2402.07939).
+Both agents leverage the multi-modal capabilities of GPT-Vision to comprehend the application UI and fulfill the user's request. For more details, please consult our [technical report](https://arxiv.org/abs/2402.07939) and [Documentation](https://microsoft.github.io/UFO/).
 <h1 align="center">
     <img src="./assets/framework_v2.png"/> 
 </h1>
@@ -137,9 +137,17 @@ Optionally, you can set a backup language model (LLM) engine in the `BACKUP_AGEN
 UFO also supports other LLMs and advanced configurations, such as customize your own model, please check the [documents](https://microsoft.github.io/UFO/supported_models/overview/) for more details. Because of the limitations of model input, a lite version of the prompt is provided to allow users to experience it, which is configured in `config_dev.yaml`.
 
 ### 📔 Step 3: Additional Setting for RAG (optional).
-If you want to enhance UFO's ability with external knowledge, you can optionally configure it with an external database for retrieval augmented generation (RAG) in the `ufo/config/config.yaml` file.
+If you want to enhance UFO's ability with external knowledge, you can optionally configure it with an external database for retrieval augmented generation (RAG) in the `ufo/config/config.yaml` file. 
 
-#### RAG from Offline Help Document
+We provide the following options for RAG to enhance UFO's capabilities:
+- **[Offline Help Document](https://microsoft.github.io/UFO/advanced_usage/reinforce_appagent/learning_from_help_document/)**: Enable UFO to retrieve information from offline help documents.
+- **[Online Bing Search Engine](https://microsoft.github.io/UFO/advanced_usage/reinforce_appagent/learning_from_bing_search/)**: Enhance UFO's capabilities by utilizing the most up-to-date online search results.
+- **[Self-Experience](https://microsoft.github.io/UFO/advanced_usage/reinforce_appagent/experience_learning/)**: Save task completion trajectories into UFO's memory for future reference.
+- **[User-Demonstration](https://microsoft.github.io/UFO/advanced_usage/reinforce_appagent/learning_from_demonstration/)**: Boost UFO's capabilities through user demonstration.
+
+Consult their respective documentation for more information on how to configure these settings.
+
+<!-- #### RAG from Offline Help Document
 Before enabling this function, you need to create an offline indexer for your help document. Please refer to the [README](./learner/README.md) to learn how to create an offline vectored database for retrieval. You can enable this function by setting the following configuration:
 ```bash
 ## RAG Configuration for the offline docs
@@ -184,7 +192,7 @@ You can enable this function by setting the following configuration:
 ## RAG Configuration for demonstration
 RAG_DEMONSTRATION: True  # Whether to use the RAG from its user demonstration.
 RAG_DEMONSTRATION_RETRIEVED_TOPK: 5  # The topk for the demonstration examples.
-```
+``` -->
 
 
 ### 🎉 Step 4: Start UFO

diff --git a/documents/docs/advanced_usage/control_filtering/icon_filtering.md b/documents/docs/advanced_usage/control_filtering/icon_filtering.md
@@ -0,0 +1,16 @@
+# Icon Filter
+
+The icon control filter is a method to filter the controls based on the similarity between the control icon image and the agent's plan using the image/text embeddings.
+
+## Configuration
+
+To activate the icon control filtering, you need to add `ICON` to the `CONTROL_FILTER` list in the `config_dev.yaml` file. Below is the detailed icon control filter configuration in the `config_dev.yaml` file:
+
+- `CONTROL_FILTER`: A list of filtering methods that you want to apply to the controls. To activate the icon control filtering, add `ICON` to the list.
+- `CONTROL_FILTER_TOP_K_ICON`: The number of controls to keep after filtering.
+- `CONTROL_FILTER_MODEL_ICON_NAME`: The control filter model name for icon similarity. By default, it is set to "clip-ViT-B-32".
+
+
+# Reference
+
+:::automator.ui_control.control_filter.IconControlFilter
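The icon filter amounts to ranking controls by embedding similarity and keeping the top `CONTROL_FILTER_TOP_K_ICON` of them. Below is a minimal pure-Python sketch of that idea. It assumes the plan text and each control's icon image have already been embedded into the same vector space (UFO uses a CLIP model, `clip-ViT-B-32` by default, for this). The toy 3-dimensional vectors are made up. `icon_filter` and `cosine_similarity` are hypothetical helpers, not the `IconControlFilter` API.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def icon_filter(plan_embedding: List[float],
                icon_embeddings: Dict[str, List[float]],
                top_k: int) -> List[str]:
    """Hypothetical helper: keep the top_k control labels whose icon is closest to the plan.

    Assumes plan and icons were embedded by the same image/text model beforehand.
    """
    ranked = sorted(
        icon_embeddings.items(),
        key=lambda item: cosine_similarity(plan_embedding, item[1]),
        reverse=True,
    )
    return [label for label, _ in ranked[:top_k]]


# Toy embeddings standing in for real CLIP vectors.
plan_vec = [1.0, 0.0, 0.0]
icons = {
    "Save": [0.9, 0.1, 0.0],
    "Print": [0.1, 0.9, 0.0],
    "Bold": [0.0, 0.0, 1.0],
}
print(icon_filter(plan_vec, icons, top_k=1))  # ['Save']
```

Precomputing embeddings keeps the sketch independent of any model library; with a real CLIP model only the vectors change, not the ranking logic.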
diff --git a/documents/docs/advanced_usage/control_filtering/overview.md b/documents/docs/advanced_usage/control_filtering/overview.md
@@ -0,0 +1,22 @@
+# Control Filtering
+
+An application may contain many control items, many of which are not relevant to the task. UFO can filter out the irrelevant controls and focus only on the relevant ones, which reduces the complexity of the task.
+
+In addition to configuring the control types to select in `CONTROL_LIST` in `config_dev.yaml`, UFO also supports filtering controls based on semantic similarity or keyword matching between the agent's plan and the control's information. We currently support the following filtering methods:
+
+| Filtering Method | Description |
+|------------------|-------------|
+| [`Text`](./text_filtering.md)     | Filter the controls based on the control text. |
+| [`Semantic`](./semantic_filtering.md) | Filter the controls based on the semantic similarity. |
+| [`Icon`](./icon_filtering.md)    | Filter the controls based on the control icon image. |
+
+
+## Configuration
+You can activate the control filtering by setting the `CONTROL_FILTER` in the `config_dev.yaml` file. The `CONTROL_FILTER` is a list of filtering methods that you want to apply to the controls, which can be `TEXT`, `SEMANTIC`, or `ICON`. 
+
+You can configure multiple filtering methods in the `CONTROL_FILTER` list. 
+
+# Reference
+The implementation of control filtering is based on the `BasicControlFilter` class located in the `ufo/automator/ui_control/control_filter.py` file. Concrete filtering classes inherit from the `BasicControlFilter` class and implement the `control_filter` method to filter the controls according to their specific filtering method.
+
+:::automator.ui_control.control_filter.BasicControlFilter
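The pattern described in the overview (a base class whose subclasses implement `control_filter`, several of which can be listed in `CONTROL_FILTER`) can be sketched as follows. This is a simplified illustration under stated assumptions: the `BasicControlFilter` here is a hypothetical stand-in with a made-up signature, the two toy filters and the `EXACT`/`FIRST_WORD` registry names are invented (the real values are `TEXT`, `SEMANTIC` and `ICON`), and keeping the union of all filter results is an assumption about how multiple filters combine.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class BasicControlFilter(ABC):
    """Hypothetical stand-in for the base class: subclasses implement control_filter()."""

    @abstractmethod
    def control_filter(self, controls: Dict[str, str], plan: str) -> Dict[str, str]:
        """Return the subset of controls (label -> control text) relevant to the plan."""


class ExactTextFilter(BasicControlFilter):
    """Toy filter: keep controls whose full text appears verbatim in the plan."""

    def control_filter(self, controls, plan):
        plan_lower = plan.lower()
        return {k: v for k, v in controls.items() if v.lower() in plan_lower}


class FirstWordFilter(BasicControlFilter):
    """Toy filter: keep controls whose first word appears as a word in the plan."""

    def control_filter(self, controls, plan):
        plan_words = set(plan.lower().split())
        kept = {}
        for label, text in controls.items():
            tokens = text.lower().split()
            if tokens and tokens[0] in plan_words:
                kept[label] = text
        return kept


# Hypothetical registry mapping configured names to filter instances.
FILTERS = {"EXACT": ExactTextFilter(), "FIRST_WORD": FirstWordFilter()}


def apply_filters(names: List[str], controls: Dict[str, str], plan: str) -> Dict[str, str]:
    """Run every configured filter and keep the union of their results (assumed rule)."""
    kept: Dict[str, str] = {}
    for name in names:
        kept.update(FILTERS[name].control_filter(controls, plan))
    return kept


controls = {"1": "Home", "2": "Insert Table", "3": "Page Layout"}
plan = "Click the Home tab, then insert a chart"
print(apply_filters(["EXACT", "FIRST_WORD"], controls, plan))
# {'1': 'Home', '2': 'Insert Table'}
```

Keeping each method behind the same abstract method lets new filters be added to the registry without touching the loop that applies them.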
diff --git a/documents/docs/advanced_usage/control_filtering/semantic_filtering.md b/documents/docs/advanced_usage/control_filtering/semantic_filtering.md
@@ -0,0 +1,15 @@
+# Semantic Control Filter
+
+The semantic control filter is a method to filter the controls based on the semantic similarity between the agent's plan and the control's text using their embeddings. 
+
+## Configuration
+
+To activate the semantic control filtering, you need to add `SEMANTIC` to the `CONTROL_FILTER` list in the `config_dev.yaml` file. Below is the detailed semantic control filter configuration in the `config_dev.yaml` file:
+
+- `CONTROL_FILTER`: A list of filtering methods that you want to apply to the controls. To activate the semantic control filtering, add `SEMANTIC` to the list.
+- `CONTROL_FILTER_TOP_K_SEMANTIC`: The number of controls to keep after filtering.
+- `CONTROL_FILTER_MODEL_SEMANTIC_NAME`: The control filter model name for semantic similarity. By default, it is set to "all-MiniLM-L6-v2".
+
+# Reference
+
+:::automator.ui_control.control_filter.SemanticControlFilter
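The semantic filter ranks controls by the similarity between the embedding of the plan and the embedding of each control's text, then keeps the top `CONTROL_FILTER_TOP_K_SEMANTIC`. The sketch below shows that ranking step. As a loudly labeled assumption, it swaps the real sentence-embedding model (`all-MiniLM-L6-v2` by default) for a toy bag-of-words vector, which only captures word overlap rather than meaning. `embed` and `semantic_filter` are hypothetical helpers, not the `SemanticControlFilter` API.

```python
import math
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_filter(plan: str, controls: Dict[str, str], top_k: int) -> List[str]:
    """Return the labels of the top_k controls whose text is most similar to the plan."""
    plan_vec = embed(plan)
    ranked = sorted(
        controls,
        key=lambda label: cosine(plan_vec, embed(controls[label])),
        reverse=True,
    )
    return ranked[:top_k]


controls = {"1": "Insert Table", "2": "Page Layout", "3": "Insert Picture"}
print(semantic_filter("insert a picture from file", controls, top_k=2))  # ['3', '1']
```

With a real embedding model, controls such as "Add Image" would also score highly for this plan even without shared words; that is the advantage of the semantic filter over plain text matching.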
diff --git a/documents/docs/advanced_usage/control_filtering/text_filtering.md b/documents/docs/advanced_usage/control_filtering/text_filtering.md
@@ -0,0 +1,16 @@
+# Text Control Filter
+
+The text control filter is a method to filter the controls based on the control text. The agent's plan for the current step usually contains some keywords or phrases. This method keeps the controls whose text matches the keywords or phrases in the agent's plan.
+
+## Configuration
+
+To activate the text control filtering, you need to add `TEXT` to the `CONTROL_FILTER` list in the `config_dev.yaml` file. Below is the detailed text control filter configuration in the `config_dev.yaml` file:
+
+- `CONTROL_FILTER`: A list of filtering methods that you want to apply to the controls. To activate the text control filtering, add `TEXT` to the list.
+- `CONTROL_FILTER_TOP_K_PLAN`: The number of agent's plan keywords or phrases to use for filtering the controls.
+
+
+
+# Reference
+
+:::automator.ui_control.control_filter.TextControlFilter
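A minimal sketch of keyword-based text filtering, assuming the plan has already been split into a list of keywords or phrases and that only the first `CONTROL_FILTER_TOP_K_PLAN` of them are used. The case-insensitive "one string contains the other" matching rule and the `text_filter` name are assumptions for illustration; the `TextControlFilter` class may match differently.

```python
from typing import Dict, List


def text_filter(plan_keywords: List[str], controls: Dict[str, str],
                top_k_plan: int) -> Dict[str, str]:
    """Hypothetical helper: keep controls whose text matches one of the first top_k_plan keywords.

    A match means the keyword occurs in the control text or vice versa (case-insensitive).
    """
    # Ignore empty keywords, which would otherwise match every control.
    keywords = [k.lower() for k in plan_keywords[:top_k_plan] if k.strip()]
    kept = {}
    for label, text in controls.items():
        t = text.lower()
        # Skip controls without text: "" is a substring of every keyword.
        if t and any(k in t or t in k for k in keywords):
            kept[label] = text
    return kept


controls = {"1": "Home", "2": "Heading 1", "3": "Insert", "4": ""}
plan_keywords = ["Home", "Styles", "Heading 1"]
print(text_filter(plan_keywords, controls, top_k_plan=2))  # {'1': 'Home'}
print(text_filter(plan_keywords, controls, top_k_plan=3))  # {'1': 'Home', '2': 'Heading 1'}
```

Raising `CONTROL_FILTER_TOP_K_PLAN` keeps more controls (higher recall) at the cost of a larger candidate list for the agent.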
diff --git a/documents/docs/advanced_usage/customization.md b/documents/docs/advanced_usage/customization.md
@@ -0,0 +1,24 @@
+# Customization
+
+Sometimes, UFO may need additional context or information to complete a task. This information is important and specific to each user. UFO can ask the user for additional information and save it in the local memory for future reference. This customization feature allows UFO to provide a more personalized experience to the user.
+
+## Scenario
+
+Let's consider a scenario where UFO needs additional information to complete a task. UFO is tasked with booking a cab for the user. To book a cab, UFO needs to know the exact address of the user. UFO will ask the user for the address and save it in the local memory for future reference. Next time, when UFO is asked to complete a task that requires the user's address, UFO will use the saved address to complete the task, without asking the user again.
+
+
+## Implementation
+We currently implement the customization feature in the `HostAgent` class. When the `HostAgent` needs additional information, it transitions to the `PENDING` state and asks the user for the information. The user provides the information, and the `HostAgent` saves it in the local memory base for future reference. The saved information is stored in the `blackboard` and can be accessed by all agents in the session.
+
+!!! note
+    The customization memory base is only saved in a **local file**. This information is **not** uploaded to the cloud or any other storage, to protect the user's privacy.
+
+## Configuration
+
+You can configure the customization feature by setting the following fields in the `config_dev.yaml` file.
+
+| Configuration Option   | Description                                  | Type    | Default Value                         |
+|------------------------|----------------------------------------------|---------|---------------------------------------|
+| `USE_CUSTOMIZATION`    | Whether to enable the customization.         | Boolean | True                                  |
+| `QA_PAIR_FILE`         | The path for the historical QA pairs.        | String  | "customization/historical_qa.txt"     |
+| `QA_PAIR_NUM`          | The number of QA pairs for the customization.| Integer | 20                                    |
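To make the roles of `QA_PAIR_FILE` and `QA_PAIR_NUM` concrete, here is a minimal sketch of a local QA memory: answers are appended to a file, and only the most recent `QA_PAIR_NUM` pairs are loaded back. The one-JSON-object-per-line file format and the `save_qa_pair`/`load_recent_qa_pairs` helpers are hypothetical; UFO's actual storage format for `historical_qa.txt` may differ.

```python
import json
import os
import tempfile
from typing import Dict, List


def save_qa_pair(path: str, question: str, answer: str) -> None:
    """Append one QA pair to the local file (assumed format: one JSON object per line)."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"question": question, "answer": answer}) + "\n")


def load_recent_qa_pairs(path: str, num: int) -> List[Dict[str, str]]:
    """Return at most `num` of the most recent QA pairs, oldest first."""
    if num <= 0 or not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        pairs = [json.loads(line) for line in f if line.strip()]
    return pairs[-num:]


with tempfile.TemporaryDirectory() as tmp:
    # Mirrors the default QA_PAIR_FILE layout inside a throwaway directory.
    qa_file = os.path.join(tmp, "customization", "historical_qa.txt")
    save_qa_pair(qa_file, "What is your home address?", "1 Example Street")
    print(load_recent_qa_pairs(qa_file, num=20))
```

Appending to a local file keeps every answer on the user's machine, in line with the privacy note above.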
diff --git a/documents/docs/advanced_usage/follower_mode.md b/documents/docs/advanced_usage/follower_mode.md
@@ -0,0 +1,83 @@
+# Follower Mode
+
+The Follower mode is a feature of UFO in which the agent follows a list of pre-defined steps in natural language to take actions on applications. Unlike the normal mode, this mode creates a `FollowerAgent` that follows the plan list provided by the user to interact with the application, instead of generating the plan itself. This mode is useful for debugging, software testing, and verification.
+
+## Quick Start
+
+### Step 1: Create a Plan file
+
+Before starting the Follower mode, you need to create a plan file that contains the list of steps for the agent to follow. The plan file is a JSON file that contains the following fields:
+
+| Field | Description | Type |
+| --- | --- | --- |
+| task | The task description. | String |
+| steps | The list of steps for the agent to follow. | List of Strings |
+| object | The application or file to interact with. | String |
+
+Below is an example of a plan file:
+
+```json
+{
+    "task": "Type in a text of 'Test For Fun' with heading 1 level",
+    "steps": 
+    [
+        "1.type in 'Test For Fun'", 
+        "2.Select the 'Test For Fun' text",
+        "3.Click 'Home' tab to show the 'Styles' ribbon tab",
+        "4.Click 'Styles' ribbon tab to show the style 'Heading 1'",
+        "5.Click 'Heading 1' style to apply the style to the selected text"
+    ],
+    "object": "draft.docx"
+}
+```
+
+!!! note
+    The `object` field is the application or file that the agent will interact with. The object **must be active** (can be minimized) when starting the Follower mode.
+
+
+### Step 2: Start the Follower Mode
+To start the Follower mode, run the following command:
+
+```bash
+# assume you are in the cloned UFO folder
+python ufo.py --task_name {task_name} --mode follower --plan {plan_file}
+```
+
+!!! tip
+    Replace `{task_name}` with the name of the task and `{plan_file}` with the path to the plan file.
+
+
+### Step 3: Run in Batch (Optional)
+
+You can also run the Follower mode in batch mode by providing a folder containing multiple plan files. The agent will follow the plans in the folder one by one. To run in batch mode, run the following command:
+
+```bash
+# assume you are in the cloned UFO folder
+python ufo.py --task_name {task_name} --mode follower --plan {plan_folder}
+``` 
+
+UFO will automatically detect the plan files in the folder and run them one by one.
+
+!!! tip
+    Replace `{task_name}` with the name of the task and `{plan_folder}` with the path to the folder containing plan files.
+
+
+## Evaluation
+You may want to evaluate whether the `task` was completed successfully by following the plan. UFO calls the `EvaluationAgent` to evaluate the task if `EVA_SESSION` is set to `True` in the `config_dev.yaml` file.
+
+You can check the evaluation log in the `logs/{task_name}/evaluation.log` file. 
+
+# References
+The follower mode employs a `PlanReader` to parse the plan file and create a `FollowerSession` to follow the plan. 
+
+## PlanReader
+The `PlanReader` is located in the `ufo/module/sessions/plan_reader.py` file.
+
+:::module.sessions.plan_reader.PlanReader
+
+<br>
+## FollowerSession
+
+The `FollowerSession` is located in the `ufo/module/sessions/session.py` file.
+
+:::module.sessions.session.FollowerSession
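The plan-file format and the single-file versus folder (batch) behaviour of Follower mode can be sketched as below. This is not the `PlanReader` implementation: `read_plan` and `collect_plan_files` are hypothetical helpers, and treating every `*.json` file in a folder as a plan, processed in sorted order, is an assumption.

```python
import json
import os
import tempfile
from typing import Dict, List

# Fields documented for a plan file.
REQUIRED_FIELDS = ("task", "steps", "object")


def read_plan(path: str) -> Dict:
    """Load one plan file and check that the documented fields are present."""
    with open(path, encoding="utf-8") as f:
        plan = json.load(f)
    missing = [field for field in REQUIRED_FIELDS if field not in plan]
    if missing:
        raise ValueError(f"{path}: missing field(s) {missing}")
    if not isinstance(plan["steps"], list):
        raise ValueError(f"{path}: 'steps' must be a list of strings")
    return plan


def collect_plan_files(plan_path: str) -> List[str]:
    """Return [plan_path] for a single file, or every *.json file in a folder (batch mode)."""
    if os.path.isdir(plan_path):
        return sorted(
            os.path.join(plan_path, name)
            for name in os.listdir(plan_path)
            if name.endswith(".json")
        )
    return [plan_path]


example = {
    "task": "Type in a text of 'Test For Fun' with heading 1 level",
    "steps": ["1.type in 'Test For Fun'", "2.Select the 'Test For Fun' text"],
    "object": "draft.docx",
}
with tempfile.TemporaryDirectory() as folder:
    for name in ("plan_b.json", "plan_a.json"):
        with open(os.path.join(folder, name), "w", encoding="utf-8") as f:
            json.dump(example, f)
    # Batch mode: each plan file in the folder is handled one by one.
    for plan_file in collect_plan_files(folder):
        plan = read_plan(plan_file)
        print(os.path.basename(plan_file), plan["object"], len(plan["steps"]))
```

Validating the fields up front means a malformed plan fails before any action is taken on the target application.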
diff --git a/documents/docs/advanced_usage/reinforce_appagent/experience_learning.md b/documents/docs/advanced_usage/reinforce_appagent/experience_learning.md
@@ -0,0 +1,65 @@
+# Learning from Self-Experience
+
+When UFO successfully completes a task, the user can choose to save the successful experience to reinforce the AppAgent. The AppAgent can learn from its own successful experiences to improve its performance in the future.
+
+## Mechanism
+
+### Step 1: Complete a Session
+- **Event**: UFO completes a session
+
+### Step 2: Ask User to Save Experience
+- **Action**: The agent prompts the user with a choice to save the successful experience
+
+<h1 align="center">
+    <img src="../../../img/save_ask.png" alt="Save Experience" width="100%">
+</h1>
+
+### Step 3: User Chooses to Save
+- **Action**: If the user chooses to save the experience
+
+### Step 4: Summarize and Save the Experience
+- **Tool**: `ExperienceSummarizer`
+- **Process**:
+  1. Summarize the experience into a demonstration example
+  2. Save the demonstration example in the `EXPERIENCE_SAVED_PATH` as specified in the `config_dev.yaml` file
+  3. The demonstration example includes [fields](../../prompts/examples_prompts.md) similar to those used in the AppAgent's prompt
+
+### Step 5: Retrieve and Utilize Saved Experience
+- **When**: The AppAgent encounters a similar task in the future
+- **Action**: Retrieve the saved experience from the experience database
+- **Outcome**: Use the retrieved experience to generate a plan
+
+### Workflow Diagram
+```mermaid
+graph TD;
+    A[Complete Session] --> B[Ask User to Save Experience]
+    B --> C[User Chooses to Save]
+    C --> D[Summarize with ExperienceSummarizer]
+    D --> E[Save in EXPERIENCE_SAVED_PATH]
+    F[AppAgent Encounters Similar Task] --> G[Retrieve Saved Experience]
+    G --> H[Generate Plan]
+```
+
+## Activate the Learning from Self-Experience
+
+### Step 1: Configure the AppAgent
+Configure the following parameters to allow UFO to use the RAG from its self-experience:
+
+| Configuration Option | Description | Type | Default Value |
+|----------------------|-------------|------|---------------|
+| `RAG_EXPERIENCE` | Whether to use the RAG from its self-experience | Boolean | False |
+| `RAG_EXPERIENCE_RETRIEVED_TOPK` | The topk for the retrieved experience examples | Integer | 5 |
+
+# Reference
+
+## Experience Summarizer
+The `ExperienceSummarizer` class is located in the `ufo/experience/summarizer.py` file. The `ExperienceSummarizer` class provides the following methods to summarize the experience:
+
+:::experience.summarizer.ExperienceSummarizer
+
+<br>
+
+## Experience Retriever
+The `ExperienceRetriever` class is located in the `ufo/rag/retriever.py` file. The `ExperienceRetriever` class provides the following methods to retrieve the experience:
+
+:::rag.retriever.ExperienceRetriever
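The save-then-retrieve loop in the workflow diagram can be sketched in a few lines. Everything here is an assumption for illustration: the hypothetical `save_experience`/`retrieve_experience` helpers, the one-JSON-file-per-experience layout, and the word-overlap (Jaccard) similarity that stands in for whatever ranking the real `ExperienceRetriever` uses. `top_k` plays the role of `RAG_EXPERIENCE_RETRIEVED_TOPK`.

```python
import json
import os
import tempfile
import uuid
from typing import Dict, List


def save_experience(folder: str, request: str, plan: List[str]) -> str:
    """Store one summarized experience (request + plan steps) as its own JSON file."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, f"experience_{uuid.uuid4().hex}.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"request": request, "plan": plan}, f)
    return path


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity; a stand-in for the retriever's real similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def retrieve_experience(folder: str, request: str, top_k: int) -> List[Dict]:
    """Return the top_k saved experiences most similar to the new request."""
    if not os.path.isdir(folder):
        return []
    records = []
    for name in os.listdir(folder):
        if name.endswith(".json"):
            with open(os.path.join(folder, name), encoding="utf-8") as f:
                records.append(json.load(f))
    records.sort(key=lambda r: jaccard(request, r["request"]), reverse=True)
    return records[:top_k]


with tempfile.TemporaryDirectory() as memory:
    save_experience(memory, "Change the font of the title to Arial",
                    ["Select the title", "Set the font to Arial"])
    save_experience(memory, "Insert a table with 3 columns",
                    ["Click 'Insert'", "Click 'Table'", "Choose 3 columns"])
    best = retrieve_experience(memory, "change the font of the heading", top_k=1)
    print(best[0]["plan"])  # ['Select the title', 'Set the font to Arial']
```

The retrieved plan is then supplied to the AppAgent as a reference example rather than executed verbatim, matching the "Generate Plan" step in the diagram.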
diff --git a/documents/docs/advanced_usage/reinforce_appagent/learning_from_bing_search.md b/documents/docs/advanced_usage/reinforce_appagent/learning_from_bing_search.md
@@ -0,0 +1,29 @@
+# Learning from Bing Search
+
+UFO provides the capability to reinforce the AppAgent by searching Bing for up-to-date knowledge about niche tasks or applications that are beyond the `AppAgent`'s knowledge.
+
+## Mechanism
+Upon receiving a request, the `AppAgent` constructs a Bing search query based on the request and retrieves the search results from Bing. The `AppAgent` then extracts the relevant information from the top-k search results from Bing and generates a plan based on the retrieved information.
+
+
+## Activate the Learning from Bing Search
+
+
+### Step 1: Obtain Bing API Key
+To use the Bing search, you need to obtain a Bing API key. You can follow the instructions on the [Microsoft Azure Bing Search API](https://www.microsoft.com/en-us/bing/apis/bing-web-search-api) to get the API key.
+
+
+### Step 2: Configure the AppAgent
+
+Configure the following parameters to allow UFO to use online Bing search for the decision-making process:
+
+| Configuration Option | Description | Type | Default Value |
+|----------------------|-------------|------|---------------|
+| `RAG_ONLINE_SEARCH` | Whether to use the Bing search | Boolean | False |
+| `BING_API_KEY` | The Bing search API key | String | "" |
+| `RAG_ONLINE_SEARCH_TOPK` | The topk for the online search | Integer | 5 |
+| `RAG_ONLINE_RETRIEVED_TOPK` | The topk for the online retrieved searched results | Integer | 1 |
+
+# Reference
+
+:::rag.retriever.OnlineDocRetriever
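For orientation, here is a minimal sketch of one Bing Web Search v7 call using only the standard library: the key goes in the `Ocp-Apim-Subscription-Key` header, `count` requests a number of results, and the hits are read from `webPages.value`. Mapping `RAG_ONLINE_SEARCH_TOPK` to `count` and `RAG_ONLINE_RETRIEVED_TOPK` to "keep the first N hits" is a simplifying assumption; `OnlineDocRetriever` may rank or embed the results differently. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

BING_ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"


def build_bing_request(query: str, api_key: str, count: int) -> urllib.request.Request:
    """Build a Bing Web Search v7 GET request asking for `count` results."""
    params = urllib.parse.urlencode({"q": query, "count": count})
    return urllib.request.Request(
        f"{BING_ENDPOINT}?{params}",
        headers={"Ocp-Apim-Subscription-Key": api_key},
    )


def parse_web_pages(payload: Dict, top_k: int) -> List[Dict[str, str]]:
    """Keep name/url/snippet of the first top_k web results in a Bing response."""
    pages = payload.get("webPages", {}).get("value", [])
    return [
        {"name": p.get("name", ""), "url": p.get("url", ""), "snippet": p.get("snippet", "")}
        for p in pages[:top_k]
    ]


def bing_search(query: str, api_key: str, search_topk: int = 5,
                retrieved_topk: int = 1) -> List[Dict[str, str]]:
    """Fetch search_topk results, then keep retrieved_topk of them (requires network access)."""
    with urllib.request.urlopen(build_bing_request(query, api_key, search_topk), timeout=10) as resp:
        payload = json.load(resp)
    return parse_web_pages(payload, retrieved_topk)


# Usage (needs a valid BING_API_KEY and network access):
# results = bing_search("How to add a table of contents in Word", api_key="<BING_API_KEY>")
```

Splitting request building and response parsing from the network call makes both parts testable without an API key.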