forked from microsoft/UFO
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request microsoft#106 from microsoft/pre-release
Documentation updated
Showing 81 changed files with 2,443 additions and 134 deletions.
16 changes: 16 additions & 0 deletions
documents/docs/advanced_usage/control_filtering/icon_filtering.md
# Icon Filter

The icon control filter is a method to filter the controls based on the similarity between the control icon image and the agent's plan using the image/text embeddings.

## Configuration

To activate the icon control filtering, you need to add `ICON` to the `CONTROL_FILTER` list in the `config_dev.yaml` file. Below is the detailed icon control filter configuration in the `config_dev.yaml` file:

- `CONTROL_FILTER`: A list of filtering methods that you want to apply to the controls. To activate the icon control filtering, add `ICON` to the list.
- `CONTROL_FILTER_TOP_K_ICON`: The number of controls to keep after filtering.
- `CONTROL_FILTER_MODEL_ICON_NAME`: The control filter model name for icon similarity. By default, it is set to "clip-ViT-B-32".
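
The ranking step behind `CONTROL_FILTER_TOP_K_ICON` can be pictured with a minimal sketch, assuming the icon and plan embeddings have already been computed (for example by a CLIP model such as the default `clip-ViT-B-32`). The 3-dimensional vectors, control names, and the `top_k_by_cosine` helper are hypothetical stand-ins; this is not UFO's `IconControlFilter` implementation.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_by_cosine(icon_embeddings: dict[str, list[float]],
                    plan_embedding: list[float],
                    top_k: int) -> list[str]:
    """Keep the top_k controls whose icon embedding is closest to the plan embedding."""
    ranked = sorted(icon_embeddings,
                    key=lambda name: cosine(icon_embeddings[name], plan_embedding),
                    reverse=True)
    return ranked[:top_k]


# Hypothetical embeddings standing in for CLIP image vectors of each control icon.
icons = {
    "Save": [0.9, 0.1, 0.0],
    "Print": [0.1, 0.9, 0.1],
    "Bold": [0.0, 0.2, 0.9],
}
plan = [0.8, 0.3, 0.0]  # hypothetical text embedding of the plan "click the save icon"
print(top_k_by_cosine(icons, plan, top_k=2))  # ['Save', 'Print']
```

Only the top-k cut is shown; producing real embeddings requires loading the configured model.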

# Reference

:::automator.ui_control.control_filter.IconControlFilter
22 changes: 22 additions & 0 deletions
documents/docs/advanced_usage/control_filtering/overview.md
# Control Filtering

An application may contain many control items, many of which are irrelevant to the task. UFO can filter out the irrelevant controls and focus only on the relevant ones, which reduces the complexity of the task.

Besides configuring the control types available for selection in `CONTROL_LIST` in `config_dev.yaml`, UFO also supports filtering the controls based on semantic similarity or keyword matching between the agent's plan and the control's information. We currently support the following filtering methods:

| Filtering Method | Description |
|------------------|-------------|
| [`Text`](./text_filtering.md) | Filter the controls based on the control text. |
| [`Semantic`](./semantic_filtering.md) | Filter the controls based on the semantic similarity between the agent's plan and the control text. |
| [`Icon`](./icon_filtering.md) | Filter the controls based on the control icon image. |

## Configuration
You can activate control filtering by setting `CONTROL_FILTER` in the `config_dev.yaml` file. `CONTROL_FILTER` is a list of the filtering methods to apply to the controls; each entry can be `TEXT`, `SEMANTIC`, or `ICON`.

You can configure multiple filtering methods in the `CONTROL_FILTER` list.
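
The sketch below illustrates one way several entries in `CONTROL_FILTER` could be combined, with each filter narrowing the candidate set left by the previous one. The classes only mirror the pattern of a base class whose subclasses implement a `control_filter` method; the names `SketchFilter`, `KeywordFilter`, `TopNFilter`, and the `control_filter(controls, plan)` signature are hypothetical and are not UFO's `BasicControlFilter` API. Sequential narrowing is itself an assumption: this page does not state how UFO combines the results of multiple filters.

```python
from abc import ABC, abstractmethod


class SketchFilter(ABC):
    """Hypothetical base class: each subclass implements control_filter."""

    @abstractmethod
    def control_filter(self, controls: dict[str, str], plan: str) -> dict[str, str]:
        """Return the subset of controls (label -> control text) to keep."""


class KeywordFilter(SketchFilter):
    """Keep controls whose text shares at least one word with the plan."""

    def control_filter(self, controls, plan):
        plan_words = set(plan.lower().split())
        return {label: text for label, text in controls.items()
                if plan_words & set(text.lower().split())}


class TopNFilter(SketchFilter):
    """Keep at most n controls, preserving their original order."""

    def __init__(self, n: int):
        self.n = n

    def control_filter(self, controls, plan):
        return dict(list(controls.items())[: self.n])


def apply_filters(filters: list[SketchFilter], controls: dict[str, str], plan: str) -> dict[str, str]:
    # Assumption: filters run in list order, each one narrowing the previous result.
    for f in filters:
        controls = f.control_filter(controls, plan)
    return controls


controls = {"1": "Save file", "2": "Print", "3": "Save as", "4": "Bold"}
print(apply_filters([KeywordFilter(), TopNFilter(1)], controls, "click save"))  # {'1': 'Save file'}
```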

# Reference
The implementation of control filtering is based on the `BasicControlFilter` class located in the `ufo/automator/ui_control/control_filter.py` file. Concrete filtering classes inherit from the `BasicControlFilter` class and implement the `control_filter` method to filter the controls according to their specific filtering method.

:::automator.ui_control.control_filter.BasicControlFilter
15 changes: 15 additions & 0 deletions
documents/docs/advanced_usage/control_filtering/semantic_filtering.md
# Semantic Control Filter

The semantic control filter is a method to filter the controls based on the semantic similarity between the agent's plan and the control's text using their embeddings.

## Configuration

To activate the semantic control filtering, you need to add `SEMANTIC` to the `CONTROL_FILTER` list in the `config_dev.yaml` file. Below is the detailed semantic control filter configuration in the `config_dev.yaml` file:

- `CONTROL_FILTER`: A list of filtering methods that you want to apply to the controls. To activate the semantic control filtering, add `SEMANTIC` to the list.
- `CONTROL_FILTER_TOP_K_SEMANTIC`: The number of controls to keep after filtering.
- `CONTROL_FILTER_MODEL_SEMANTIC_NAME`: The control filter model name for semantic similarity. By default, it is set to "all-MiniLM-L6-v2".
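
To make the ranking concrete without downloading a model, the self-contained sketch below substitutes a toy bag-of-words vector for the sentence embedding that a model such as `all-MiniLM-L6-v2` would produce. The `embed` and `semantic_top_k` helpers are hypothetical, and the toy vector only captures word overlap, not true semantic similarity; it is not UFO's `SemanticControlFilter`.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence embedding: lower-cased word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)  # Counter yields 0 for words missing from b
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def semantic_top_k(control_texts: list[str], plan: str, top_k: int) -> list[str]:
    """Rank control texts by similarity to the plan and keep the top_k."""
    plan_vec = embed(plan)
    ranked = sorted(control_texts, key=lambda t: cosine(embed(t), plan_vec), reverse=True)
    return ranked[:top_k]


controls = ["Insert table", "Insert picture", "Page layout", "Font color"]
print(semantic_top_k(controls, "insert a table into the document", top_k=2))
# ['Insert table', 'Insert picture']
```

Swapping `embed` for a real sentence-embedding model keeps the same top-k logic.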

# Reference

:::automator.ui_control.control_filter.SemanticControlFilter
16 changes: 16 additions & 0 deletions
documents/docs/advanced_usage/control_filtering/text_filtering.md
# Text Control Filter

The text control filter is a method to filter the controls based on the control text. The agent's plan for the current step usually contains some keywords or phrases. This method filters the controls based on matching between the control text and the keywords or phrases in the agent's plan.

## Configuration

To activate the text control filtering, you need to add `TEXT` to the `CONTROL_FILTER` list in the `config_dev.yaml` file. Below is the detailed text control filter configuration in the `config_dev.yaml` file:

- `CONTROL_FILTER`: A list of filtering methods that you want to apply to the controls. To activate the text control filtering, add `TEXT` to the list.
- `CONTROL_FILTER_TOP_K_PLAN`: The number of keywords or phrases from the agent's plan to use for filtering the controls.
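
A minimal sketch of keyword matching, assuming the plan's keywords or phrases are already available as a list ranked by relevance and that `CONTROL_FILTER_TOP_K_PLAN` limits how many of them are used. Case-insensitive substring matching and the `text_filter` helper are assumptions made for illustration; the exact matching rule in `TextControlFilter` may differ.

```python
def text_filter(control_texts: list[str], plan_keywords: list[str], top_k_plan: int) -> list[str]:
    """Keep controls whose text contains any of the first top_k_plan keywords (case-insensitive)."""
    keywords = [k.lower() for k in plan_keywords[:top_k_plan]]
    return [text for text in control_texts
            if any(k in text.lower() for k in keywords)]


controls = ["Home", "Insert", "Heading 1", "Heading 2", "Styles"]
plan_keywords = ["Heading 1", "Styles", "Home"]  # hypothetical, ranked by relevance
print(text_filter(controls, plan_keywords, top_k_plan=2))  # ['Heading 1', 'Styles']
```

Raising `top_k_plan` to 3 would also keep `Home`, which shows how the setting trades recall for a smaller control list.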

# Reference

:::automator.ui_control.control_filter.TextControlFilter
# Customization

Sometimes UFO needs additional context or information to complete a task. This information is important and specific to each user. UFO can ask the user for the additional information and save it in local memory for future reference. This customization feature allows UFO to provide a more personalized experience to the user.

## Scenario

Consider a scenario where UFO needs additional information to complete a task: UFO is tasked with booking a cab for the user. To book a cab, UFO needs to know the user's exact address. UFO asks the user for the address and saves it in local memory for future reference. The next time UFO is asked to complete a task that requires the user's address, it uses the saved address without asking the user again.

## Implementation
We currently implement the customization feature in the `HostAgent` class. When the `HostAgent` needs additional information, it transitions to the `PENDING` state and asks the user for the information. The user provides the information, and the `HostAgent` saves it in the local memory base for future reference. The saved information is stored in the `blackboard` and can be accessed by all agents in the session.

!!! note
    The customization memory base is saved only in a **local file**. This information is **not** uploaded to the cloud or any other storage, to protect the user's privacy.

## Configuration

You can configure the customization feature by setting the following fields in the `config_dev.yaml` file.

| Configuration Option | Description | Type | Default Value |
|------------------------|----------------------------------------------|---------|---------------------------------------|
| `USE_CUSTOMIZATION` | Whether to enable the customization. | Boolean | True |
| `QA_PAIR_FILE` | The path for the historical QA pairs. | String | "customization/historical_qa.txt" |
| `QA_PAIR_NUM` | The number of QA pairs for the customization. | Integer | 20 |
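
The sketch below shows one way a local QA memory could behave: each answer is appended to a local file and only the most recent `QA_PAIR_NUM` pairs are loaded back. The JSON-lines format and the `save_qa_pair`/`load_recent_qa_pairs` helpers are assumptions for illustration; the actual format of `customization/historical_qa.txt` is not described on this page.

```python
import json
import os
import tempfile


def save_qa_pair(path: str, question: str, answer: str) -> None:
    """Append one question/answer pair to a local file, one JSON object per line."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"question": question, "answer": answer}) + "\n")


def load_recent_qa_pairs(path: str, qa_pair_num: int) -> list[dict]:
    """Return the most recent qa_pair_num pairs, or [] if nothing has been saved yet."""
    if not os.path.exists(path) or qa_pair_num <= 0:
        return []
    with open(path, encoding="utf-8") as f:
        pairs = [json.loads(line) for line in f if line.strip()]
    return pairs[-qa_pair_num:]


# Usage: store the user's answer once, then reload it for a later task.
with tempfile.TemporaryDirectory() as tmp:
    qa_file = os.path.join(tmp, "customization", "historical_qa.txt")
    save_qa_pair(qa_file, "What is your home address?", "1 Example Street")
    print(load_recent_qa_pairs(qa_file, qa_pair_num=20))
```

Everything stays on the local disk, in line with the privacy note above.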
# Follower Mode

Follower mode is a UFO feature in which the agent follows a list of predefined steps, written in natural language, to take actions on applications. Unlike the normal mode, this mode creates a `FollowerAgent` that follows the plan list provided by the user to interact with the application, instead of generating the plan itself. This mode is useful for debugging, software testing, and verification.

## Quick Start

### Step 1: Create a Plan File

Before starting Follower mode, you need to create a plan file that contains the list of steps for the agent to follow. The plan file is a JSON file that contains the following fields:

| Field | Description | Type |
| --- | --- | --- |
| task | The task description. | String |
| steps | The list of steps for the agent to follow. | List of Strings |
| object | The application or file to interact with. | String |

Below is an example of a plan file:

```json
{
    "task": "Type in a text of 'Test For Fun' with heading 1 level",
    "steps":
    [
        "1.type in 'Test For Fun'",
        "2.Select the 'Test For Fun' text",
        "3.Click 'Home' tab to show the 'Styles' ribbon tab",
        "4.Click 'Styles' ribbon tab to show the style 'Heading 1'",
        "5.Click 'Heading 1' style to apply the style to the selected text"
    ],
    "object": "draft.docx"
}
```

!!! note
    The `object` field is the application or file that the agent will interact with. The object **must be active** (it can be minimized) when you start Follower mode.
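
Before a long run, it can help to check that a plan file has the three fields listed in the table with the expected types. The `load_plan` helper below is a hypothetical sketch that checks only those documented fields; it is not UFO's `PlanReader`, and it takes the file's text rather than a path so it can be exercised without files.

```python
import json

# The fields and types documented in the plan-file table.
REQUIRED_FIELDS = {"task": str, "steps": list, "object": str}


def load_plan(text: str) -> dict:
    """Parse a plan file's JSON text and check the documented fields and types."""
    plan = json.loads(text)
    for field, expected in REQUIRED_FIELDS.items():
        if field not in plan:
            raise ValueError(f"plan is missing the '{field}' field")
        if not isinstance(plan[field], expected):
            raise ValueError(f"'{field}' should be of type {expected.__name__}")
    if not all(isinstance(step, str) for step in plan["steps"]):
        raise ValueError("every entry in 'steps' should be a string")
    return plan


example = '''{
    "task": "Type in a text of 'Test For Fun' with heading 1 level",
    "steps": ["1.type in 'Test For Fun'", "2.Select the 'Test For Fun' text"],
    "object": "draft.docx"
}'''
plan = load_plan(example)
print(plan["object"], len(plan["steps"]))  # draft.docx 2
```

For a file on disk, pass the result of reading it, e.g. `load_plan(open(plan_file, encoding="utf-8").read())`.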

### Step 2: Start the Follower Mode
To start Follower mode, run the following command:

```bash
# assume you are in the cloned UFO folder
python ufo.py --task_name {task_name} --mode follower --plan {plan_file}
```

!!! tip
    Replace `{task_name}` with the name of the task and `{plan_file}` with the path to the plan file.

### Step 3: Run in Batch (Optional)

You can also run Follower mode in batch by providing a folder that contains multiple plan files. The agent follows the plans in the folder one by one. To run in batch mode, run the following command:

```bash
# assume you are in the cloned UFO folder
python ufo.py --task_name {task_name} --mode follower --plan {plan_folder}
```

UFO automatically detects the plan files in the folder and runs them one by one.

!!! tip
    Replace `{task_name}` with the name of the task and `{plan_folder}` with the path to the folder containing the plan files.
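
As an illustration of the batch behavior, the sketch below collects the plan files in a folder in a deterministic (sorted) order. Treating only `*.json` files directly inside the folder as plans, and sorting by name, are assumptions; this page does not specify how UFO detects or orders plan files, and `find_plan_files` is a hypothetical helper.

```python
import pathlib
import tempfile


def find_plan_files(plan_folder: str) -> list[pathlib.Path]:
    """Return the JSON plan files directly inside plan_folder, sorted by name."""
    folder = pathlib.Path(plan_folder)
    if not folder.is_dir():
        raise NotADirectoryError(plan_folder)
    return sorted(p for p in folder.glob("*.json") if p.is_file())


# Usage with a throwaway folder holding two plans and one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["b.json", "a.json", "notes.txt"]:
        (pathlib.Path(tmp) / name).write_text("{}", encoding="utf-8")
    print([p.name for p in find_plan_files(tmp)])  # ['a.json', 'b.json']
```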

## Evaluation
You may want to evaluate whether the `task` was completed successfully by following the plan. UFO calls the `EvaluationAgent` to evaluate the task if `EVA_SESSION` is set to `True` in the `config_dev.yaml` file.

You can check the evaluation log in the `logs/{task_name}/evaluation.log` file.

# References
Follower mode employs a `PlanReader` to parse the plan file and creates a `FollowerSession` to follow the plan.

## PlanReader
The `PlanReader` is located in the `ufo/module/sessions/plan_reader.py` file.

:::module.sessions.plan_reader.PlanReader

<br>

## FollowerSession

The `FollowerSession` is located in the `ufo/module/sessions/session.py` file.

:::module.sessions.session.FollowerSession
65 changes: 65 additions & 0 deletions
documents/docs/advanced_usage/reinforce_appagent/experience_learning.md
# Learning from Self-Experience

When UFO successfully completes a task, the user can choose to save the successful experience to reinforce the AppAgent. The AppAgent can learn from its own successful experiences to improve its performance in the future.

## Mechanism

### Step 1: Complete a Session
- **Event**: UFO completes a session

### Step 2: Ask User to Save Experience
- **Action**: The agent prompts the user with a choice to save the successful experience

<h1 align="center">
    <img src="../../../img/save_ask.png" alt="Save Experience" width="100%">
</h1>

### Step 3: User Chooses to Save
- **Action**: The user chooses to save the experience

### Step 4: Summarize and Save the Experience
- **Tool**: `ExperienceSummarizer`
- **Process**:
    1. Summarize the experience into a demonstration example
    2. Save the demonstration example in the `EXPERIENCE_SAVED_PATH` as specified in the `config_dev.yaml` file
    3. The demonstration example includes [fields](../../prompts/examples_prompts.md) similar to those used in the AppAgent's prompt
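
A minimal sketch of the save step, assuming each demonstration example is stored as its own JSON file under the folder configured by `EXPERIENCE_SAVED_PATH`. The `save_experience` helper, the field names (`request`, `plan`, `actions`), and the action strings are hypothetical; the real `ExperienceSummarizer` produces its own summary and storage format.

```python
import json
import pathlib
import tempfile
import uuid


def save_experience(saved_path: str, request: str, plan: list[str], actions: list[str]) -> pathlib.Path:
    """Write one demonstration example as a JSON file under saved_path and return its path."""
    folder = pathlib.Path(saved_path)
    folder.mkdir(parents=True, exist_ok=True)
    example = {"request": request, "plan": plan, "actions": actions}
    target = folder / f"{uuid.uuid4().hex}.json"  # unique file name per saved experience
    target.write_text(json.dumps(example, indent=2), encoding="utf-8")
    return target


with tempfile.TemporaryDirectory() as tmp:
    path = save_experience(tmp, "Make the title bold",
                           ["Select the title", "Click 'Bold'"],
                           ["select(title)", "click(Bold)"])  # hypothetical action strings
    print(json.loads(path.read_text(encoding="utf-8"))["request"])  # Make the title bold
```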

### Step 5: Retrieve and Utilize Saved Experience
- **When**: The AppAgent encounters a similar task in the future
- **Action**: Retrieve the saved experience from the experience database
- **Outcome**: Use the retrieved experience to generate a plan
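
The retrieval step can be pictured as ranking the stored examples by their similarity to the new request and returning the top `RAG_EXPERIENCE_RETRIEVED_TOPK`. The sketch below uses word-set Jaccard similarity purely for illustration; the actual `ExperienceRetriever` uses its own retrieval backend, and `retrieve_experience` and the saved records are hypothetical.

```python
def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two requests, in [0, 1]."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def retrieve_experience(saved: list[dict], request: str, top_k: int) -> list[dict]:
    """Return up to top_k saved examples most similar to the new request."""
    ranked = sorted(saved, key=lambda ex: jaccard(ex["request"], request), reverse=True)
    return ranked[:top_k]


saved = [
    {"request": "make the title bold", "plan": ["Select the title", "Click 'Bold'"]},
    {"request": "insert a table", "plan": ["Click 'Insert'", "Click 'Table'"]},
    {"request": "make the heading italic", "plan": ["Select the heading", "Click 'Italic'"]},
]
for ex in retrieve_experience(saved, "make the subtitle bold", top_k=2):
    print(ex["request"])
# make the title bold
# make the heading italic
```

The retrieved plans would then be placed in the AppAgent's prompt as demonstrations for the new request.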

### Workflow Diagram
```mermaid
graph TD;
    A[Complete Session] --> B[Ask User to Save Experience]
    B --> C[User Chooses to Save]
    C --> D[Summarize with ExperienceSummarizer]
    D --> E[Save in EXPERIENCE_SAVED_PATH]
    F[AppAgent Encounters Similar Task] --> G[Retrieve Saved Experience]
    G --> H[Generate Plan]
```

## Activate the Learning from Self-Experience

### Step 1: Configure the AppAgent
Configure the following parameters to allow UFO to use RAG over its self-experience:

| Configuration Option | Description | Type | Default Value |
|----------------------|-------------|------|---------------|
| `RAG_EXPERIENCE` | Whether to use RAG over its self-experience | Boolean | False |
| `RAG_EXPERIENCE_RETRIEVED_TOPK` | The top-k for the offline retrieved documents | Integer | 5 |

# Reference

## Experience Summarizer
The `ExperienceSummarizer` class is located in the `ufo/experience/experience_summarizer.py` file and provides the following methods to summarize the experience:

:::experience.summarizer.ExperienceSummarizer

<br>

## Experience Retriever
The `ExperienceRetriever` class is located in the `ufo/rag/retriever.py` file and provides the following methods to retrieve the experience:

:::rag.retriever.ExperienceRetriever
29 changes: 29 additions & 0 deletions
documents/docs/advanced_usage/reinforce_appagent/learning_from_bing_search.md
# Learning from Bing Search

UFO can reinforce the AppAgent by searching Bing for up-to-date knowledge about niche tasks or applications that are beyond the `AppAgent`'s knowledge.

## Mechanism
Upon receiving a request, the `AppAgent` constructs a Bing search query based on the request and retrieves the search results from Bing. The `AppAgent` then extracts the relevant information from the top-k search results and generates a plan based on the retrieved information.

## Activate the Learning from Bing Search

### Step 1: Obtain Bing API Key
To use Bing search, you need to obtain a Bing API key. You can follow the instructions on the [Microsoft Azure Bing Search API](https://www.microsoft.com/en-us/bing/apis/bing-web-search-api) page to get the API key.

### Step 2: Configure the AppAgent

Configure the following parameters to allow UFO to use online Bing search in the decision-making process:

| Configuration Option | Description | Type | Default Value |
|----------------------|-------------|------|---------------|
| `RAG_ONLINE_SEARCH` | Whether to use Bing search | Boolean | False |
| `BING_API_KEY` | The Bing search API key | String | "" |
| `RAG_ONLINE_SEARCH_TOPK` | The top-k for the online search | Integer | 5 |
| `RAG_ONLINE_RETRIEVED_TOPK` | The top-k for the retrieved online search results | Integer | 1 |
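
The sketch below illustrates how the two top-k settings could interact: it builds a Bing Web Search v7 request asking for `RAG_ONLINE_SEARCH_TOPK` results, then keeps the `RAG_ONLINE_RETRIEVED_TOPK` most relevant snippets from a response. It only builds the request and never sends it, so it runs without an API key. The endpoint, the `Ocp-Apim-Subscription-Key` header, and the `webPages.value` response layout follow the public Bing Web Search v7 API; the word-overlap ranking and the helper names are assumptions, not the behavior of UFO's `OnlineDocRetriever`.

```python
import urllib.parse
import urllib.request

BING_ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"


def build_bing_request(query: str, api_key: str, search_topk: int) -> urllib.request.Request:
    """Build (but do not send) a Bing Web Search v7 request for search_topk results."""
    params = urllib.parse.urlencode({"q": query, "count": search_topk})
    return urllib.request.Request(f"{BING_ENDPOINT}?{params}",
                                  headers={"Ocp-Apim-Subscription-Key": api_key})


def select_snippets(response: dict, request_text: str, retrieved_topk: int) -> list[str]:
    """Keep the retrieved_topk snippets sharing the most words with the request."""
    pages = response.get("webPages", {}).get("value", [])
    words = set(request_text.lower().split())
    ranked = sorted(pages,
                    key=lambda p: len(words & set(p.get("snippet", "").lower().split())),
                    reverse=True)
    return [p.get("snippet", "") for p in ranked[:retrieved_topk]]


req = build_bing_request("how to add a watermark in Word", api_key="<your key>", search_topk=5)
print(req.full_url)

# A hand-written response shaped like the API's JSON, for illustration only.
fake_response = {"webPages": {"value": [
    {"name": "Fonts", "url": "https://example.com/a", "snippet": "Change fonts in Word"},
    {"name": "Watermark", "url": "https://example.com/b",
     "snippet": "Add a watermark in Word from the Design tab"},
]}}
print(select_snippets(fake_response, "how to add a watermark in Word", retrieved_topk=1))
```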

# Reference

:::rag.retriever.OnlineDocRetriever