Merge pull request microsoft#108 from microsoft/pre-release
Pre release
vyokky authored Jul 5, 2024
2 parents 278d973 + 71509ec commit 91ae4a8
Showing 30 changed files with 565 additions and 364 deletions.
@@ -8,49 +8,23 @@ For complex tasks, users can demonstrate the task using [Step Recorder](https://

## Mechanism

### Step 1: Record the Task
- **Tool**: [Step Recorder](https://support.microsoft.com/en-us/windows/record-steps-to-reproduce-a-problem-46582a9b-620f-2e36-00c9-04e25d784e47)
- **Output**: Zip file containing the task description and action trajectories

### Step 2: Save the Demonstration
- **Action**: Save the recorded demonstration as a zip file

### Step 3: Extract and Summarize the Demonstration
- **Tool**: `DemonstrationSummarizer`
- **Process**:
    1. Extract the zip file
    2. Summarize the demonstration
- **Configuration**: Save the summarized demonstration in the `DEMONSTRATION_SAVED_PATH` as specified in the `config_dev.yaml` file

### Step 4: Retrieve and Utilize the Demonstration
- **When**: AppAgent encounters a similar task
- **Action**: Retrieve the saved demonstration from the demonstration database
- **Tool**: `DemonstrationRetriever`
- **Outcome**: Generate a plan based on the retrieved demonstration

### Demonstration Workflow Diagram
```mermaid
graph TD;
A[User Records Task] --> B[Save as Zip File]
B --> C[Extract Zip File]
C --> D[Summarize with DemonstrationSummarizer]
D --> E[Save in DEMONSTRATION_SAVED_PATH]
F[AppAgent Encounters Similar Task] --> G[Retrieve Demonstration from Database]
G --> H[Generate Plan]
```
UFO uses the [Step Recorder](https://support.microsoft.com/en-us/windows/record-steps-to-reproduce-a-problem-46582a9b-620f-2e36-00c9-04e25d784e47) tool to record the task and action trajectories. The recorded demonstration is saved as a zip file. The `DemonstrationSummarizer` class extracts and summarizes the demonstration, and the summary is saved in the `DEMONSTRATION_SAVED_PATH` specified in the `config_dev.yaml` file. When the AppAgent encounters a similar task, the `DemonstrationRetriever` class retrieves the saved demonstration from the demonstration database and generates a plan based on it.
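
The retrieval step described above can be illustrated with a minimal sketch. It is not the `DemonstrationRetriever` implementation, which this section does not show: `SummarizedDemo`, `retrieve`, and the token-overlap score are hypothetical names and a toy similarity measure chosen only to make the idea concrete.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SummarizedDemo:
    """Hypothetical record for one summarized demonstration."""

    request: str      # task description taken from the recording
    plan: list[str]   # summarized steps of the demonstration


def _tokens(text: str) -> set[str]:
    """Lower-case the text and split it on whitespace."""
    return set(text.lower().split())


def retrieve(query: str, demos: list[SummarizedDemo], top_k: int = 1) -> list[SummarizedDemo]:
    """Rank saved demonstrations by Jaccard token overlap with the query.

    Assumption: token overlap is a stand-in for whatever similarity the
    real retriever uses.
    """

    def score(demo: SummarizedDemo) -> float:
        a, b = _tokens(query), _tokens(demo.request)
        return len(a & b) / len(a | b) if a | b else 0.0

    return sorted(demos, key=score, reverse=True)[:top_k]


# Two saved demonstrations standing in for the demonstration database.
demos = [
    SummarizedDemo("send an email in Outlook", ["Click New Email", "Type the message", "Click Send"]),
    SummarizedDemo("insert a table in Word", ["Open the Insert tab", "Click Table", "Choose the size"]),
]

# A new, similar request retrieves the closest saved demonstration.
best = retrieve("insert a 3x3 table in Word", demos)[0]
```

In this sketch, the retrieved `plan` is what would be handed to the agent as a starting point for the new task.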

!!! info
    To learn how to record the task and action trajectories with the Step Recorder tool, see the [User Demonstration Provision](../../creating_app_agent/demonstration_provision.md) document.


You can find a demo video of learning from user demonstrations:

<iframe width="560" height="315" src="https://github.com/yunhao0204/UFO/assets/59384816/0146f83e-1b5e-4933-8985-fe3f24ec4777" frameborder="0" allowfullscreen></iframe>

<br>

<br>

## Activating Learning from User Demonstrations

### Step 1: User Demonstration
Please follow the steps in the [User Demonstration Provision](../../creating_app_agent/demonstration_provision.md) document to provide user demonstrations.

### Step 2: Configure the AppAgent
Configure the following parameters to allow UFO to use RAG from user demonstrations:
2 changes: 1 addition & 1 deletion documents/docs/automator/overview.md
@@ -26,7 +26,7 @@ The advantage of using the command design pattern in the agent framework is that

## Receiver

The `Receiver` is a central component in the Automator application that performs actions on the application. It provides functionality to interact with the application and execute actions. All available actions are registered in the `Receiver` with the `ReceiverManager` class.

You can find the reference for a basic `Receiver` class below:

59 changes: 58 additions & 1 deletion documents/docs/automator/web_automator.md
@@ -1 +1,58 @@
# Web Automator

UFO also supports the `Web Automator` for retrieving the content of a web page. The `Web Automator` is implemented in the `ufo/automator/app_apis/web` module.

## Configuration

Several configurations need to be set in the `config_dev.yaml` file before using the API Automator. The configurations related to the API Automator are listed below:

| Configuration Option | Description | Type | Default Value |
|-------------------------|---------------------------------------------------------------------------------------------------------|----------|---------------|
| `USE_APIS` | Whether to allow the use of application APIs. | Boolean | True |
| `APP_API_PROMPT_ADDRESS` | The prompt address for the application API. | Dict | {"WINWORD.EXE": "ufo/prompts/apps/word/api.yaml", "EXCEL.EXE": "ufo/prompts/apps/excel/api.yaml", "msedge.exe": "ufo/prompts/apps/web/api.yaml", "chrome.exe": "ufo/prompts/apps/web/api.yaml"} |

!!! note
    Only `msedge.exe` and `chrome.exe` are currently supported by the Web Automator.
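
As a rough illustration of how a dictionary-valued option such as `APP_API_PROMPT_ADDRESS` can be consumed, the sketch below looks up the prompt file for a process name. The helper `api_prompt_for` is hypothetical and is not part of UFO; the exact-match lookup is an assumption, and only the default dictionary value is taken from the table above.

```python
from typing import Dict, Optional

# Default value copied from the configuration table above.
APP_API_PROMPT_ADDRESS: Dict[str, str] = {
    "WINWORD.EXE": "ufo/prompts/apps/word/api.yaml",
    "EXCEL.EXE": "ufo/prompts/apps/excel/api.yaml",
    "msedge.exe": "ufo/prompts/apps/web/api.yaml",
    "chrome.exe": "ufo/prompts/apps/web/api.yaml",
}


def api_prompt_for(
    process_name: str,
    mapping: Dict[str, str] = APP_API_PROMPT_ADDRESS,
) -> Optional[str]:
    """Hypothetical helper: return the prompt file for a process name.

    Assumption: keys are matched exactly as written in the configuration,
    and a process without an entry has no application API prompt.
    """
    return mapping.get(process_name)


edge_prompt = api_prompt_for("msedge.exe")
unknown_prompt = api_prompt_for("notepad.exe")
```

Both browsers map to the same web prompt file, so a single set of web API descriptions serves either process.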

## Receiver
The Web Automator receiver is the `WebReceiver` class defined in the `ufo/automator/app_apis/web/webclient.py` module:

::: automator.app_apis.web.webclient.WebReceiver

<br>

## Command

The Web Automator currently supports a single command, which retrieves the content of a web page in Markdown format. More commands will be added in the future.

```python
@WebReceiver.register
class WebCrawlerCommand(WebCommand):
    """
    The command to run the crawler with various options.
    """

    def execute(self):
        """
        Execute the command to run the crawler.
        :return: The result content.
        """
        return self.receiver.web_crawler(
            url=self.params.get("url"),
            ignore_link=self.params.get("ignore_link", False),
        )

    @classmethod
    def name(cls) -> str:
        """
        The name of the command.
        """
        return "web_crawler"
```


Below is the list of available commands in the Web Automator that are currently supported by UFO:

| Command Name | Function Name | Description |
|--------------|---------------|-------------|
| `WebCrawlerCommand` | `web_crawler` | Get the content of a web page into a markdown format. |
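
To make the registration and dispatch pattern concrete, here is a self-contained toy version of it. Everything prefixed with `Toy` is hypothetical and only mirrors the shape of the command shown above (`register`, `name`, `execute`, `params`, `receiver`); the real `WebReceiver` and `WebCommand` classes may differ, and the stub `web_crawler` performs no network access.

```python
from typing import Any, Dict


class ToyCommand:
    """Hypothetical base command holding a receiver and its parameters."""

    def __init__(self, receiver: "ToyWebReceiver", params: Dict[str, Any]) -> None:
        self.receiver = receiver
        self.params = params


class ToyWebReceiver:
    """Hypothetical stand-in for a web receiver with a command registry."""

    _commands: Dict[str, type] = {}

    @classmethod
    def register(cls, command_cls: type) -> type:
        """Class decorator that records a command class under its name."""
        cls._commands[command_cls.name()] = command_cls
        return command_cls

    def web_crawler(self, url: str, ignore_link: bool = False) -> str:
        # Placeholder result: a real receiver would fetch and convert the page.
        return f"# content of {url} (ignore_link={ignore_link})"

    def run(self, command_name: str, params: Dict[str, Any]) -> str:
        """Look up a registered command by name, build it, and execute it."""
        command = self._commands[command_name](self, params)
        return command.execute()


@ToyWebReceiver.register
class ToyWebCrawlerCommand(ToyCommand):
    """Toy counterpart of the crawler command shown above."""

    def execute(self) -> str:
        return self.receiver.web_crawler(
            url=self.params.get("url"),
            ignore_link=self.params.get("ignore_link", False),
        )

    @classmethod
    def name(cls) -> str:
        return "web_crawler"


# Dispatch by function name, as in the "Function Name" column of the table.
result = ToyWebReceiver().run("web_crawler", {"url": "https://example.com"})
```

Registering commands by name keeps the receiver unaware of individual command classes, so adding a new command in this sketch only requires a new decorated class.
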
8 changes: 5 additions & 3 deletions documents/docs/automator/wincom_automator.md
@@ -9,9 +9,11 @@ There are several configurations that need to be set up before using the API Aut
| Configuration Option | Description | Type | Default Value |
|-------------------------|---------------------------------------------------------------------------------------------------------|----------|---------------|
| `USE_APIS` | Whether to allow the use of application APIs. | Boolean | True |
| `API_PROMPT` | The prompt for the UI automation API. | String | "ufo/prompts/share/base/api.yaml" |
| `WORD_API_PROMPT` | The prompt for the Word APIs. | String | "ufo/prompts/apps/word/api.yaml" |
| `EXCEL_API_PROMPT` | The prompt for the Excel APIs. | String | "ufo/prompts/apps/excel/api.yaml" |
| `APP_API_PROMPT_ADDRESS` | The prompt address for the application API. | Dict | {"WINWORD.EXE": "ufo/prompts/apps/word/api.yaml", "EXCEL.EXE": "ufo/prompts/apps/excel/api.yaml", "msedge.exe": "ufo/prompts/apps/web/api.yaml", "chrome.exe": "ufo/prompts/apps/web/api.yaml"} |

!!! note
    Only `WINWORD.EXE` and `EXCEL.EXE` are currently supported by the API Automator.


## Receiver
The base class for the receiver of the API Automator is the `WinCOMReceiverBasic` class defined in the `ufo/automator/app_apis/basic` module. It is initialized with the application's Win32 COM object and provides functionality to interact with the application's native API. Below is the reference for the `WinCOMReceiverBasic` class:
3 changes: 1 addition & 2 deletions documents/docs/configurations/developer_configuration.md
@@ -70,8 +70,7 @@ These prompt configuration parameters are used for the application and control A
| Configuration Option | Description | Type | Default Value |
|------------------------|-------------------------------------|--------|--------------------------------------------|
| `API_PROMPT` | The prompt for the UI automation API. | String | "ufo/prompts/share/base/api.yaml" |
| `WORD_API_PROMPT` | The prompt for the Word APIs. | String | "ufo/prompts/apps/word/api.yaml" |
| `EXCEL_API_PROMPT` | The prompt for the Excel APIs. | String | "ufo/prompts/apps/excel/api.yaml" |
| `APP_API_PROMPT_ADDRESS` | The prompt address for the application API. | Dict | {"WINWORD.EXE": "ufo/prompts/apps/word/api.yaml", "EXCEL.EXE": "ufo/prompts/apps/excel/api.yaml", "msedge.exe": "ufo/prompts/apps/web/api.yaml", "chrome.exe": "ufo/prompts/apps/web/api.yaml"} |

## pywinauto Configuration
