Merge pull request microsoft#111 from microsoft/vyokky/dev
Vyokky/dev
vyokky authored Jul 7, 2024
2 parents b46cb89 + a73a206 commit dd762d2
Showing 6 changed files with 40 additions and 14 deletions.
3 changes: 3 additions & 0 deletions CONTRIBUTING.md
@@ -9,6 +9,9 @@ When you submit a pull request, a CLA-bot will automatically determine whether y
to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the
instructions provided by the bot. You will only need to do this once across all repositories using our CLA.

+## note
+You should submit your pull request to the `pre-release` branch, not the `main` branch.
+
This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/)
or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
4 changes: 2 additions & 2 deletions README.md
@@ -49,7 +49,7 @@ Both agents leverage the multi-modal capabilities of GPT-Vision to comprehend th
3. **Extended Application Interaction:** UFO now goes beyond UI controls, allowing interaction with your application through keyboard inputs and native APIs! Presently, we support Word ([examples](/ufo/prompts/apps/word/api.yaml)), with more to come soon. Customize and build your own interactions.
4. **Control Filtering:** Streamline LLM's action process by using control filters to remove irrelevant control items. Enable them in [config_dev.yaml](/ufo/config/config_dev.yaml) under the `control filtering` section at the bottom.
- 📅 2024-03-25: **New Release for v0.0.1!** Check out our exciting new features.
-1. We now support creating your help documents for each Windows application to become an app expert. Check the [README](https://microsoft.github.io/UFO/creating_app_agent/help_document_provision/) for more details!
+1. We now support creating your help documents for each Windows application to become an app expert. Check the [documentation](https://microsoft.github.io/UFO/creating_app_agent/help_document_provision/) for more details!
2. UFO now supports RAG from offline documents and online Bing search.
3. You can save the task completion trajectory into its memory for UFO's reference, improving its future success rate!
4. You can customize different GPT models for AppAgent and ActAgent. Text-only models (e.g., GPT-4) are now supported!
@@ -141,7 +141,7 @@ UFO also supports other LLMs and advanced configurations, such as customize your
If you want to enhance UFO's ability with external knowledge, you can optionally configure it with an external database for retrieval augmented generation (RAG) in the `ufo/config/config.yaml` file.

We provide the following options for RAG to enhance UFO's capabilities:
-- [Offline Help Document](https://microsoft.github.io/UFO/advanced_usage/reinforce_appagent/learning_from_help_document/)* Enable UFO to retrieve information from offline help documents.
+- [Offline Help Document](https://microsoft.github.io/UFO/advanced_usage/reinforce_appagent/learning_from_help_document/) Enable UFO to retrieve information from offline help documents.
- [Online Bing Search Engine](https://microsoft.github.io/UFO/advanced_usage/reinforce_appagent/learning_from_bing_search/): Enhance UFO's capabilities by utilizing the most up-to-date online search results.
- [Self-Experience](https://microsoft.github.io/UFO/advanced_usage/reinforce_appagent/experience_learning/): Save task completion trajectories into UFO's memory for future reference.
- [User-Demonstration](https://microsoft.github.io/UFO/advanced_usage/reinforce_appagent/learning_from_demonstration/): Boost UFO's capabilities through user demonstration.
3 changes: 3 additions & 0 deletions documents/docs/about/CONTRIBUTING.md
@@ -9,6 +9,9 @@ When you submit a pull request, a CLA-bot will automatically determine whether y
to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the
instructions provided by the bot. You will only need to do this once across all repositories using our CLA.

+!!! note
+    You should submit your pull request to the `pre-release` branch, not the `main` branch.
+
This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/)
or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
2 changes: 1 addition & 1 deletion documents/docs/index.md
@@ -24,7 +24,7 @@

- <b>AppAgent 👾</b>, responsible for iteratively executing actions on the selected applications until the task is successfully concluded within a specific application.

-- <b>Control Interaction 🎮</b>, is tasked with translating actions from HostAgent and AppAgent into interactions with the application and its UI controls. It's essential that the targeted controls are compatible with the Windows **UI Automation** or **Win32** API.
+- <b>Application Automator 🎮</b>, is tasked with translating actions from HostAgent and AppAgent into interactions with the application and through UI controls, native APIs or AI tools. Check out more details [here](./automator/overview.md).

Both agents leverage the multi-modal capabilities of Visual Language Model (VLM) to comprehend the application UI and fulfill the user's request. For more details, please consult our [technical report](https://arxiv.org/abs/2402.07939).
<h1 align="center">
40 changes: 30 additions & 10 deletions ufo/automator/puppeteer.py
@@ -39,7 +39,7 @@ def create_command(
:param command_name: The command name.
:param params: The arguments for the command.
"""
-receiver = self.receiver_manager.get_receiver(command_name)
+receiver = self.receiver_manager.get_receiver_from_command_name(command_name)
command = receiver.command_registry.get(command_name.lower(), None)

if receiver is None:
@@ -56,7 +56,7 @@ def get_command_types(self, command_name: str) -> str:
:param command_name: The command name.
:return: The command types.
"""
-receiver = self.receiver_manager.get_receiver(command_name)
+receiver = self.receiver_manager.get_receiver_from_command_name(command_name)

return receiver.type_name

@@ -73,7 +73,7 @@ def execute_command(
command = self.create_command(command_name, params, *args, **kwargs)
return command.execute()

-def execute_all_commands(self) -> List:
+def execute_all_commands(self) -> List[Any]:
"""
Execute all the commands in the command queue.
:return: The execution results.
@@ -83,6 +83,8 @@ def execute_all_commands(self) -> List:
command = self.command_queue.popleft()
results.append(command.execute())

+return results
+
def add_command(
self, command_name: str, params: Dict[str, Any], *args, **kwargs
) -> None:
@@ -182,11 +184,11 @@ def create_ui_control_receiver(
:param control: The control element.
:return: The UI controller receiver.
"""
-factory: ReceiverFactory = self._receiver_factory_registry.get("UIControl").get(
+factory: ReceiverFactory = self.receiver_factory_registry.get("UIControl").get(
"factory"
)
self.ui_control_receiver = factory.create_receiver(control, application)
-self._receiver_list.append(self.ui_control_receiver)
+self.receiver_list.append(self.ui_control_receiver)
self._update_receiver_registry()

return self.ui_control_receiver
@@ -197,15 +199,15 @@ def create_api_receiver(self, app_root_name: str, process_name: str) -> None:
:param app_root_name: The app root name.
:param process_name: The process name.
"""
-for receiver_factory_dict in self._receiver_factory_registry.values():
+for receiver_factory_dict in self.receiver_factory_registry.values():

# Check if the receiver is API
if receiver_factory_dict.get("is_api"):
receiver = receiver_factory_dict.get("factory").create_receiver(
app_root_name, process_name
)
if receiver is not None:
-self._receiver_list.append(receiver)
+self.receiver_list.append(receiver)

self._update_receiver_registry()

@@ -214,11 +216,11 @@ def _update_receiver_registry(self) -> None:
Update the receiver registry. A receiver registry is a dictionary that maps the command name to the receiver.
"""

-for receiver in self._receiver_list:
+for receiver in self.receiver_list:
if receiver is not None:
self.receiver_registry.update(receiver.self_command_mapping())

-def get_receiver(self, command_name: str) -> ReceiverBasic:
+def get_receiver_from_command_name(self, command_name: str) -> ReceiverBasic:
"""
Get the receiver from the command name.
:param command_name: The command name.
@@ -229,13 +231,31 @@ def get_receiver(self, command_name: str) -> ReceiverBasic:
raise ValueError(f"Receiver for command {command_name} is not found.")
return receiver

+@property
+def receiver_list(self) -> List[ReceiverBasic]:
+"""
+Get the receiver list.
+:return: The receiver list.
+"""
+return self._receiver_list
+
+@property
+def receiver_factory_registry(
+self,
+) -> Dict[str, Dict[str, Union[str, ReceiverFactory]]]:
+"""
+Get the receiver factory registry.
+:return: The receiver factory registry.
+"""
+return self._receiver_factory_registry
+
@property
def com_receiver(self) -> WinCOMReceiverBasic:
"""
Get the COM receiver.
:return: The COM receiver.
"""
-for receiver in self._receiver_list:
+for receiver in self.receiver_list:
if issubclass(receiver.__class__, WinCOMReceiverBasic):
return receiver
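
Reviewer note: the hunks above only show fragments of the command/receiver flow, so here is a minimal, self-contained sketch of how a registry that maps command names to receivers could work. `Receiver`, `ReceiverManager.add_receiver`, and the lambda-based commands are hypothetical stand-ins, not UFO's real classes. Only `get_receiver_from_command_name`, `receiver_list`, `self_command_mapping`, `command_queue`, and `execute_all_commands` mirror names visible in the diff.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, List


class Receiver:
    """Hypothetical receiver that owns a set of named commands."""

    def __init__(self, type_name: str, commands: Dict[str, Callable[..., Any]]) -> None:
        self.type_name = type_name
        self.command_registry = commands

    def self_command_mapping(self) -> Dict[str, "Receiver"]:
        # Map every command name this receiver supports back to the receiver.
        return {name: self for name in self.command_registry}


class ReceiverManager:
    """Keeps the receivers and a command-name -> receiver registry."""

    def __init__(self) -> None:
        self._receiver_list: List[Receiver] = []
        self.receiver_registry: Dict[str, Receiver] = {}

    @property
    def receiver_list(self) -> List[Receiver]:
        # Public accessor, so callers do not touch the private attribute.
        return self._receiver_list

    def add_receiver(self, receiver: Receiver) -> None:
        # Hypothetical helper: register a receiver and refresh the registry.
        self.receiver_list.append(receiver)
        self.receiver_registry.update(receiver.self_command_mapping())

    def get_receiver_from_command_name(self, command_name: str) -> Receiver:
        receiver = self.receiver_registry.get(command_name)
        if receiver is None:
            raise ValueError(f"Receiver for command {command_name} is not found.")
        return receiver


class Puppeteer:
    """Queues commands and runs them in first-in, first-out order."""

    def __init__(self, manager: ReceiverManager) -> None:
        self.manager = manager
        self.command_queue: Deque[Callable[[], Any]] = deque()

    def add_command(self, command_name: str, params: Dict[str, Any]) -> None:
        receiver = self.manager.get_receiver_from_command_name(command_name)
        func = receiver.command_registry[command_name]
        self.command_queue.append(lambda: func(**params))

    def execute_all_commands(self) -> List[Any]:
        results: List[Any] = []
        while self.command_queue:
            command = self.command_queue.popleft()
            results.append(command())
        # Without this return the caller would always receive None.
        return results


manager = ReceiverManager()
manager.add_receiver(Receiver("UIControl", {"click": lambda x, y: f"click({x}, {y})"}))
manager.add_receiver(Receiver("API", {"insert_text": lambda text: f"insert({text!r})"}))

puppeteer = Puppeteer(manager)
puppeteer.add_command("click", {"x": 1, "y": 2})
puppeteer.add_command("insert_text", {"text": "hi"})
results = puppeteer.execute_all_commands()
print(results)
```

In this sketch the lookup raises `ValueError` for an unknown command, as the diff's `get_receiver_from_command_name` does, so a caller never gets `None` back from it.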

2 changes: 1 addition & 1 deletion ufo/config/config_dev.yaml
@@ -60,7 +60,7 @@ APP_API_PROMPT_ADDRESS: {
"EXCEL.EXE": "ufo/prompts/apps/excel/api.yaml",
"msedge.exe": "ufo/prompts/apps/web/api.yaml",
"chrome.exe": "ufo/prompts/apps/web/api.yaml"
-} # The prompt for the app API
+} # The prompt address for the app API. The key is the app program name, and the value is the prompt address.

WORD_API_PROMPT: "ufo/prompts/apps/word/api.yaml" # The prompt for the word API
EXCEL_API_PROMPT: "ufo/prompts/apps/excel/api.yaml" # The prompt for the excel API
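
Reviewer note: the updated comment says the key is the app program name and the value is the prompt address. A lookup over that mapping could be sketched as below. The dictionary holds only the three entries visible in this hunk, `get_api_prompt_address` is a hypothetical helper, and the case-insensitive matching is an assumption; the diff does not show how UFO resolves the key.

```python
from typing import Dict, Optional

# Only the entries visible in the hunk above; the real config may hold more.
APP_API_PROMPT_ADDRESS: Dict[str, str] = {
    "EXCEL.EXE": "ufo/prompts/apps/excel/api.yaml",
    "msedge.exe": "ufo/prompts/apps/web/api.yaml",
    "chrome.exe": "ufo/prompts/apps/web/api.yaml",
}


def get_api_prompt_address(
    program_name: str, mapping: Dict[str, str] = APP_API_PROMPT_ADDRESS
) -> Optional[str]:
    """Return the prompt address for a program name, or None if unmapped."""
    # Assumption: program names are matched case-insensitively, because the
    # keys in the config mix upper case ("EXCEL.EXE") and lower case.
    lowered = {key.lower(): value for key, value in mapping.items()}
    return lowered.get(program_name.lower())


print(get_api_prompt_address("excel.exe"))
print(get_api_prompt_address("notepad.exe"))
```

Returning `None` for an unmapped program lets the caller fall back to plain UI-control interaction rather than failing.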