diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index c282e9a1..508be8a6 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -9,6 +9,9 @@ When you submit a pull request, a CLA-bot will automatically determine whether y to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repositories using our CLA. +## note +You should submit your pull request to the `pre-release` branch, not the `main` branch. + This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments. \ No newline at end of file diff --git a/README.md b/README.md index 7aaeb4fb..b45863c7 100644 --- a/README.md +++ b/README.md @@ -49,7 +49,7 @@ Both agents leverage the multi-modal capabilities of GPT-Vision to comprehend th 3. **Extended Application Interaction:** UFO now goes beyond UI controls, allowing interaction with your application through keyboard inputs and native APIs! Presently, we support Word ([examples](/ufo/prompts/apps/word/api.yaml)), with more to come soon. Customize and build your own interactions. 4. **Control Filtering:** Streamline LLM's action process by using control filters to remove irrelevant control items. Enable them in [config_dev.yaml](/ufo/config/config_dev.yaml) under the `control filtering` section at the bottom. - 📅 2024-03-25: **New Release for v0.0.1!** Check out our exciting new features. - 1. We now support creating your help documents for each Windows application to become an app expert. Check the [README](https://microsoft.github.io/UFO/creating_app_agent/help_document_provision/) for more details! + 1. 
We now support creating your help documents for each Windows application to become an app expert. Check the [documentation](https://microsoft.github.io/UFO/creating_app_agent/help_document_provision/) for more details! 2. UFO now supports RAG from offline documents and online Bing search. 3. You can save the task completion trajectory into its memory for UFO's reference, improving its future success rate! 4. You can customize different GPT models for AppAgent and ActAgent. Text-only models (e.g., GPT-4) are now supported! @@ -141,7 +141,7 @@ UFO also supports other LLMs and advanced configurations, such as customize your If you want to enhance UFO's ability with external knowledge, you can optionally configure it with an external database for retrieval augmented generation (RAG) in the `ufo/config/config.yaml` file. We provide the following options for RAG to enhance UFO's capabilities: -- [Offline Help Document](https://microsoft.github.io/UFO/advanced_usage/reinforce_appagent/learning_from_help_document/)* Enable UFO to retrieve information from offline help documents. +- [Offline Help Document](https://microsoft.github.io/UFO/advanced_usage/reinforce_appagent/learning_from_help_document/) Enable UFO to retrieve information from offline help documents. - [Online Bing Search Engine](https://microsoft.github.io/UFO/advanced_usage/reinforce_appagent/learning_from_bing_search/): Enhance UFO's capabilities by utilizing the most up-to-date online search results. - [Self-Experience](https://microsoft.github.io/UFO/advanced_usage/reinforce_appagent/experience_learning/): Save task completion trajectories into UFO's memory for future reference. - [User-Demonstration](https://microsoft.github.io/UFO/advanced_usage/reinforce_appagent/learning_from_demonstration/): Boost UFO's capabilities through user demonstration. 
diff --git a/documents/docs/about/CONTRIBUTING.md b/documents/docs/about/CONTRIBUTING.md index c282e9a1..3ac034b3 100644 --- a/documents/docs/about/CONTRIBUTING.md +++ b/documents/docs/about/CONTRIBUTING.md @@ -9,6 +9,9 @@ When you submit a pull request, a CLA-bot will automatically determine whether y to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repositories using our CLA. +!!! note + You should submit your pull request to the `pre-release` branch, not the `main` branch. + This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments. \ No newline at end of file diff --git a/documents/docs/index.md b/documents/docs/index.md index 2ca81f32..bac79968 100644 --- a/documents/docs/index.md +++ b/documents/docs/index.md @@ -24,7 +24,7 @@ - AppAgent 👾, responsible for iteratively executing actions on the selected applications until the task is successfully concluded within a specific application. -- Control Interaction 🎮, is tasked with translating actions from HostAgent and AppAgent into interactions with the application and its UI controls. It's essential that the targeted controls are compatible with the Windows **UI Automation** or **Win32** API. +- Application Automator 🎮, is tasked with translating actions from HostAgent and AppAgent into interactions with the application and through UI controls, native APIs or AI tools. Check out more details [here](./automator/overview.md). Both agents leverage the multi-modal capabilities of Visual Language Model (VLM) to comprehend the application UI and fulfill the user's request. 
For more details, please consult our [technical report](https://arxiv.org/abs/2402.07939).

diff --git a/ufo/automator/puppeteer.py b/ufo/automator/puppeteer.py index f0598329..d3163924 100644 --- a/ufo/automator/puppeteer.py +++ b/ufo/automator/puppeteer.py @@ -39,7 +39,7 @@ def create_command( :param command_name: The command name. :param params: The arguments for the command. """ - receiver = self.receiver_manager.get_receiver(command_name) + receiver = self.receiver_manager.get_receiver_from_command_name(command_name) command = receiver.command_registry.get(command_name.lower(), None) if receiver is None: @@ -56,7 +56,7 @@ def get_command_types(self, command_name: str) -> str: :param command_name: The command name. :return: The command types. """ - receiver = self.receiver_manager.get_receiver(command_name) + receiver = self.receiver_manager.get_receiver_from_command_name(command_name) return receiver.type_name @@ -73,7 +73,7 @@ def execute_command( command = self.create_command(command_name, params, *args, **kwargs) return command.execute() - def execute_all_commands(self) -> List: + def execute_all_commands(self) -> List[Any]: """ Execute all the commands in the command queue. :return: The execution results. @@ -83,6 +83,8 @@ def execute_all_commands(self) -> List: command = self.command_queue.popleft() results.append(command.execute()) + return results + def add_command( self, command_name: str, params: Dict[str, Any], *args, **kwargs ) -> None: @@ -182,11 +184,11 @@ def create_ui_control_receiver( :param control: The control element. :return: The UI controller receiver. 
""" - factory: ReceiverFactory = self._receiver_factory_registry.get("UIControl").get( + factory: ReceiverFactory = self.receiver_factory_registry.get("UIControl").get( "factory" ) self.ui_control_receiver = factory.create_receiver(control, application) - self._receiver_list.append(self.ui_control_receiver) + self.receiver_list.append(self.ui_control_receiver) self._update_receiver_registry() return self.ui_control_receiver @@ -197,7 +199,7 @@ def create_api_receiver(self, app_root_name: str, process_name: str) -> None: :param app_root_name: The app root name. :param process_name: The process name. """ - for receiver_factory_dict in self._receiver_factory_registry.values(): + for receiver_factory_dict in self.receiver_factory_registry.values(): # Check if the receiver is API if receiver_factory_dict.get("is_api"): @@ -205,7 +207,7 @@ def create_api_receiver(self, app_root_name: str, process_name: str) -> None: app_root_name, process_name ) if receiver is not None: - self._receiver_list.append(receiver) + self.receiver_list.append(receiver) self._update_receiver_registry() @@ -214,11 +216,11 @@ def _update_receiver_registry(self) -> None: Update the receiver registry. A receiver registry is a dictionary that maps the command name to the receiver. """ - for receiver in self._receiver_list: + for receiver in self.receiver_list: if receiver is not None: self.receiver_registry.update(receiver.self_command_mapping()) - def get_receiver(self, command_name: str) -> ReceiverBasic: + def get_receiver_from_command_name(self, command_name: str) -> ReceiverBasic: """ Get the receiver from the command name. :param command_name: The command name. @@ -229,13 +231,31 @@ def get_receiver(self, command_name: str) -> ReceiverBasic: raise ValueError(f"Receiver for command {command_name} is not found.") return receiver + @property + def receiver_list(self) -> List[ReceiverBasic]: + """ + Get the receiver list. + :return: The receiver list. 
+ """ + return self._receiver_list + + @property + def receiver_factory_registry( + self, + ) -> Dict[str, Dict[str, Union[str, ReceiverFactory]]]: + """ + Get the receiver factory registry. + :return: The receiver factory registry. + """ + return self._receiver_factory_registry + @property def com_receiver(self) -> WinCOMReceiverBasic: """ Get the COM receiver. :return: The COM receiver. """ - for receiver in self._receiver_list: + for receiver in self.receiver_list: if issubclass(receiver.__class__, WinCOMReceiverBasic): return receiver diff --git a/ufo/config/config_dev.yaml b/ufo/config/config_dev.yaml index ba1d73b3..c3c623ed 100644 --- a/ufo/config/config_dev.yaml +++ b/ufo/config/config_dev.yaml @@ -60,7 +60,7 @@ APP_API_PROMPT_ADDRESS: { "EXCEL.EXE": "ufo/prompts/apps/excel/api.yaml", "msedge.exe": "ufo/prompts/apps/web/api.yaml", "chrome.exe": "ufo/prompts/apps/web/api.yaml" -} # The prompt for the app API +} # The prompt address for the app API. The key is the app program name, and the value is the prompt address. WORD_API_PROMPT: "ufo/prompts/apps/word/api.yaml" # The prompt for the word API EXCEL_API_PROMPT: "ufo/prompts/apps/excel/api.yaml" # The prompt for the word API
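
The `ufo/automator/puppeteer.py` hunks above do three things: rename `get_receiver` to `get_receiver_from_command_name`, expose `_receiver_list` through a read-only `receiver_list` property, and make `execute_all_commands` return the results it collects. A minimal, self-contained sketch of that pattern follows. `EchoCommand`, `EchoReceiver`, `MiniReceiverManager`, `MiniPuppeteer`, and `add_receiver` are hypothetical stand-ins introduced only for illustration; they are not UFO's actual classes, and the real receivers and factories take different arguments.

```python
from collections import deque
from typing import Any, Deque, Dict, List


class EchoCommand:
    """Hypothetical command: returns its own name and params when executed."""

    def __init__(self, name: str, params: Dict[str, Any]) -> None:
        self.name = name
        self.params = params

    def execute(self) -> Any:
        return (self.name, self.params)


class EchoReceiver:
    """Hypothetical receiver that advertises the command names it handles."""

    def __init__(self, command_names: List[str]) -> None:
        self.command_names = command_names

    def self_command_mapping(self) -> Dict[str, "EchoReceiver"]:
        # Map every supported command name back to this receiver.
        return {name: self for name in self.command_names}


class MiniReceiverManager:
    """Simplified stand-in for the receiver manager touched by the diff."""

    def __init__(self) -> None:
        self._receiver_list: List[EchoReceiver] = []
        self.receiver_registry: Dict[str, EchoReceiver] = {}

    @property
    def receiver_list(self) -> List[EchoReceiver]:
        # Read-only accessor: callers can read the list but not rebind it.
        return self._receiver_list

    def add_receiver(self, receiver: EchoReceiver) -> None:
        # Hypothetical helper; the diff builds receivers through factories.
        self.receiver_list.append(receiver)
        self.receiver_registry.update(receiver.self_command_mapping())

    def get_receiver_from_command_name(self, command_name: str) -> EchoReceiver:
        receiver = self.receiver_registry.get(command_name, None)
        if receiver is None:
            raise ValueError(f"Receiver for command {command_name} is not found.")
        return receiver


class MiniPuppeteer:
    """Simplified stand-in that queues commands and runs them in order."""

    def __init__(self, receiver_manager: MiniReceiverManager) -> None:
        self.receiver_manager = receiver_manager
        self.command_queue: Deque[EchoCommand] = deque()

    def add_command(self, command_name: str, params: Dict[str, Any]) -> None:
        # The lookup raises ValueError for unknown command names.
        self.receiver_manager.get_receiver_from_command_name(command_name)
        self.command_queue.append(EchoCommand(command_name, params))

    def execute_all_commands(self) -> List[Any]:
        results: List[Any] = []
        while self.command_queue:
            command = self.command_queue.popleft()
            results.append(command.execute())
        # Without this return the caller would always receive None.
        return results
```

Under these assumptions, queuing two commands and calling `execute_all_commands()` yields one result per command in FIFO order and leaves the queue empty, which is the behavior the added `return results` line restores.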