Commit

Merge pull request microsoft#36 from microsoft/vyokky/dev
Vyokky/dev v0.0.1 New release
vyokky authored Mar 25, 2024
2 parents 5b08db3 + 185daa2 commit 89de7df
Showing 13 changed files with 217 additions and 471 deletions.
101 changes: 80 additions & 21 deletions README.md
@@ -33,10 +33,11 @@ Both agents leverage the multi-modal capabilities of GPT-Vision to comprehend th


## 📢 News
- 📅 2024-03-XX: New Release for v0.0.1! Check out our exciting new features:
1. Our UFO framework now support RAG from offline document and online Bing search.
2. We now support creating your help documents for each Windows app to become an app expert. Check XX for more details!
3. UFO now support more LLMs and customized models.
- 📅 2024-03-25: **New Release for v0.0.1!** Check out our exciting new features:
1. We now support creating your help documents for each Windows application to become an app expert. Check the [README](./learner/README.md) for more details!
2. UFO now supports RAG from offline documents and online Bing search.
3. You can save the task completion trajectory into its memory for UFO's reference, improving its future success rate!
4. You can customize different GPT models for AppAgent and ActAgent. Text-only models (e.g., GPT-4) are now supported!
- 📅 2024-02-14: Our [technical report](https://arxiv.org/abs/2402.07939) is online!
- 📅 2024-02-10: UFO is released on GitHub🎈. Happy Chinese New year🐉!

@@ -45,6 +46,7 @@ Both agents leverage the multi-modal capabilities of GPT-Vision to comprehend th

UFO sightings have garnered attention from various media outlets, including:
- [Microsoft's UFO abducts traditional user interfaces for a smarter Windows experience](https://the-decoder.com/microsofts-ufo-abducts-traditional-user-interfaces-for-a-smarter-windows-experience/)
- [🚀 UFO & GPT-4-V: Sit back and relax, mientras GPT lo hace todo🌌](https://www.linkedin.com/posts/gutierrezfrancois_ai-ufo-microsoft-activity-7176819900399652865-pLoo?utm_source=share&utm_medium=member_desktop)
- [The AI PC - The Future of Computers? - Microsoft UFO](https://www.youtube.com/watch?v=1k4LcffCq3E)
- [下一代Windows系统曝光:基于GPT-4V,Agent跨应用调度,代号UFO](https://www.qbitai.com/2024/02/121048.html)
- [下一代智能版 Windows 要来了?微软推出首个 Windows Agent,命名为 UFO!](https://blog.csdn.net/csdnnews/article/details/136161570)
@@ -80,26 +82,83 @@ pip install -r requirements.txt
```

### ⚙️ Step 2: Configure the LLMs
Before running UFO, you need to provide your LLM configurations. You can create a config file `ufo/config/config.yaml`, by copying the `ufo/config/config.yaml.template` edited as follows:
Before running UFO, you need to provide your LLM configurations **individually for AppAgent and ActAgent**. You can create your own config file `ufo/config/config.yaml` by copying `ufo/config/config.yaml.template` and editing the config for **APP_AGENT** and **ACTION_AGENT** as follows:

#### OpenAI
```
API_TYPE: "openai"
OPENAI_API_BASE: "https://api.openai.com/v1/chat/completions" # The base URL for the OpenAI API
OPENAI_API_KEY: "YOUR_API_KEY" # Set the value to the openai key for the llm model
OPENAI_API_MODEL: "GPTV_MODEL_NAME" # The only OpenAI model by now that accepts visual input
```bash
VISUAL_MODE: True, # Whether to use the visual mode
API_TYPE: "openai" , # The API type, "openai" for the OpenAI API.
API_BASE: "https://api.openai.com/v1/chat/completions", # The OpenAI API endpoint.
API_KEY: "sk-", # The OpenAI API key, which begins with sk-
API_VERSION: "2024-02-15-preview", # "2024-02-15-preview" by default
API_MODEL: "gpt-4-vision-preview", # The only OpenAI model by now that accepts visual input
```

#### Azure OpenAI (AOAI)
```bash
VISUAL_MODE: True, # Whether to use the visual mode
API_TYPE: "aoai" , # The API type, "aoai" for the Azure OpenAI.
API_BASE: "YOUR_ENDPOINT", # The AOAI API address. Format: https://{your-resource-name}.openai.azure.com
API_KEY: "YOUR_KEY", # The aoai API key
API_VERSION: "2024-02-15-preview", # "2024-02-15-preview" by default
API_MODEL: "gpt-4-vision-preview", # The only OpenAI model by now that accepts visual input
API_DEPLOYMENT_ID: "YOUR_AOAI_DEPLOYMENT", # The deployment id for the AOAI API
```
API_TYPE: "aoai"
OPENAI_API_BASE: "YOUR_ENDPOINT" # The AOAI API address. Format: https://{your-resource-name}.openai.azure.com/openai/deployments/{deployment-id}/chat/completions?api-version={api-version}
OPENAI_API_KEY: "YOUR_API_KEY" # Set the value to the openai key for the llm model
OPENAI_API_MODEL: "GPTV_MODEL_NAME" # The only OpenAI model by now that accepts visual input
You can also use a non-visual model (e.g., GPT-4) for each agent by setting `VISUAL_MODE: False` and a proper `API_MODEL` (openai) and `API_DEPLOYMENT_ID` (aoai). You can also optionally set a backup LLM engine in the `BACKUP_AGENT` field, to be used if the above engines fail during inference.


#### Non-Visual Model Configuration
You can utilize non-visual models (e.g., GPT-4) for each agent by configuring the following settings in the config.yaml file:

- ```VISUAL_MODE: False # To enable non-visual mode.```
- Specify the appropriate `API_MODEL` (OpenAI) and `API_DEPLOYMENT_ID` (AOAI) for each agent.

Optionally, you can set a backup language model (LLM) engine in the `BACKUP_AGENT` field to handle cases where the primary engines fail during inference. Ensure you configure these settings accurately to leverage non-visual models effectively.
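The per-agent settings and the backup engine described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `missing_keys` and `call_with_backup` are hypothetical and not part of UFO, the config is assumed to be already loaded into a dictionary with one block per agent, and the list of required keys is inferred from the examples in this README rather than taken from UFO's code.

```python
from typing import Any, Callable, Dict, List

# Keys that appear in both the OpenAI and AOAI examples of this README.
# Treating them as required is an assumption made for this sketch.
REQUIRED_KEYS = ("VISUAL_MODE", "API_TYPE", "API_BASE", "API_KEY", "API_MODEL")


def missing_keys(agent_cfg: Dict[str, Any]) -> List[str]:
    """Return the expected keys that are absent from one agent's config block."""
    required = list(REQUIRED_KEYS)
    # The AOAI example additionally lists a deployment id.
    if agent_cfg.get("API_TYPE") == "aoai":
        required.append("API_DEPLOYMENT_ID")
    return [key for key in required if key not in agent_cfg]


def call_with_backup(
    config: Dict[str, Dict[str, Any]],
    agent: str,
    call: Callable[[Dict[str, Any]], Any],
) -> Any:
    """Run `call` with the agent's own engine; retry once with BACKUP_AGENT on failure."""
    try:
        return call(config[agent])
    except Exception:
        backup = config.get("BACKUP_AGENT")
        if backup is None:
            # No backup engine configured, so surface the original error.
            raise
        return call(backup)
```

A check like `missing_keys(config["APP_AGENT"])` run before startup would flag, for example, an AOAI block that lacks `API_DEPLOYMENT_ID`.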


### 📔 Step 3: Additional Setting for RAG (optional).
If you want to enhance UFO's ability with external knowledge, you can optionally configure it with an external database for retrieval augmented generation (RAG) in the `ufo/config/config.yaml` file.

#### RAG from Offline Help Document
Before enabling this function, you need to create an offline indexer for your help document. Please refer to the [README](./learner/README.md) to learn how to create an offline vectored database for retrieval. You can enable this function by setting the following configuration:
```bash
## RAG Configuration for the offline docs
RAG_OFFLINE_DOCS: True # Whether to use the offline RAG.
RAG_OFFLINE_DOCS_RETRIEVED_TOPK: 1 # The topk for the offline retrieved documents
```
Adjust `RAG_OFFLINE_DOCS_RETRIEVED_TOPK` to optimize performance.


#### RAG from Online Bing Search Engine
Enhance UFO's ability by utilizing the most up-to-date online search results! To use this function, you need to obtain a Bing search API key. Activate this feature by setting the following configuration:
```bash
## RAG Configuration for the Bing search
BING_API_KEY: "YOUR_BING_SEARCH_API_KEY" # The Bing search API key
RAG_ONLINE_SEARCH: True # Whether to use the online search for the RAG.
RAG_ONLINE_SEARCH_TOPK: 5 # The topk for the online search
RAG_ONLINE_RETRIEVED_TOPK: 1 # The topk for the online retrieved documents
```
Adjust `RAG_ONLINE_SEARCH_TOPK` and `RAG_ONLINE_RETRIEVED_TOPK` to get better performance.
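To make the interaction between the two top-k settings more concrete, here is a minimal Python sketch. It assumes that `RAG_ONLINE_SEARCH_TOPK` bounds how many search results are fetched and that `RAG_ONLINE_RETRIEVED_TOPK` bounds how many of those are passed on to the agent; this reading follows the config comments but is not confirmed against UFO's code. The functions `token_overlap` and `two_stage_topk` are hypothetical, and the word-overlap score is a simple stand-in for whatever similarity measure the real retriever uses.

```python
from typing import List


def token_overlap(query: str, text: str) -> int:
    """Count distinct lowercase words shared by the query and the text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def two_stage_topk(
    query: str,
    search_results: List[str],
    search_topk: int = 5,
    retrieved_topk: int = 1,
) -> List[str]:
    """Keep the first `search_topk` results, then return the `retrieved_topk` best matches."""
    # Stage 1: cap the number of search results that are considered at all.
    candidates = search_results[:search_topk]
    # Stage 2: rank the candidates by relevance and keep only the best few.
    ranked = sorted(candidates, key=lambda text: token_overlap(query, text), reverse=True)
    return ranked[:retrieved_topk]
```

Under this reading, raising `search_topk` widens the candidate pool, while raising `retrieved_topk` lengthens the context handed to the model, so the two values trade recall against prompt size.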


#### RAG from Self-Demonstration
Save task completion trajectories into UFO's memory for future reference. This can improve its future success rates based on its previous experiences!

After completing a task, you'll see the following message:
```
Would you like to save the current conversation flow for future reference by the agent?
[Y] for yes, any other key for no.
```
Press `Y` to save it into its memory and enable memory retrieval via the following configuration:
```bash
## RAG Configuration for experience
RAG_EXPERIENCE: True # Whether to use the RAG from its self-experience.
RAG_EXPERIENCE_RETRIEVED_TOPK: 5 # The topk for the retrieved experience records
```



### 🎉 Step 3: Start UFO
### 🎉 Step 4: Start UFO

#### ⌨️ You can execute the following on your Windows command Line (CLI):

@@ -125,7 +184,7 @@ Please enter your request to be completed🛸:
- The GPT-V accepts screenshots of your desktop and application GUI as input. Please ensure that no sensitive or confidential information is visible or captured during the execution process. For further information, refer to [DISCLAIMER.md](./DISCLAIMER.md).


### Step 4 🎥: Execution Logs
### Step 5 🎥: Execution Logs

You can find the screenshots taken and request & response logs in the following folder:
```
@@ -184,11 +243,11 @@ If you use UFO in your research, please cite our paper:
```

## 📝 Todo List
- ⏩ Documentation.
- ⏩ Support local host GUI interaction model.
- Support more control using Win32 API.
- ⏩ RAG enhanced UFO.
- Chatbox GUI for UFO.
- [x] RAG enhanced UFO.
- [ ] Documentation.
- [ ] Support local host GUI interaction model.
- [ ] Support more control using Win32 API.
- [ ] Chatbox GUI for UFO.



2 changes: 0 additions & 2 deletions record_processor/__init__.py

This file was deleted.

8 changes: 0 additions & 8 deletions record_processor/__main__.py

This file was deleted.

42 changes: 0 additions & 42 deletions record_processor/parser/behavior_record.py

This file was deleted.

171 changes: 0 additions & 171 deletions record_processor/parser/psr_record_parser.py

This file was deleted.
