OmniTool

Control a Windows 11 VM with OmniParser + your vision model of choice.

Highlights:

OmniParser V2 is 60% faster than V1 and now understands a wide variety of OS, app, and in-app icons!
OmniBox uses 50% less disk space than other Windows VMs for agent testing, whilst providing the same computer use API.
OmniTool supports the following vision models out of the box: OpenAI (4o/o1/o3-mini), DeepSeek (R1), Qwen (2.5VL), or Anthropic Computer Use.

Overview

There are three components:

	omniparserserver	FastAPI server running OmniParser V2.
	omnibox	A Windows 11 VM running in a Docker container.
	gradio	UI to provide commands and watch reasoning + execution on OmniBox

Showcase Video

OmniParser V2	Watch Video
OmniTool	Watch Video

Notes:

OmniParser V2 can run on a CPU, but we have split it out into its own server so that you can run it faster on a GPU machine.
The OmniBox Windows 11 VM Docker container depends on KVM, so it only runs quickly on Windows and Linux hosts. It does not need a GPU and can run on a CPU machine.
The Gradio UI can also run on a CPU machine. We suggest running omnibox and gradio on the same CPU machine and omniparserserver on a GPU server.
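
Since OmniBox depends on KVM, it may help to confirm that KVM is usable before building the VM. The following is a minimal sketch, assuming a Linux host where KVM is exposed as the device node /dev/kvm; it is not part of OmniTool, and the function name kvm_available is hypothetical.

```python
import os


def kvm_available(device: str = "/dev/kvm") -> bool:
    """Return True if the KVM device node exists and is readable and writable.

    Assumption: a Linux host, where KVM is exposed as /dev/kvm.
    This check says nothing about Windows hosts.
    """
    return os.path.exists(device) and os.access(device, os.R_OK | os.W_OK)
```

Calling kvm_available() with no arguments checks the default device path. A False result on Linux usually means either that virtualization is disabled or that the current user lacks permission on the device.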

Setup

omniparserserver:

a. If you already have a conda environment for OmniParser, you can use that. Otherwise, follow the steps below to create one.

b. Ensure conda is installed with conda --version or install from the Anaconda website

c. Navigate to the root of the repo with cd OmniParser

d. Create a conda python environment with conda create -n "omni" python==3.12

e. Set the python environment to be used with conda activate omni

f. Install the dependencies with pip install -r requirements.txt

g. Continue from here if you already had the conda environment.

h. Ensure you have the V2 weights downloaded in the weights folder (ensure the caption weights folder is called icon_caption_florence). If not, download them with:
```
rm -rf weights/icon_detect weights/icon_caption weights/icon_caption_florence 
for folder in icon_caption icon_detect; do huggingface-cli download microsoft/OmniParser-v2.0 --local-dir weights --repo-type model --include "$folder/*"; done
mv weights/icon_caption weights/icon_caption_florence
```
i. Navigate to the server directory with cd OmniParser/omnitool/omniparserserver

j. Start the server with python -m omniparserserver
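
Before starting the server, the weights layout described in the steps above can be checked with a short script. This is a sketch only: the folder names icon_detect and icon_caption_florence come from the download step, while the function name missing_weight_dirs and the "non-empty folder" criterion are assumptions made for illustration.

```python
from pathlib import Path

# Folder names taken from the download step above.
EXPECTED_WEIGHT_DIRS = ("icon_detect", "icon_caption_florence")


def missing_weight_dirs(weights_root: str = "weights") -> list[str]:
    """Return the expected weight folders that are absent or empty.

    Assumption: a folder counts as present if it exists and contains
    at least one entry; the file contents are not validated.
    """
    root = Path(weights_root)
    missing = []
    for name in EXPECTED_WEIGHT_DIRS:
        folder = root / name
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing
```

Run from the root of the repo, missing_weight_dirs() returns an empty list when both folders are populated. A result containing icon_caption_florence may mean the final mv step was skipped.
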

omnibox:

a. Install Docker Desktop

b. Visit Microsoft Evaluation Center, accept the Terms of Service, and download a Windows 11 Enterprise Evaluation (90-day trial, English, United States) ISO file [~6GB]. Rename the file to custom.iso and copy it to the directory OmniParser/omnitool/omnibox/vm/win11iso

c. Navigate to the VM management script directory with cd OmniParser/omnitool/omnibox/scripts

d. Build the docker container [400MB] and install the ISO to a storage folder [20GB] with ./manage_vm.sh create

e. After the VM is created for the first time, a save of the VM state is stored in vm/win11storage. You can then manage the VM with ./manage_vm.sh start and ./manage_vm.sh stop. To delete the VM, use ./manage_vm.sh delete and delete the OmniParser/omnitool/omnibox/vm/win11storage directory.
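
The VM lifecycle commands above could be wrapped in a small helper if you want to drive them from Python. This is a hedged sketch, not part of OmniTool: the helper names vm_command and run_vm are hypothetical, the four subcommands and the custom.iso location are taken from the steps above, and the code assumes a POSIX-style shell environment where ./manage_vm.sh is directly executable.

```python
import subprocess
from pathlib import Path

# Subcommands named in the steps above.
VM_ACTIONS = ("create", "start", "stop", "delete")


def vm_command(action: str) -> list[str]:
    """Build the argument list for manage_vm.sh, rejecting unknown actions."""
    if action not in VM_ACTIONS:
        raise ValueError(f"unknown action {action!r}; expected one of {VM_ACTIONS}")
    return ["./manage_vm.sh", action]


def run_vm(action: str, scripts_dir: str = "OmniParser/omnitool/omnibox/scripts") -> int:
    """Run manage_vm.sh from its own directory and return its exit code.

    For "create", first check that the ISO from step b is in place
    (vm/win11iso/custom.iso, next to the scripts directory).
    """
    iso = Path(scripts_dir).parent / "vm" / "win11iso" / "custom.iso"
    if action == "create" and not iso.is_file():
        raise FileNotFoundError(f"expected the Windows ISO at {iso}")
    return subprocess.run(vm_command(action), cwd=scripts_dir).returncode
```

For example, run_vm("start") returns the script's exit code. Note that run_vm("delete") only calls the script; removing the win11storage directory remains a separate manual step, as described above.
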

gradio:

a. Navigate to the gradio directory with cd OmniParser/omnitool/gradio

b. Ensure you have activated the conda python environment with conda activate omni

c. Start the server with python app.py --windows_host_url localhost:8006 --omniparser_server_url localhost:8000

d. Open the URL in the terminal output, set your API Key and start playing with the AI agent!
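
If the agent cannot reach OmniBox or the OmniParser server, a quick connectivity check on the two addresses passed to app.py may narrow the problem down. The sketch below is an illustration using only the Python standard library; the helper names split_host_port and is_reachable are hypothetical, and the check only confirms that a TCP connection can be opened, not that the service behind it is healthy.

```python
import socket


def split_host_port(address: str) -> tuple[str, int]:
    """Split 'host:port' (the form passed to app.py above) into its parts."""
    host, _, port = address.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {address!r}")
    return host, int(port)


def is_reachable(address: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection(split_host_port(address), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

With the values from step c, is_reachable("localhost:8006") checks OmniBox and is_reachable("localhost:8000") checks the OmniParser server. If omniparserserver runs on a separate GPU machine, substitute that machine's host and port.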

Risks and Mitigations

To align with the Microsoft AI principles and Responsible AI practices, we conduct risk mitigation by training the icon caption model with Responsible AI data, which helps the model avoid, as much as possible, inferring sensitive attributes (e.g., race, religion, etc.) of individuals who happen to appear in icon images. At the same time, we encourage users to apply OmniParser only to screenshots that do not contain harmful or violent content. For OmniTool, we conduct threat model analysis using the Microsoft Threat Modeling Tool. We advise keeping a human in the loop to minimize risk.

Acknowledgment

Kudos to the amazing resources that were invaluable in the development of our code: Claude Computer Use, OS World, Windows Agent Arena, and computer_use_ootb. We are grateful for the helpful suggestions and feedback provided by Francesco Bonacci, Jianwei Yang, Dillon DuPont, Yue Wu, and Anh Nguyen.
