README add captions
mkagenius committed Dec 17, 2024
1 parent 1cd265f commit c2e2c71
Showing 1 changed file with 28 additions and 13 deletions.
41 changes: 28 additions & 13 deletions README.md
@@ -2,20 +2,25 @@

### A framework to enable autonomous Android and computer use with any LLM (local or remote)

## Demos

![](https://github.com/user-attachments/assets/7cdbebb7-0ac4-4c20-8d67-f3c07cd4ab01)
![click3](https://github.com/user-attachments/assets/103afd59-ae29-45d2-9d77-75375b1538a0)

## Demos

![](https://github.com/user-attachments/assets/eb5dc968-206b-422d-aa3c-20c48bac3fed)
### create a draft gmail to [email protected] and ask him if he is free for lunch on coming saturday at 1PM. Congratulate on the baby - write one para.
https://github.com/user-attachments/assets/7cdbebb7-0ac4-4c20-8d67-f3c07cd4ab01

### Can you open the browser at https://www.google.com/maps/ and answer the corresponding task: Find bus stops in Alanson, MI
https://github.com/user-attachments/assets/eb5dc968-206b-422d-aa3c-20c48bac3fed

![](https://github.com/user-attachments/assets/68fc3475-2299-4254-8673-3123356177b5)
### start a 3+2 game on lichess
https://github.com/user-attachments/assets/68fc3475-2299-4254-8673-3123356177b5


Currently supports local models via Ollama (Llama 3.2-vision), as well as Gemini and GPT 4o. The code is highly experimental and will evolve in future commits. Please use it at your own risk.

The best result currently comes from using GPT 4o as planner and Gemini Pro or Flash as finder.
The best result currently comes from using GPT 4o/4o-mini as planner and Gemini Pro/Flash as finder.

![model recommendations](https://github.com/user-attachments/assets/355865f9-704b-483c-a23b-5dc9be54aeda)

#### How to install

@@ -42,10 +47,16 @@ pip install -r requirements.txt

#### How to use

Put your model-specific settings in config/models.yaml and export the keys specified in that YAML file.
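
Before running a task, it can help to confirm that the keys have actually been exported. The following is a minimal sketch, not part of the project: the helper `missing_env_keys` and the two key names are hypothetical placeholders, and the real names are whatever config/models.yaml specifies.

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_env_keys(
    required: Iterable[str], env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Hypothetical key names, for illustration only; use the ones from your YAML file.
required = ["OPENAI_API_KEY", "GEMINI_API_KEY"]
missing = missing_env_keys(required)
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```

Passing `env` explicitly makes the check easy to exercise without touching the real environment.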

## As CLI tool

Install the tool

```sh
pip install <repo-whl>
```

```sh
./click3 run open google.com in browser
```
@@ -65,14 +76,18 @@ You will be prompted to choose the planner and finder models and provide any nec

To execute a task, use the `run` command. The basic usage is:

```sh
pip install <repo-whl>
```

```sh
./click3 run <task-prompt>
```

#### Options

- `--platform`: Specifies the platform to use, either `android` or `osx`. Default is `android`.

```sh
python main.py run "example task" --platform=osx
```
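
To illustrate how a `run` command with a `--platform` option behaves, here is a small sketch using the standard library's `argparse`. It is an assumption for illustration only and not the project's actual parser; the function name `build_parser` and the argument name `task_prompt` are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors `run <task-prompt> --platform=<name>`."""
    parser = argparse.ArgumentParser(prog="click3")
    subparsers = parser.add_subparsers(dest="command", required=True)
    run = subparsers.add_parser("run", help="execute a task")
    # Accept the prompt either quoted or as several bare words.
    run.add_argument("task_prompt", nargs="+", help="task description")
    # The README lists `android` and `osx`, with `android` as the default.
    run.add_argument("--platform", choices=["android", "osx"], default="android")
    return parser


args = build_parser().parse_args(["run", "example task", "--platform=osx"])
print(" ".join(args.task_prompt), args.platform)
```

With `choices`, an unsupported platform is rejected at parse time rather than later during task execution.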
@@ -111,7 +126,7 @@ This endpoint executes a task based on the provided task prompt, platform, plann
- `finder_model` (string, optional): The finder model to be used for finding elements to interact with. Default is "gemini". Supported models: "gemini", "openai", "ollama".

#### Response:
- `200 OK`:
- `result` (object): The result of the task execution.
- `400 Bad Request`:
- `detail` (string): Description of why the request is invalid (e.g., unsupported platform, unsupported planner model, unsupported finder model).
@@ -121,7 +136,7 @@ This endpoint executes a task based on the provided task prompt, platform, plann
#### Example Request:
```bash
curl -X POST "http://localhost:8000/execute" -H "Content-Type: application/json" -d '{
"task_prompt": "Take a screenshot",
"task_prompt": "Open uber app",
"platform": "android",
"planner_model": "gemini",
"finder_model": "openai"
@@ -140,7 +155,7 @@ curl -X POST "http://localhost:8000/execute" -H "Content-Type: application/json"
}
```
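
The same request can be built from Python. The sketch below uses only the standard library and validates the fields locally before sending. The helper name `build_execute_request` is hypothetical, and it assumes the planner accepts the same model names listed for the finder ("gemini", "openai", "ollama").

```python
import json
import urllib.request

SUPPORTED_PLATFORMS = {"android", "osx"}
# Assumption: the planner supports the same model names as the finder.
SUPPORTED_MODELS = {"gemini", "openai", "ollama"}


def build_execute_request(task_prompt, planner_model, finder_model="gemini",
                          platform="android", base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for the /execute endpoint."""
    if platform not in SUPPORTED_PLATFORMS:
        raise ValueError(f"unsupported platform: {platform}")
    for model in (planner_model, finder_model):
        if model not in SUPPORTED_MODELS:
            raise ValueError(f"unsupported model: {model}")
    payload = {
        "task_prompt": task_prompt,
        "platform": platform,
        "planner_model": planner_model,
        "finder_model": finder_model,
    }
    return urllib.request.Request(
        base_url + "/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_execute_request("Open uber app", planner_model="gemini", finder_model="openai")
# Sending needs a running server, so it is left commented out:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Checking the platform and model names on the client side catches the cases that the endpoint would otherwise reject with `400 Bad Request`.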

#### Prerequisites

This project requires adb to be installed on the machine where the code runs.

@@ -154,7 +169,7 @@ Contributions are welcome! Please open an issue or submit a pull request.

#### Things to do

Three components:

1. Planner
2. Finder
@@ -182,4 +197,4 @@ pre-commit run --all-files

## License

This project is licensed under the MIT License. See the LICENSE file for details.
