
CiFuego/clickclickclick


ClickClickClick

A framework to enable autonomous android and computer use using any LLM (local or remote)


Demos

create a draft gmail and ask him if he is free for lunch on coming saturday at 1PM. Congratulate on the baby - write one para.

draft.gmail.to.rob.ask.for.lunch.n.congratulate.for.baby.mp4

Can you open the browser at https://www.google.com/maps/ and answer the corresponding task: Find bus stops in Alanson, MI

Browse.Google.Maps.Find.Bus.Stops.mp4

start a 3+2 game on lichess

start.a.3+2.game.in.lichess.mp4

Currently supported models: local models via Ollama (Llama 3.2-vision), Gemini, and GPT-4o. The code is highly experimental and will change in future commits; use it at your own risk.

The best results currently come from using GPT-4o/4o-mini as the planner and Gemini Pro/Flash as the finder.

[Image: model recommendations]

How to install

Clone the repository and navigate into the project directory:

git clone https://github.com/BandarLabs/clickclickclick
cd clickclickclick

It is recommended to create a virtual environment:

python3 -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`

Install the dependencies:

pip install -r requirements.txt

How to use

Put your model-specific settings in config/models.yaml and export the API keys named in that YAML file as environment variables.
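Before running anything, it can help to confirm that the keys are actually exported. The following is a minimal sketch, not part of the project: the helper `missing_keys` is hypothetical, and the two variable names are the ones this README mentions for the CLI tool. Adjust the tuple to whatever keys your config/models.yaml references.

```python
import os

# Assumption: these are the key names mentioned in this README's CLI section.
# Replace them with the names your config/models.yaml actually refers to.
REQUIRED_KEYS = ("OPENAI_API_KEY", "GEMINI_API_KEY")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Example usage:
#   missing = missing_keys()
#   if missing:
#       raise SystemExit("Missing environment variables: " + ", ".join(missing))
```

The `env` parameter exists only so the check can be exercised with a plain dictionary instead of the real environment.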

As CLI tool

Install the tool

(Ensure the OPENAI_API_KEY and GEMINI_API_KEY environment variables are set.)

pip install <repo-tar>
click3 run open google.com in browser

Setup

Before running any tasks, you need to configure your planner and finder models using the setup command:

python main.py setup

You will be prompted to choose the planner and finder models and provide any necessary API keys.

Running Tasks

To execute a task, use the run command. The basic usage is:

python main.py run <task-prompt>

Options

  • --platform: Specifies the platform to use, either android or osx. Default is android.

    python main.py run "example task" --platform=osx
  • --planner-model: Specifies the planner model to use: one of openai, gemini, or ollama. Default is openai.

    python main.py run "example task" --planner-model=gemini
  • --finder-model: Specifies the finder model to use: one of openai, gemini, or ollama. Default is gemini.

    python main.py run "example task" --finder-model=ollama

Example

A full example command might look like:

python main.py run "Open Google news" --platform=android --planner-model=openai --finder-model=gemini
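If you want to launch tasks from another Python script, the documented flags can be assembled programmatically. This is a sketch under assumptions: `build_run_command` is a hypothetical helper, not something the project ships, and it only mirrors the options and defaults listed above.

```python
import sys

# Values and defaults copied from the Options list in this README.
PLATFORMS = ("android", "osx")
MODELS = ("openai", "gemini", "ollama")


def build_run_command(task, platform="android", planner_model="openai",
                      finder_model="gemini"):
    """Build the argv list for `python main.py run ...` (hypothetical helper)."""
    if platform not in PLATFORMS:
        raise ValueError(f"unsupported platform: {platform}")
    if planner_model not in MODELS:
        raise ValueError(f"unsupported planner model: {planner_model}")
    if finder_model not in MODELS:
        raise ValueError(f"unsupported finder model: {finder_model}")
    return [
        sys.executable, "main.py", "run", task,
        f"--platform={platform}",
        f"--planner-model={planner_model}",
        f"--finder-model={finder_model}",
    ]


# Example usage, run from the repository root:
#   import subprocess
#   subprocess.run(build_run_command("Open Google news"), check=True)
```

Passing the command as a list avoids shell quoting problems when the task prompt contains spaces or quotes.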

Use as an API

Start the API server:

uvicorn api:app

POST /execute

Description:

This endpoint executes a task based on the provided task prompt, platform, planner model, and finder model.

Request Body:

  • task_prompt (string): The prompt for the task that needs to be executed.
  • platform (string, optional): The platform on which the task is to be executed. Default is "android". Supported platforms: "android", "osx".
  • planner_model (string, optional): The planner model to be used for planning the task. Default is "openai". Supported models: "openai", "gemini", "ollama".
  • finder_model (string, optional): The finder model to be used for finding elements to interact with. Default is "gemini". Supported models: "gemini", "openai", "ollama".

Response:

  • 200 OK:
    • result (object): The result of the task execution.
  • 400 Bad Request:
    • detail (string): Description of why the request is invalid (e.g., unsupported platform, unsupported planner model, unsupported finder model).
  • 500 Internal Server Error:
    • detail (string): Description of the error that occurred during task execution.
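A client can map the three documented status codes onto return values and exceptions. The sketch below assumes only the response shapes listed above (`result` on success, `detail` on failure). The function `interpret_response` and the choice of exception types are illustrative, not part of the project.

```python
import json


def interpret_response(status, body):
    """Return `result` on 200; raise with the server's `detail` otherwise."""
    data = json.loads(body) if body else {}
    if status == 200:
        return data["result"]
    detail = data.get("detail", "no detail provided")
    if status == 400:
        # Invalid input, e.g. an unsupported platform or model name.
        raise ValueError(f"invalid request: {detail}")
    # 500 and anything else unexpected: the task itself failed.
    raise RuntimeError(f"task failed (HTTP {status}): {detail}")
```

Separating 400 from 500 lets callers distinguish a request they should fix from a failure during task execution.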

Example Request:

curl -X POST "http://localhost:8000/execute" -H "Content-Type: application/json" -d '{
  "task_prompt": "Open uber app",
  "platform": "android",
  "planner_model": "openai",
  "finder_model": "gemini"
}'

Example Response:

{"result":true}
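The same request can be sent from Python using only the standard library. This is a rough equivalent of the curl example, assuming the server listens on localhost:8000 as shown there; the function names `build_execute_request` and `execute_task` are hypothetical.

```python
import json
import urllib.request

# Assumption: same host and port as the curl example above.
EXECUTE_URL = "http://localhost:8000/execute"


def build_execute_request(task_prompt, platform="android",
                          planner_model="openai", finder_model="gemini",
                          url=EXECUTE_URL):
    """Build a POST request whose JSON body uses the documented fields."""
    payload = {
        "task_prompt": task_prompt,
        "platform": platform,
        "planner_model": planner_model,
        "finder_model": finder_model,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def execute_task(task_prompt, **kwargs):
    """Send the request and return the parsed JSON body (needs a running server)."""
    request = build_execute_request(task_prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage, with the API server running:
#   print(execute_task("Open uber app"))
```

Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, so the 400 and 500 cases surface as exceptions rather than return values.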

Prerequisites

  1. adb must be installed on the machine where the code runs.
  2. USB debugging must be enabled on the Android phone.
  3. Python >= 3.11
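Two of these prerequisites can be checked from Python before starting a task. The sketch below is illustrative only: `check_prerequisites` is a hypothetical helper that looks for adb on the PATH and compares the interpreter version against 3.11. It does not verify USB debugging, which depends on the phone itself.

```python
import shutil
import sys

MIN_PYTHON = (3, 11)


def check_prerequisites(which=shutil.which, version=None):
    """Return a list of human-readable problems; empty means both checks passed."""
    version = sys.version_info[:2] if version is None else tuple(version[:2])
    problems = []
    if which("adb") is None:
        problems.append("adb not found on PATH")
    if version < MIN_PYTHON:
        problems.append(f"Python {version[0]}.{version[1]} found, need >= 3.11")
    return problems


# Example usage:
#   for problem in check_prerequisites():
#       print("Prerequisite missing:", problem)
```

To confirm that the phone is reachable with USB debugging enabled, running `adb devices` in a terminal should list the connected device.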

How to contribute

Contributions are welcome! Please begin by opening an issue to discuss your ideas. Once the issue is reviewed and assigned, you can proceed with submitting a pull request.

Things to do

[ ] Enable local models via Ollama on Android
[ ] Make computer use fully functional

License

This project is licensed under the MIT License. See the LICENSE file for details.
