Stars
A curated list of resources about AI agents for Computer Use, including research papers, projects, frameworks, and tools.
Make websites accessible for AI agents
A minimalistic AI-powered search engine that helps you find information on the internet. Powered by Vercel AI SDK! Search with models like Grok 2.0.
LlamaIndex node parsing for images
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Inference and training library for high-quality TTS models.
A simple screen parsing tool towards a pure-vision-based GUI agent
An Open Source Python alternative to NotebookLM's podcast feature: Transforming Multimodal Content into Captivating Multilingual Audio Conversations with GenAI
Agent S: an open agentic framework that uses computers like a human
A high-throughput and memory-efficient inference and serving engine for LLMs
Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of multi-modal AI agents.
🚀 Automatically deploy your project to GitHub Pages using GitHub Actions. This action can be configured to push your production-ready code into any branch you'd like.
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
OpenUI lets you describe UI using your imagination, then see it rendered live.
Awesome list of 300+ agentic AI resources
A UI-Focused Agent for Windows OS Interaction.
This is an operating system independent implementation of iOS device features. You can run UI tests, launch or kill apps, install apps etc. with it.
InterfaceAgent: a versatile framework designed to create system and interface agents capable of managing mobile and desktop applications and features.
Next.js Chrome Extension Starter example application that demonstrates how to build a Chrome extension using Next.js. It provides a foundation for developing Chrome extensions with Next.js, React a…
Small "Pin To TaskBar" exe for Command Line, tested on Windows 10 Version 20H2 (Win10 19042.964). Reverse engineering of syspin.exe "PE injection into Progman" method.
A sample app for the Retrieval-Augmented Generation pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences.