Update Magentic-one readme with images (microsoft#3958)
victordibia authored Oct 25, 2024
1 parent 388e4d9 commit 0756ebd
Showing 5 changed files with 46 additions and 46 deletions.

79 changes: 38 additions & 41 deletions python/packages/autogen-magentic-one/readme.md
# Magentic-One
Magentic-One is a generalist multi-agent softbot that uses a combination of five agents, including LLM and tool-based agents, to tackle intricate tasks. For example, it can solve general tasks that involve multi-step planning and action in the real world.

![](./imgs/autogen-magentic-one-example.png)

> _Example_: Suppose a user requests the following: _Can you rewrite the readme of the autogen GitHub repository to be more clear?_ Magentic-One will use the following process to handle this task. The Orchestrator agent will break down the task into subtasks and assign them to the appropriate agents. In this case, the WebSurfer will navigate to GitHub, search for the autogen repository, and extract the readme file. Next, the Coder agent will rewrite the readme file for clarity and return the updated content to the Orchestrator. At each point, the Orchestrator will monitor progress via a ledger, and terminate when the task is completed successfully.

## Architecture

<center>
<img src="./imgs/autogen-magentic-one-landing.png" alt="drawing" style="width:350px;"/>
</center>


![](./imgs/autogen-magentic-one-agents.png)

Magentic-One uses agents with the following personas and capabilities:

- Orchestrator: breaks the task into subtasks, assigns them to the other agents, and tracks progress via a ledger.
- WebSurfer: navigates web pages and extracts information from them.
- FileSurfer: reads and extracts information from local files.
- Coder: writes code and reasons over the information gathered by the other agents.
- ComputerTerminal: executes the code written by the Coder.

We created Magentic-One with one agent of each type because their combined abilities help tackle tough benchmarks. By splitting tasks among different agents, we keep the code simple and modular, like in object-oriented programming. This also makes each agent's job easier since they only need to focus on specific tasks. For example, the websurfer agent only needs to navigate webpages and doesn't worry about writing code, making the team more efficient and effective.


### Planning and Tracking Task Progress

<center>
<img src="./imgs/autogen-magentic-one-arch.png" alt="Orchestrator workflow"/>
</center>

The figure illustrates the workflow of an orchestrator managing a multi-agent setup, starting with an initial prompt or task. The orchestrator creates or updates a ledger with gathered information, including verified facts, facts to look up, derived facts, and educated guesses. Using this ledger, a plan is derived, which consists of a sequence of steps and task assignments for the agents. Before execution, the orchestrator clears the agents' contexts to ensure they start fresh. The orchestrator then evaluates if the request is fully satisfied. If so, it reports the final answer or an educated guess.
If the request is not fully satisfied, the orchestrator assesses whether the work is progressing. If too many iterations stall without progress, it updates the ledger with what has been learned and derives a fresh plan.

Note that many parameters such as terminal logic and maximum number of stalled iterations are configurable. Also note that the orchestrator cannot instantiate new agents. This is possible but not implemented in Magentic-One.
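
The ledger-and-plan loop described above can be sketched in a few lines of Python (a minimal illustration; the names `Ledger` and `orchestrate` and the callback signatures are ours, not the package's API):

```python
from dataclasses import dataclass, field

@dataclass
class Ledger:
    # The four categories of information the orchestrator tracks.
    verified_facts: list = field(default_factory=list)
    facts_to_look_up: list = field(default_factory=list)
    derived_facts: list = field(default_factory=list)
    educated_guesses: list = field(default_factory=list)

def orchestrate(plan, agents, is_satisfied, max_stalls=3):
    """Derive steps from the ledger, delegate them to agents, and
    re-plan until the request is satisfied or progress stalls."""
    ledger = Ledger()
    for _ in range(max_stalls):
        # `plan` maps the current ledger to (agent_name, step) pairs.
        for agent_name, step in plan(ledger):
            ledger.verified_facts.append(agents[agent_name](step))
        if is_satisfied(ledger):
            return ledger.verified_facts[-1]  # report the final answer
    # Too many stalled iterations: fall back to an educated guess.
    return ledger.educated_guesses[-1] if ledger.educated_guesses else None
```

Re-planning after a stall simply means calling `plan` again with the updated ledger, which is why agent contexts can be cleared between rounds.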


## Table of Definitions:

| Term | Definition |
| --------------- | ------------------------------------------------------------------------------------------------------------------------- |
| Agent | A component that can (autonomously) act based on observations. Different agents may have different functions and actions. |
| Planning | The process of determining actions to achieve goals, performed by the Orchestrator agent in Magentic-One. |
| Ledger | A record-keeping component used by the Orchestrator agent to track the progress and manage subgoals in Magentic-One. |
| Stateful Tools | Tools that maintain state or data, such as the web browser and markdown-based file browser used by Magentic-One. |
| Tools | Resources used by Magentic-One for various purposes, including stateful and stateless tools. |
| Stateless Tools | Tools that do not maintain state or data, like the commandline executor used by Magentic-One. |

## Capabilities and Performance

### Capabilities

- Planning: The Orchestrator agent in Magentic-One excels at performing planning tasks. Planning involves determining actions to achieve goals. The Orchestrator agent breaks down complex tasks into smaller subtasks and assigns them to the appropriate agents.

- Web Interaction: The Web Surfer agent in Magentic-One is proficient in web-related tasks. It can browse the internet, retrieve information from websites, and interact with web-based applications. This capability allows Magentic-One to handle interactive web pages, forms, and other web elements.


### What Magentic-One Cannot Do

- **Video Scrubbing:** The agents are unable to navigate and process video content.
- **Limited LLM Capacity:** The agents' abilities are constrained by the limitations of the underlying language model.
- **Web Surfer Limitations:** The web surfer agent may struggle with certain types of web pages, such as those requiring complex interactions or extensive JavaScript handling.


### Safety and Risks

**Code Execution:**

- **Risks:** Code execution is inherently risky because it happens in the environment where the agents run, via the command-line executor. The agents can therefore execute arbitrary Python code.
- **Mitigation:** Users are advised to run the system in isolated environments, such as Docker containers, to mitigate the risks associated with executing arbitrary code.
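
For example, a throwaway container is one way to isolate execution (the image tag and mount paths below are illustrative, not a documented requirement):

```bash
# Run the example inside a disposable container so that generated code
# cannot touch the host system. Image and paths are illustrative.
docker run --rm -it \
  -v "$(pwd)":/workspace -w /workspace \
  python:3.11 bash -c "pip install -e . && python examples/example.py"
```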

**Web Browsing:**

- **Capabilities:** The web surfer agent can operate on most websites, including performing tasks like booking flights.
- **Risks:** Since the requests are sent online using GPT-4-based models, there are potential privacy and security concerns. It is crucial not to provide sensitive information such as keys or credit card data to the agents.

**Safeguards:**

- **Guardrails from LLM:** The agents inherit the guardrails from the underlying language model (e.g., GPT-4). This means they will refuse to generate toxic or stereotyping content, providing a layer of protection against generating harmful outputs.
- **Limitations:** The agents' behavior is directly influenced by the capabilities and limitations of the underlying LLM. Consequently, any lack of guardrails in the language model will also affect the behavior of the agents.

**General Recommendations:**

- Always use isolated or controlled environments for running the agents to prevent unauthorized or harmful code execution.
- Avoid sharing sensitive information with the agents to protect your privacy and security.
- Regularly update and review the underlying LLM and system configurations to ensure they adhere to the latest safety and security standards.


### Performance

Magentic-One currently achieves the following performance on complex agent benchmarks.

#### GAIA

GAIA is a benchmark from Meta that contains complex tasks that require multi-step reasoning and tool use. For example:

> _Example_: If Eliud Kipchoge could maintain his record-making marathon pace indefinitely, how many thousand hours would it take him to run the distance between the Earth and the Moon at its closest approach? Please use the minimum perigee value on the Wikipedia page for the Moon when carrying out your calculation. Round your result to the nearest 1000 hours and do not use any comma separators if necessary.

In order to solve this task, the orchestrator begins by outlining the steps needed to solve the task of calculating how many thousand hours it would take Eliud Kipchoge to run the distance between the Earth and the Moon at its closest approach. The orchestrator instructs the web surfer agent to gather Eliud Kipchoge's marathon world record time (2:01:39) and the minimum perigee distance of the Moon from Wikipedia (356,400 kilometers).

Next, the orchestrator assigns the assistant agent to use this data to perform the necessary calculations. The assistant converts Kipchoge's marathon time to hours (2.0275 hours) and calculates his speed (approximately 20.81 km/h). It then calculates the total time to run the distance to the Moon (17,130.13 hours) and rounds to the nearest thousand hours, giving an answer of 17 thousand hours. The orchestrator then confirms and reports this final result.
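
The assistant's arithmetic can be verified independently with a few lines of plain Python (the constants below come from the example itself):

```python
# Reproduce the calculation from the GAIA example above.
record_hours = 2 + 1 / 60 + 39 / 3600      # 2:01:39 world-record time -> 2.0275 h
marathon_km = 42.195                        # standard marathon distance
perigee_km = 356_400                        # minimum perigee of the Moon (Wikipedia)

speed_kmh = marathon_km / record_hours      # roughly 20.81 km/h
total_hours = perigee_km / speed_kmh        # roughly 17,125 hours
answer = round(total_hours, -3) / 1000      # nearest 1000 hours, in thousands

print(f"{speed_kmh:.2f} km/h, {total_hours:.0f} h, {answer:.0f} thousand hours")
```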

Here is the performance of Magentic-One on a GAIA development set.

| Level | Task Completion Rate\* |
| ------- | ---------------------- |
| Level 1 | 55% (29/53) |
| Level 2 | 34% (29/86) |
| Level 3 | 12% (3/26) |
| Total | 37% (61/165) |

\*Indicates the percentage of tasks completed successfully on the *validation* set.

#### WebArena

> _Example_: Tell me the count of comments that have received more downvotes than upvotes for the user who made the latest post on the Showerthoughts forum.

To solve this task, the agents began by logging into the Postmill platform using provided credentials and navigating to the Showerthoughts forum. They identified the latest post in this forum, which was made by a user named Waoonet. To proceed with the task, they then accessed Waoonet's profile to examine the comments section, where they could find all comments made by this user.

Once on Waoonet's profile, the agents focused on counting the comments that had received more downvotes than upvotes. The web_surfer agent analyzed the available comments and found that Waoonet had made two comments, both of which had more upvotes than downvotes. Consequently, they concluded that none of Waoonet's comments had received more downvotes than upvotes. This information was summarized and reported back, completing the task successfully.

| Site | Task Completion Rate |
| -------------- | -------------------- |
| Reddit | 54%  (57/106) |
| Shopping | 33%  (62/187) |
| CMS | 29%  (53/182) |
| Multiple Sites | 15%  (7/48) |
| Total | 33%  (267/812) |


### Logging in Team One Agents

Team One agents can emit several log events that can be consumed by a log handler (see the example log handler in [utils.py](src/autogen_magentic_one/utils.py)). The currently emitted events are:

- OrchestrationEvent: emitted by an [Orchestrator](src/autogen_magentic_one/agents/base_orchestrator.py) agent.
- WebSurferEvent: emitted by a [WebSurfer](src/autogen_magentic_one/agents/multimodal_web_surfer/multimodal_web_surfer.py) agent.

In addition, developers can also handle and process logs generated from the AutoGen core library (e.g., LLMCallEvent). See the example log handler in [utils.py](src/autogen_magentic_one/utils.py) for how this can be implemented. By default, logs are written to a file named `log.jsonl`; the filename can be configured as a parameter to the log handler. These logs can be parsed to retrieve data about agent actions.
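
A minimal handler in the same spirit can be built on the standard `logging` module (a generic sketch; the handler actually shipped in [utils.py](src/autogen_magentic_one/utils.py) may differ in its details):

```python
import json
import logging

class JsonlFileHandler(logging.Handler):
    """Append each log record as one JSON object per line (JSONL)."""

    def __init__(self, filename: str = "log.jsonl") -> None:
        super().__init__()
        self.filename = filename

    def emit(self, record: logging.LogRecord) -> None:
        entry = {
            "source": record.name,
            "level": record.levelname,
            "message": record.getMessage(),
        }
        with open(self.filename, "a") as f:
            f.write(json.dumps(entry) + "\n")

logger = logging.getLogger("magentic_one_demo")
logger.setLevel(logging.INFO)
logger.addHandler(JsonlFileHandler("log.jsonl"))
logger.info("WebSurferEvent: navigated to page")
```

Because every line is a standalone JSON object, the file can be replayed later to reconstruct agent actions.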

# Setup


You can install the Magentic-One package using pip and then run the example code to see how the agents work together to accomplish a task.

1. Clone the repository and install the package from `python/packages/autogen-magentic-one`:

```bash
pip install -e .
```

2. Run the example:

```bash
python examples/example.py
```


## Environment Configuration for Chat Completion Client

This guide outlines how to configure your environment to use the `create_completion_client_from_env` function, which reads environment variables to return an appropriate `ChatCompletionClient`.
To configure for OpenAI, set the following environment variables:
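
For example (the variable names below are an assumption based on the `create_completion_client_from_env` helper; confirm them against the package source before use):

```bash
# Illustrative only: the exact variable names may differ from the package's.
export CHAT_COMPLETION_PROVIDER='openai'
export CHAT_COMPLETION_KWARGS_JSON='{"api_key": "YOUR_OPENAI_API_KEY", "model": "gpt-4o"}'
```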

### Other Keys

Some functionality, such as web search, requires a Bing API key. You can set it using:

```bash
export BING_API_KEY=xxxxxxx
```
