feat: Add support for image function tools #654
This commit introduces the ImageFunctionTool and ImageFunctionToolResult classes, enabling the creation and execution of image-generating tools. The necessary modifications include updates to the tool execution logic, new data classes for handling image function calls, and adjustments to the response processing to accommodate image outputs. Additionally, the input handling in the Runner class has been refined to support the new image function items.

Changes include:
- New classes: ImageFunctionTool, ImageFunctionToolResult, ToolRunImageFunction
- Updated tool execution methods to handle image functions
- Modifications to the ProcessedResponse class to include image function results
- Enhancements to ItemHelpers for image function output formatting
- Adjustments in the Runner class for input item processing

These changes enhance the SDK's capabilities for handling image generation tasks alongside existing function tools.
Hi,
Can you expand this PR to also support the concept of a
I very much appreciate that you've done this work.
@stevemadere At that point, instead of a tool, we just have an agent participating in the conversation by posting a message made of multiple parts.
I thought agent responses had to be all strings? How does the agent response return File content types without a PR like this one?
@diwu-sf OpenAI is now promoting the Responses API instead of the ChatCompletion API. This new spec allows agents or models to return output as parts of different types.
@nileshtrivedi Nope, function call responses from the tool itself must still be a string. That's why this PR generates a user message to embed the function call's image output:

    @classmethod
    def image_function_tool_call_output_item(
        cls, tool_call: ResponseFunctionToolCall, output: str
    ) -> FunctionCallOutput:
        """Creates a tool call output item from a tool call and its output."""
        return [
            {
                "call_id": tool_call.call_id,
                "output": "Image generating tool is called.",
                "type": "function_call_output",
            },
            {
                "role": "user",
                "content": [
                    {
                        "type": "input_image",
                        "image_url": output,
                    }
                ],
            },
        ]

Something similar can be done for arbitrary PDF / file uploads.
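To make the shape of that workaround concrete, here is a minimal standalone sketch of the same idea: a placeholder string goes into the function_call_output item, and the image itself travels in a user message that follows it. It deliberately does not import the SDK; the function name image_tool_output_items and the example call ID and URL are hypothetical, chosen only for illustration.

```python
from typing import Any


def image_tool_output_items(call_id: str, image_url: str) -> list[dict[str, Any]]:
    """Build the two input items for one image-returning tool call.

    Hypothetical helper mirroring the snippet quoted above; it is not
    part of the SDK.
    """
    return [
        # The tool output itself must be a string, so it only carries a note.
        {
            "type": "function_call_output",
            "call_id": call_id,
            "output": "Image generating tool is called.",
        },
        # The image is passed to the model as a user message right after it.
        {
            "role": "user",
            "content": [{"type": "input_image", "image_url": image_url}],
        },
    ]


# Example values are made up for illustration.
items = image_tool_output_items("call_123", "https://example.com/cat.png")
```

Both items would be appended to the next request's input in this order, so the model sees the image immediately after the tool call it belongs to.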
Consider the following situation (which I suspect is about to become super common): an MCP server that conducts web browsing operations such as navigateToUrl and takeAction (e.g. via stagehand). Such an action would need to return all of these:
It could store the screenshot at a publicly accessible location on the web (e.g. in an S3 bucket served via CloudFront) so that the screenshot could easily be returned as an HTTP URL, which is just right for an input_image message hoisted from the MCP function call results. The LLM at OpenAI can then examine the screenshot, decide which action to take next, and make a tool call to take that action. Did I do a better job of describing the multi-modal results and why an MCP tool call would need to propagate them all simultaneously to the calling model?
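One way to picture the hoisting described here is the sketch below. It assumes a made-up result shape (a list of parts tagged "text" or "image_url"), which is neither the MCP wire format nor anything from this PR: the text parts are joined into the string tool output, and the image URLs are lifted into a single user message that follows it.

```python
from typing import Any


def hoist_tool_result(call_id: str, parts: list[dict[str, str]]) -> list[dict[str, Any]]:
    """Split a mixed tool result into a string output plus one user message.

    `parts` uses a hypothetical shape: {"kind": "text", "value": ...} or
    {"kind": "image_url", "value": ...}.
    """
    texts = [p["value"] for p in parts if p["kind"] == "text"]
    images = [p["value"] for p in parts if p["kind"] == "image_url"]

    items: list[dict[str, Any]] = [
        {
            "type": "function_call_output",
            "call_id": call_id,
            # Fall back to a placeholder when the tool produced no text.
            "output": "\n".join(texts) or "Tool returned non-text output.",
        }
    ]
    if images:
        # All images from this call share one hoisted user message.
        items.append(
            {
                "role": "user",
                "content": [{"type": "input_image", "image_url": url} for url in images],
            }
        )
    return items


# Made-up browsing result: a status line plus a hosted screenshot.
browsing_items = hoist_tool_result(
    "call_42",
    [
        {"kind": "text", "value": "Navigated to https://example.com"},
        {"kind": "image_url", "value": "https://cdn.example.com/shot.png"},
    ],
)
```

Keeping the text in the function_call_output and only hoisting the non-text parts means a text-only tool call produces exactly the same items as it does today.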
This PR adds support for image function tools to the OpenAI Agents Python SDK. It is inspired by #341.

The current function_tool implementation only allows the output to be a string, which is a problem when a tool needs to pass an image as input in the request data. This PR tackles the problem by providing both a standard function_call_output and the additional image-related arguments back-to-back.

What's included
- ImageFunctionTool class and image_function_tool decorator

Usage
Use the @image_function_tool decorator to create tools that work with images. The tool can then be used to allow agents to process and analyze images.

A supporting example script is located at examples/tools/image_function_tool.py.