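When your agent requests a tool, the response to your InvokeAgent API operation includes a returnControl payload. The payload contains the tool to use and the tool action, both listed in invocationInputs. For more information about returning control to the agent developer, see Return control to the agent developer by sending elicited information in an InvokeAgent response.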
Return control example
The following is an example returnControl payload that requests use of the ANTHROPIC.Computer tool with the screenshot action.
{
    "returnControl": {
        "invocationId": "invocationIdExample",
        "invocationInputs": [{
            "functionInvocationInput": {
                "actionGroup": "my_computer",
                "actionInvocationType": "RESULT",
                "agentId": "agentIdExample",
                "function": "computer",
                "parameters": [{
                    "name": "action",
                    "type": "string",
                    "value": "screenshot"
                }]
            }
        }]
    }
}
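As a minimal sketch of how a payload like the one above might be read, the following snippet pulls out the requested function name and flattens its parameters into a dictionary. The helper name extract_tool_requests is hypothetical (it is not part of any AWS SDK), and the sketch assumes that every entry of interest is a functionInvocationInput.

```python
import json


def extract_tool_requests(payload):
    """Return a list of (function_name, params) tuples found in a returnControl payload.

    Hypothetical helper for illustration only; not part of the AWS SDK.
    """
    return_control = payload.get("returnControl", {})
    requests = []
    for invocation_input in return_control.get("invocationInputs", []):
        func_input = invocation_input.get("functionInvocationInput")
        if func_input is None:
            # Skip entries that are not function invocations
            continue
        # Flatten [{"name": ..., "value": ...}, ...] into {name: value}
        params = {p["name"]: p["value"] for p in func_input.get("parameters", [])}
        requests.append((func_input["function"], params))
    return requests


sample_payload = json.loads("""
{
    "returnControl": {
        "invocationId": "invocationIdExample",
        "invocationInputs": [{
            "functionInvocationInput": {
                "actionGroup": "my_computer",
                "actionInvocationType": "RESULT",
                "agentId": "agentIdExample",
                "function": "computer",
                "parameters": [{"name": "action", "type": "string", "value": "screenshot"}]
            }
        }]
    }
}
""")

print(extract_tool_requests(sample_payload))
# prints [('computer', {'action': 'screenshot'})]
```

Returning plain tuples keeps the parsing step separate from whatever code eventually carries out the action.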
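Code example for parsing the tool request
The following code shows how to extract the computer use tool choice from an InvokeAgent response, map it to mock tool implementations for the different actions, and then send the result of the tool use in a subsequent InvokeAgent request.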
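- The manage_computer_interaction function runs a loop that calls the InvokeAgent API operation and parses the response until there are no more tasks to complete. When it parses the response, it extracts any tools to use from the returnControl payload and passes them to the handle_computer_action function.
- The handle_computer_action function maps the requested action to mock implementations for four actions. For examples of tool implementations, see computer-use-demo in the Anthropic GitHub repository.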
For more information about computer use tools, including implementation examples and tool descriptions, see Computer use (beta).
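The mapping step described in the list above can also be sketched in isolation as a dispatch table, which is one possible alternative to a chain of if/elif branches. All names here (mock_mouse_move, mock_left_click, ACTION_HANDLERS, dispatch_action) are hypothetical and exist only for this illustration. The fallback response for unknown actions is an assumption, not documented behavior.

```python
import json


def mock_mouse_move(params):
    # The coordinate arrives as a JSON string such as "[467, 842]"
    coordinate = json.loads(params.get("coordinate", "[0, 0]"))
    return {"TEXT": {"body": f"Mouse moved to coordinates {coordinate}"}}


def mock_left_click(params):
    return {"TEXT": {"body": "Mouse left clicked"}}


# Hypothetical dispatch table: action name -> mock implementation
ACTION_HANDLERS = {
    "mouse_move": mock_mouse_move,
    "left_click": mock_left_click,
}


def dispatch_action(params):
    """Look up the handler for params['action'] and call it."""
    action = params.get("action")
    handler = ACTION_HANDLERS.get(action)
    if handler is None:
        # Assumed fallback so the caller always receives a response body
        return {"TEXT": {"body": f"Unsupported action: {action}"}}
    return handler(params)


print(dispatch_action({"action": "mouse_move", "coordinate": "[467, 842]"}))
# prints {'TEXT': {'body': 'Mouse moved to coordinates [467, 842]'}}
```

With a table like this, supporting another action means adding one function and one dictionary entry.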
import boto3
from botocore.exceptions import ClientError
import json


def handle_computer_action(action_params):
    """
    Maps computer actions, such as taking screenshots and moving the mouse, to mock
    implementations and returns the result.

    Args:
        action_params (dict): Dictionary containing the action parameters
            Keys:
                - action (str, required): The type of action to perform (for example 'screenshot' or 'mouse_move')
                - coordinate (str, optional): JSON string containing [x,y] coordinates for mouse_move

    Returns:
        dict: Response containing the action result.
    """
    action = action_params.get('action')

    if action == 'screenshot':
        # Mock screenshot response
        with open("mock_screenshot.png", 'rb') as image_file:
            image_bytes = image_file.read()
        return {
            "IMAGES": {
                "images": [
                    {
                        "format": "png",
                        "source": {
                            "bytes": image_bytes
                        },
                    }
                ]
            }
        }
    elif action == 'mouse_move':
        # Mock mouse movement
        coordinate = json.loads(action_params.get('coordinate', '[0, 0]'))
        return {
            "TEXT": {
                "body": f"Mouse moved to coordinates {coordinate}"
            }
        }
    elif action == 'left_click':
        # Mock mouse left click
        return {
            "TEXT": {
                "body": "Mouse left clicked"
            }
        }
    elif action == 'right_click':
        # Mock mouse right click
        return {
            "TEXT": {
                "body": "Mouse right clicked"
            }
        }
    ### handle additional actions here

    # Fallback so that an unrecognized action never returns None
    return {
        "TEXT": {
            "body": f"Unsupported action: {action}"
        }
    }


def manage_computer_interaction(bedrock_agent_runtime_client, agent_id, alias_id):
    """
    Manages interaction between an Amazon Bedrock agent and computer use functions.

    Args:
        bedrock_agent_runtime_client: Boto3 client for Bedrock agent runtime
        agent_id (str): The ID of the agent
        alias_id (str): The Alias ID of the agent

    The function:
        - Initiates a session with initial prompt
        - Makes agent requests with appropriate parameters
        - Processes response chunks and return control events
        - Handles computer actions via handle_computer_action()
        - Continues interaction until task completion
    """
    session_id = "session123"
    initial_prompt = "Open a browser and go to a website"
    computer_use_results = None
    current_prompt = initial_prompt

    while True:
        # Make agent request with appropriate parameters
        invoke_params = {
            "agentId": agent_id,
            "sessionId": session_id,
            "inputText": current_prompt,
            "agentAliasId": alias_id,
        }

        # Include session state if we have results from previous iteration
        if computer_use_results:
            invoke_params["sessionState"] = computer_use_results["sessionState"]

        try:
            response = bedrock_agent_runtime_client.invoke_agent(**invoke_params)
        except ClientError as e:
            print(f"Error: {e}")
            # Stop here: a failed request leaves no response to process
            break

        has_return_control = False

        # Process the response
        for event in response.get('completion'):
            if 'chunk' in event:
                chunk_content = event['chunk'].get('bytes', b'').decode('utf-8')
                if chunk_content:
                    print("\nAgent:", chunk_content)

            if 'returnControl' in event:
                has_return_control = True
                invocationId = event["returnControl"]["invocationId"]
                # Collect one result per requested invocation so none is overwritten
                invocation_results = []
                if "invocationInputs" in event["returnControl"]:
                    for invocationInput in event["returnControl"]["invocationInputs"]:
                        func_input = invocationInput["functionInvocationInput"]

                        # Extract action parameters
                        params = {p['name']: p['value'] for p in func_input['parameters']}

                        # Handle computer action and get result
                        action_result = handle_computer_action(params)

                        # Print the requested function and parameters for testing
                        print("\nExecuting function:", func_input['function'])
                        print("Parameters:", params)

                        invocation_results.append({
                            "functionResult": {
                                "actionGroup": func_input['actionGroup'],
                                "responseState": "REPROMPT",
                                "agentId": func_input['agentId'],
                                "function": func_input['function'],
                                "responseBody": action_result
                            }
                        })

                # Prepare the session state for the next request
                computer_use_results = {
                    "sessionState": {
                        "invocationId": invocationId,
                        "returnControlInvocationResults": invocation_results
                    }
                }

        # If there's no return control event, the task is complete
        if not has_return_control:
            print("\nTask completed!")
            break

        # Use empty string as prompt for subsequent iterations
        current_prompt = ""


def main():
    # Replace REGION, AGENT_ID, and ALIAS_ID with your own values
    bedrock_agent_runtime_client = boto3.client(
        service_name="bedrock-agent-runtime",
        region_name="REGION"
    )
    agent_id = "AGENT_ID"
    alias_id = "ALIAS_ID"
    manage_computer_interaction(bedrock_agent_runtime_client, agent_id, alias_id)


if __name__ == "__main__":
    main()
The output should look similar to the following:
Executing function: computer
Parameters: {'action': 'screenshot'}

Executing function: computer
Parameters: {'coordinate': '[467, 842]', 'action': 'mouse_move'}

Executing function: computer
Parameters: {'action': 'left_click'}

Agent: I've opened Firefox browser. Which website would you like to visit?

Task completed!
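To make the follow-up request easier to inspect, the sessionState construction from the example can be pulled into a small helper. The following is a sketch under stated assumptions: build_session_state is a hypothetical name, the field names mirror those used in the example code, and the helper returns one functionResult per requested invocation.

```python
def build_session_state(invocation_id, inputs_and_results):
    """Build the sessionState dict for the next InvokeAgent request.

    Hypothetical helper for illustration only.
    inputs_and_results: iterable of (func_input, action_result) pairs, where
    func_input is the functionInvocationInput dict from the returnControl event.
    """
    results = []
    for func_input, action_result in inputs_and_results:
        results.append({
            "functionResult": {
                "actionGroup": func_input["actionGroup"],
                "responseState": "REPROMPT",
                "agentId": func_input["agentId"],
                "function": func_input["function"],
                "responseBody": action_result,
            }
        })
    return {
        "invocationId": invocation_id,
        "returnControlInvocationResults": results,
    }


# Usage with the values from the return control example
example_input = {
    "actionGroup": "my_computer",
    "agentId": "agentIdExample",
    "function": "computer",
}
example_result = {"TEXT": {"body": "Mouse left clicked"}}
session_state = build_session_state("invocationIdExample", [(example_input, example_result)])
print(session_state["returnControlInvocationResults"][0]["functionResult"]["function"])
# prints computer
```

Keeping this step in a pure function means the payload shape can be checked without calling the service.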