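When your agent requests a tool, the response to the InvokeAgent API operation includes a returnControl payload that contains the tool to use and the tool action in invocationInputs. For more information about returning control to the agent developer, see Return control to the agent developer by sending elicited information in an InvokeAgent response.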
Return control example
The following is an example returnControl payload that contains a request to use the ANTHROPIC.Computer tool with the screenshot action.
{
    "returnControl": {
        "invocationId": "invocationIdExample",
        "invocationInputs": [{
            "functionInvocationInput": {
                "actionGroup": "my_computer",
                "actionInvocationType": "RESULT",
                "agentId": "agentIdExample",
                "function": "computer",
                "parameters": [{
                    "name": "action",
                    "type": "string",
                    "value": "screenshot"
                }]
            }
        }]
    }
}
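As a small illustration of how this payload can be read, the following sketch pulls the function name and its parameters out of the example above. The helper name extract_actions is hypothetical and is not part of any AWS SDK; it only uses the field names shown in the payload.

```python
import json

# The example payload from above, parsed from JSON text.
payload = json.loads("""
{
    "returnControl": {
        "invocationId": "invocationIdExample",
        "invocationInputs": [{
            "functionInvocationInput": {
                "actionGroup": "my_computer",
                "actionInvocationType": "RESULT",
                "agentId": "agentIdExample",
                "function": "computer",
                "parameters": [{
                    "name": "action",
                    "type": "string",
                    "value": "screenshot"
                }]
            }
        }]
    }
}
""")


def extract_actions(return_control):
    """Hypothetical helper: return (function name, parameter dict) pairs."""
    actions = []
    for invocation_input in return_control.get("invocationInputs", []):
        func_input = invocation_input.get("functionInvocationInput")
        if func_input is None:
            # Skip inputs that are not function invocations.
            continue
        # Flatten the list of {name, type, value} entries into a dict.
        params = {p["name"]: p["value"] for p in func_input.get("parameters", [])}
        actions.append((func_input["function"], params))
    return actions


print(extract_actions(payload["returnControl"]))
# prints [('computer', {'action': 'screenshot'})]
```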
Code example for parsing the tool request
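The following code shows how to extract the computer use tool choice from the InvokeAgent response, map it to mock tool implementations for the different tools, and then send the result of the tool use in a subsequent InvokeAgent request.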
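- The manage_computer_interaction function runs a loop that calls the InvokeAgent API operation and parses the response until there are no more tasks to complete. When it parses the response, it extracts any tools to use from the returnControl payload and passes them to the handle_computer_action function.
- The handle_computer_action function maps the requested action to mock implementations for four actions. For an example tool implementation, see computer-use-demo in the Anthropic GitHub repository.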
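For more information about computer use tools, including implementation examples and tool descriptions, see Computer use (beta) in the Anthropic documentation.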
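The control flow described in the list above can be tried without AWS access. The following is a minimal sketch under stated assumptions: FakeAgentClient, run_tool, and run_loop are hypothetical names used only for illustration, and the fake client merely imitates the shape of an InvokeAgent response (a completion stream of chunk and returnControl events). It is not the boto3 client.

```python
class FakeAgentClient:
    """Hypothetical stand-in for the bedrock-agent-runtime client."""

    def __init__(self):
        self.calls = 0

    def invoke_agent(self, **kwargs):
        self.calls += 1
        if self.calls == 1:
            # First turn: the agent asks for a screenshot.
            return {"completion": [{"returnControl": {
                "invocationId": "inv-1",
                "invocationInputs": [{"functionInvocationInput": {
                    "actionGroup": "my_computer",
                    "agentId": "agentIdExample",
                    "function": "computer",
                    "parameters": [
                        {"name": "action", "type": "string", "value": "screenshot"}
                    ],
                }}],
            }}]}
        # Second turn: the agent answers in text, so the loop ends.
        return {"completion": [{"chunk": {"bytes": b"Done."}}]}


def run_tool(params):
    # Mock tool: report which action was requested.
    return {"TEXT": {"body": f"ran {params['action']}"}}


def run_loop(client):
    session_state = None
    transcript = []
    prompt = "Open a browser"
    while True:
        request = {"inputText": prompt}
        if session_state:
            request["sessionState"] = session_state
        response = client.invoke_agent(**request)
        results = []
        invocation_id = None
        for event in response["completion"]:
            if "chunk" in event:
                transcript.append(event["chunk"]["bytes"].decode("utf-8"))
            if "returnControl" in event:
                return_control = event["returnControl"]
                invocation_id = return_control["invocationId"]
                for item in return_control.get("invocationInputs", []):
                    func_input = item["functionInvocationInput"]
                    params = {p["name"]: p["value"] for p in func_input["parameters"]}
                    # Collect one result per requested tool use.
                    results.append({"functionResult": {
                        "actionGroup": func_input["actionGroup"],
                        "agentId": func_input["agentId"],
                        "function": func_input["function"],
                        "responseBody": run_tool(params),
                    }})
        if not results:
            # No tool was requested, so the task is complete.
            return transcript
        session_state = {
            "invocationId": invocation_id,
            "returnControlInvocationResults": results,
        }
        # Later turns carry only the tool results, not a new prompt.
        prompt = ""
```

Calling run_loop(FakeAgentClient()) makes two requests: the first yields a tool request, and the second yields a text chunk that ends the loop. This sketch collects a result for every entry in invocationInputs, which is one possible design choice when a single response requests more than one tool use.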
import boto3
from botocore.exceptions import ClientError
import json
def handle_computer_action(action_params):
    """
    Maps computer actions, like taking screenshots and moving the mouse, to mock
    implementations and returns the result.
    Args:
        action_params (dict): Dictionary containing the action parameters
            Keys:
                - action (str, required): The type of action to perform (for example 'screenshot' or 'mouse_move')
                - coordinate (str, optional): JSON string containing [x,y] coordinates for mouse_move
    Returns:
        dict: Response containing the action result.
    """
    action = action_params.get('action')
    if action == 'screenshot':
        # Mock screenshot response
        with open("mock_screenshot.png", 'rb') as image_file:
            image_bytes = image_file.read()
        return {
            "IMAGES": {
                "images": [
                    {
                        "format": "png",
                        "source": {
                            "bytes": image_bytes
                        },
                    }
                ]
            }
        }
    elif action == 'mouse_move':
        # Mock mouse movement
        coordinate = json.loads(action_params.get('coordinate', '[0, 0]'))
        return {
            "TEXT": {
                "body": f"Mouse moved to coordinates {coordinate}"
            }
        }
    elif action == 'left_click':
        # Mock mouse left click
        return {
            "TEXT": {
                "body": "Mouse left clicked"
            }
        }
    elif action == 'right_click':
        # Mock mouse right click
        return {
            "TEXT": {
                "body": "Mouse right clicked"
            }
        }
    ### handle additional actions here
def manage_computer_interaction(bedrock_agent_runtime_client, agent_id, alias_id):
    """
    Manages interaction between an Amazon Bedrock agent and computer use functions.
    Args:
        bedrock_agent_runtime_client: Boto3 client for Bedrock agent runtime
        agent_id (str): The ID of the agent
        alias_id (str): The Alias ID of the agent
    The function:
        - Initiates a session with initial prompt
        - Makes agent requests with appropriate parameters
        - Processes response chunks and return control events
        - Handles computer actions via handle_computer_action()
        - Continues interaction until task completion
    """
    session_id = "session123"
    initial_prompt = "Open a browser and go to a website"
    computer_use_results = None
    current_prompt = initial_prompt
    while True:
        # Make agent request with appropriate parameters
        invoke_params = {
            "agentId": agent_id,
            "sessionId": session_id,
            "inputText": current_prompt,
            "agentAliasId": alias_id,
        }
        # Include session state if we have results from previous iteration
        if computer_use_results:
            invoke_params["sessionState"] = computer_use_results["sessionState"]
        try:
            response = bedrock_agent_runtime_client.invoke_agent(**invoke_params)
        except ClientError as e:
            print(f"Error: {e}")
            # Stop here: without a response there is nothing to process
            return
        has_return_control = False
        # Process the response
        for event in response.get('completion'):
            if 'chunk' in event:
                chunk_content = event['chunk'].get('bytes', b'').decode('utf-8')
                if chunk_content:
                    print("\nAgent:", chunk_content)
            if 'returnControl' in event:
                has_return_control = True
                invocationId = event["returnControl"]["invocationId"]
                if "invocationInputs" in event["returnControl"]:
                    for invocationInput in event["returnControl"]["invocationInputs"]:
                        func_input = invocationInput["functionInvocationInput"]
                        # Extract action parameters
                        params = {p['name']: p['value'] for p in func_input['parameters']}
                        # Handle computer action and get result
                        action_result = handle_computer_action(params)
                        # Print action result for testing
                        print("\nExecuting function:", func_input['function'])
                        print("Parameters:", params)
                        # Prepare the session state for the next request
                        computer_use_results = {
                            "sessionState": {
                                "invocationId": invocationId,
                                "returnControlInvocationResults": [{
                                    "functionResult": {
                                        "actionGroup": func_input['actionGroup'],
                                        "responseState": "REPROMPT",
                                        "agentId": func_input['agentId'],
                                        "function": func_input['function'],
                                        "responseBody": action_result
                                    }
                                }]
                            }
                        }
        # If there's no return control event, the task is complete
        if not has_return_control:
            print("\nTask completed!")
            break
        # Use empty string as prompt for subsequent iterations
        current_prompt = ""
def main():
    bedrock_agent_runtime_client = boto3.client(service_name="bedrock-agent-runtime",
                                                region_name="REGION"
                                                )
    agent_id = "AGENT_ID"
    alias_id = "ALIAS_ID"
    manage_computer_interaction(bedrock_agent_runtime_client, agent_id, alias_id)
if __name__ == "__main__":
    main()
The output should look similar to the following:
Executing function: computer
Parameters: {'action': 'screenshot'}
Executing function: computer
Parameters: {'coordinate': '[467, 842]', 'action': 'mouse_move'}
Executing function: computer
Parameters: {'action': 'left_click'}
Agent: I've opened Firefox browser. Which website would you like to visit?
Task completed!
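One possible way to make it easier to add further actions to a handler such as handle_computer_action is a dispatch table that maps each action name to a function. The following is only a sketch: ACTION_HANDLERS, dispatch_computer_action, and the underscore-prefixed helpers are hypothetical names, the screenshot action is left out because it depends on a local image file, and the text returned for an unsupported action is an assumption rather than a documented convention.

```python
import json


def _mouse_move(params):
    # The coordinate parameter arrives as a JSON string such as "[467, 842]".
    x, y = json.loads(params.get("coordinate", "[0, 0]"))
    return {"TEXT": {"body": f"Mouse moved to coordinates [{x}, {y}]"}}


def _left_click(params):
    return {"TEXT": {"body": "Mouse left clicked"}}


def _right_click(params):
    return {"TEXT": {"body": "Mouse right clicked"}}


# Hypothetical dispatch table: add one entry per supported action.
ACTION_HANDLERS = {
    "mouse_move": _mouse_move,
    "left_click": _left_click,
    "right_click": _right_click,
}


def dispatch_computer_action(params):
    """Look up the mock handler for the requested action and run it."""
    action = params.get("action")
    handler = ACTION_HANDLERS.get(action)
    if handler is None:
        # Assumption: report an unsupported action as text so the caller
        # always has a response body to send back.
        return {"TEXT": {"body": f"Unsupported action: {action}"}}
    return handler(params)


print(dispatch_computer_action({"action": "mouse_move", "coordinate": "[467, 842]"}))
# prints {'TEXT': {'body': 'Mouse moved to coordinates [467, 842]'}}
```

With this layout, supporting a new action means writing one function and adding one dictionary entry, and the chain of elif branches does not grow.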