Handle computer use tool requests from an agent in a conversation

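When your agent requests a tool, the response to the InvokeAgent API operation contains a returnControl payload. The payload's invocationInputs field identifies the tool to use and the tool action to perform. For more information about returning control to the agent developer, see Return control to the agent developer by sending elicited information in an InvokeAgent response.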

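Topics

Return control example
Code example for parsing the tool request

Return control example

The following is an example returnControl payload that requests the ANTHROPIC.Computer tool with the screenshot action.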


{
    "returnControl": {
        "invocationId": "invocationIdExample",
        "invocationInputs": [{
            "functionInvocationInput": {
                "actionGroup": "my_computer",
                "actionInvocationType": "RESULT",
                "agentId": "agentIdExample",
                "function": "computer",
                "parameters": [{
                    "name": "action",
                    "type": "string",
                    "value": "screenshot"
                }]
            }
        }]
    }
}
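
To make the structure of this payload concrete, the following is a minimal sketch that flattens each entry in invocationInputs into a function name and a dictionary of parameters. The helper name extract_tool_requests is hypothetical and is not part of the Bedrock SDK; the payload is the example shown above.

```python
import json

# The example returnControl payload shown above
payload = json.loads("""
{
    "returnControl": {
        "invocationId": "invocationIdExample",
        "invocationInputs": [{
            "functionInvocationInput": {
                "actionGroup": "my_computer",
                "actionInvocationType": "RESULT",
                "agentId": "agentIdExample",
                "function": "computer",
                "parameters": [{
                    "name": "action",
                    "type": "string",
                    "value": "screenshot"
                }]
            }
        }]
    }
}
""")


def extract_tool_requests(return_control):
    """Hypothetical helper: return (function name, parameter dict) pairs."""
    requests = []
    for invocation_input in return_control.get("invocationInputs", []):
        func_input = invocation_input["functionInvocationInput"]
        # Collapse the list of {name, type, value} entries into a plain dict
        params = {p["name"]: p["value"] for p in func_input.get("parameters", [])}
        requests.append((func_input["function"], params))
    return requests


print(extract_tool_requests(payload["returnControl"]))
# [('computer', {'action': 'screenshot'})]
```

All parameter values arrive as strings, so structured values such as coordinates still need to be decoded by the code that handles the action.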

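Code example for parsing the tool request

The following code shows how to extract the computer use tool choice from an InvokeAgent response, map it to mock tool implementations for different tools, and then send the result of the tool use in a subsequent InvokeAgent request.

The manage_computer_interaction function runs a loop that calls the InvokeAgent API operation and parses the response until there is no task left to complete. When it parses the response, it extracts any tools to use from the returnControl payload and passes their parameters to the handle_computer_action function.
The handle_computer_action function maps the requested action to mock implementations for four actions. For example tool implementations, see computer-use-demo in the Anthropic GitHub repository.

For more information about computer use tools, including implementation examples and tool descriptions, see Computer use (beta) in the Anthropic documentation.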


import boto3
from botocore.exceptions import ClientError
import json


def handle_computer_action(action_params):
    """
    Maps computer actions, such as taking screenshots and moving the mouse, to mock
    implementations and returns the result.

    Args:
        action_params (dict): Dictionary containing the action parameters
            Keys:
                - action (str, required): The type of action to perform (for example 'screenshot' or 'mouse_move')
                - coordinate (str, optional): JSON string containing [x,y] coordinates for mouse_move

    Returns:
        dict: Response containing the action result.
    """

    action = action_params.get('action')
    if action == 'screenshot':
        # Mock screenshot response
        with open("mock_screenshot.png", 'rb') as image_file:
            image_bytes = image_file.read()
        return {
            "IMAGES": {
                "images": [
                    {
                        "format": "png",
                        "source": {
                            "bytes": image_bytes
                        },
                    }
                ]
            }
        }
    elif action == 'mouse_move':
        # Mock mouse movement
        coordinate = json.loads(action_params.get('coordinate', '[0, 0]'))
        return {
            "TEXT": {
                "body": f"Mouse moved to coordinates {coordinate}"
            }
        }
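    elif action == 'left_click':
        # Mock mouse left click
        return {
            "TEXT": {
                "body": "Mouse left clicked"
            }
        }
    elif action == 'right_click':
        # Mock mouse right click
        return {
            "TEXT": {
                "body": "Mouse right clicked"
            }
        }

    # Handle additional actions here. Until then, report unrecognized actions
    # back to the agent instead of implicitly returning None.
    return {
        "TEXT": {
            "body": f"Action '{action}' is not implemented"
        }
    }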


def manage_computer_interaction(bedrock_agent_runtime_client, agent_id, alias_id):
    """
    Manages interaction between an Amazon Bedrock agent and computer use functions.

    Args:
        bedrock_agent_runtime_client: Boto3 client for Bedrock agent runtime
        agent_id (str): The ID of the agent
        alias_id (str): The Alias ID of the agent

    The function:
    - Initiates a session with initial prompt
    - Makes agent requests with appropriate parameters
    - Processes response chunks and return control events
    - Handles computer actions via handle_computer_action()
    - Continues interaction until task completion
    """
    session_id = "session123"
    initial_prompt = "Open a browser and go to a website"
    computer_use_results = None
    current_prompt = initial_prompt

    while True:
        # Make agent request with appropriate parameters
        invoke_params = {
            "agentId": agent_id,
            "sessionId": session_id,
            "inputText": current_prompt,
            "agentAliasId": alias_id,
        }

        # Include session state if we have results from previous iteration
        if computer_use_results:
            invoke_params["sessionState"] = computer_use_results["sessionState"]

        try:
            response = bedrock_agent_runtime_client.invoke_agent(**invoke_params)
        except ClientError as e:
            # Stop here: without a response there is nothing to process
            print(f"Error: {e}")
            return

        has_return_control = False

        # Process the response
        for event in response.get('completion'):
            if 'chunk' in event:
                chunk_content = event['chunk'].get('bytes', b'').decode('utf-8')
                if chunk_content:
                    print("\nAgent:", chunk_content)

            if 'returnControl' in event:
                has_return_control = True
                invocation_id = event["returnControl"]["invocationId"]
                function_results = []
                for invocation_input in event["returnControl"].get("invocationInputs", []):
                    func_input = invocation_input["functionInvocationInput"]

                    # Extract action parameters
                    params = {p['name']: p['value'] for p in func_input.get('parameters', [])}

                    # Print the requested function and parameters for testing
                    print("\nExecuting function:", func_input['function'])
                    print("Parameters:", params)

                    # Handle computer action and get result
                    action_result = handle_computer_action(params)

                    # Collect one result per requested invocation so that
                    # later inputs do not overwrite earlier ones
                    function_results.append({
                        "functionResult": {
                            "actionGroup": func_input['actionGroup'],
                            "responseState": "REPROMPT",
                            "agentId": func_input['agentId'],
                            "function": func_input['function'],
                            "responseBody": action_result
                        }
                    })

                # Prepare the session state for the next request
                computer_use_results = {
                    "sessionState": {
                        "invocationId": invocation_id,
                        "returnControlInvocationResults": function_results
                    }
                }

        # If there's no return control event, the task is complete
        if not has_return_control:
            print("\nTask completed!")
            break

        # Use empty string as prompt for subsequent iterations
        current_prompt = ""


def main():
    bedrock_agent_runtime_client = boto3.client(
        service_name="bedrock-agent-runtime",
        region_name="REGION"
    )

    agent_id = "AGENT_ID"
    alias_id = "ALIAS_ID"

    manage_computer_interaction(bedrock_agent_runtime_client, agent_id, alias_id)


if __name__ == "__main__":
    main()

The output should look similar to the following:


Executing function: computer
Parameters: {'action': 'screenshot'}

Executing function: computer
Parameters: {'coordinate': '[467, 842]', 'action': 'mouse_move'}

Executing function: computer
Parameters: {'action': 'left_click'}

Agent: I've opened Firefox browser. Which website would you like to visit?

Task completed!
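
The mock screenshot branch of handle_computer_action opens mock_screenshot.png, so that file must exist in the working directory before you run the example. If you have no image at hand, the following sketch generates a small solid-color placeholder using only the Python standard library. The function name make_placeholder_png and the 64 x 64 size are assumptions for illustration; any valid PNG file works.

```python
import struct
import zlib


def make_placeholder_png(width=1, height=1, rgb=(255, 255, 255)):
    """Hypothetical helper: build a minimal solid-color RGB PNG in memory."""

    def chunk(tag, data):
        # A PNG chunk is: 4-byte length, 4-byte type, data, CRC32 over type + data
        crc = zlib.crc32(tag + data) & 0xFFFFFFFF
        return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)

    # IHDR: width, height, bit depth 8, color type 2 (RGB),
    # then default compression, filter, and interlace methods
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)

    # Every scanline starts with filter byte 0, followed by its RGB pixels
    row = b"\x00" + bytes(rgb) * width
    idat = zlib.compress(row * height)

    signature = b"\x89PNG\r\n\x1a\n"
    return signature + chunk(b"IHDR", ihdr) + chunk(b"IDAT", idat) + chunk(b"IEND", b"")


if __name__ == "__main__":
    # Write the placeholder where the mock screenshot branch expects it
    with open("mock_screenshot.png", "wb") as image_file:
        image_file.write(make_placeholder_png(64, 64))
```

A real implementation would capture the actual screen instead. The placeholder only lets the request and response loop run end to end.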
