本文為英文版的機器翻譯版本,如內容有任何歧義或不一致之處,概以英文版為準。
開始使用
本指南說明如何在 SageMaker 即時端點上部署自訂的 Amazon Nova 模型、設定推論參數,以及叫用模型進行測試。
先決條件
以下是在 SageMaker 推論上部署 Amazon Nova 模型的先決條件:
-
建立 AWS 帳戶 - 如果您還沒有,請參閱建立 AWS 帳戶。
-
必要的 IAM 許可 - 確保您的 IAM 使用者或角色已連接下列受管政策:
-
AmazonSageMakerFullAccess -
AmazonS3FullAccess
-
-
必要的 SDKs/CLI 版本 - 下列 SDK 版本已在 SageMaker 推論上使用 Amazon Nova 模型進行測試和驗證:
-
適用於資源型 API 方法的 SageMaker Python SDK v3.0.0+ (
sagemaker>=3.0.0) -
Boto3 1.35.0+ 版 (
boto3>=1.35.0) 用於直接 API 呼叫。本指南中的範例使用此方法。
-
-
增加服務配額 - 針對您計劃用於 Amazon SageMaker SageMaker 服務配額 (例如
ml.p5.48xlarge for endpoint usage)。如需支援的執行個體類型清單,請參閱 支援的模型和執行個體。若要請求提高配額,請參閱請求提高配額。如需 SageMaker 執行個體配額的相關資訊,請參閱 SageMaker 端點和配額。
提示
若要快速end-to-end部署,您可以執行自訂 Nova Model SageMaker 推論筆記本
步驟 1:設定 AWS 登入資料
使用下列其中一種方法設定您的 AWS 登入資料:
選項 1: AWS CLI (建議)
aws configure
出現提示時,輸入您的 AWS 存取金鑰、私密金鑰和預設區域。
選項 2: AWS credentials 檔案
建立或編輯 ~/.aws/credentials:
[default] aws_access_key_id = YOUR_ACCESS_KEY aws_secret_access_key = YOUR_SECRET_KEY
選項 3:環境變數
export AWS_ACCESS_KEY_ID=your_access_key export AWS_SECRET_ACCESS_KEY=your_secret_key
注意
如需 AWS 登入資料的詳細資訊,請參閱組態和登入資料檔案設定。
初始化 AWS 用戶端
使用下列程式碼建立 Python 指令碼或筆記本,以初始化 AWS SDK 並驗證您的登入資料:
import boto3 # AWS Configuration - Update these for your environment REGION = "us-east-1" # Supported regions: us-east-1, us-west-2 AWS_ACCOUNT_ID = "YOUR_ACCOUNT_ID" # Replace with your AWS account ID # Initialize AWS clients using default credential chain sagemaker = boto3.client('sagemaker', region_name=REGION) sts = boto3.client('sts') # Verify credentials try: identity = sts.get_caller_identity() print(f"Successfully authenticated to AWS Account: {identity['Account']}") if identity['Account'] != AWS_ACCOUNT_ID: print(f"Warning: Connected to account {identity['Account']}, expected {AWS_ACCOUNT_ID}") except Exception as e: print(f"Failed to authenticate: {e}") print("Please verify your credentials are configured correctly.")
如果身分驗證成功,您應該會看到確認 AWS 帳戶 ID 的輸出。
步驟 2:建立 SageMaker 執行角色
SageMaker 執行角色是一種 IAM 角色,可授予 SageMaker 代表您存取 AWS 資源的許可,例如用於模型成品的 Amazon S3 儲存貯體和用於記錄的 CloudWatch。
建立執行角色
注意
建立 IAM 角色需要 iam:CreateRole和 iam:AttachRolePolicy許可。在繼續之前,請確定您的 IAM 使用者或角色具有這些許可。
下列程式碼會建立具有部署 Amazon Nova 自訂模型必要許可的 IAM 角色:
import json # Create SageMaker Execution Role role_name = f"SageMakerInference-ExecutionRole-{AWS_ACCOUNT_ID}" trust_policy = { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": {"Service": "sagemaker.amazonaws.com"}, "Action": "sts:AssumeRole" } ] } iam = boto3.client('iam', region_name=REGION) # Create the role role_response = iam.create_role( RoleName=role_name, AssumeRolePolicyDocument=json.dumps(trust_policy), Description='SageMaker execution role with S3 and SageMaker access' ) # Attach required policies iam.attach_role_policy( RoleName=role_name, PolicyArn='arn:aws:iam::aws:policy/AmazonSageMakerFullAccess' ) iam.attach_role_policy( RoleName=role_name, PolicyArn='arn:aws:iam::aws:policy/AmazonS3FullAccess' ) SAGEMAKER_EXECUTION_ROLE_ARN = role_response['Role']['Arn'] print(f"Created SageMaker execution role: {SAGEMAKER_EXECUTION_ROLE_ARN}")
使用現有的執行角色 (選用)
如果您已有 SageMaker 執行角色,您可以改用它:
# Replace with your existing role ARN SAGEMAKER_EXECUTION_ROLE_ARN = "arn:aws:iam::YOUR_ACCOUNT_ID:role/YOUR_EXISTING_ROLE_NAME"
若要尋找帳戶中現有的 SageMaker 角色:
iam = boto3.client('iam', region_name=REGION) response = iam.list_roles() sagemaker_roles = [role for role in response['Roles'] if 'SageMaker' in role['RoleName']] for role in sagemaker_roles: print(f"{role['RoleName']}: {role['Arn']}")
重要
執行角色必須與 sagemaker.amazonaws.com和 具有信任關係,才能存取 Amazon S3 和 SageMaker 資源。
如需 SageMaker 執行角色的詳細資訊,請參閱 SageMaker 角色。
步驟 3:設定模型參數
設定 Amazon Nova 模型的部署參數。這些設定控制模型行為、資源配置和推論特性。如需每個支援的執行個體類型和支援的 CONTEXT_LENGTH 和 MAX_CONCURRENCY 值清單,請參閱 支援的模型和執行個體。如需取樣預設值、推測解碼和量化等其他容器功能的完整清單,請參閱 推論容器功能。
必要參數
-
IMAGE:Amazon Nova 推論容器的 Docker 容器映像 URI。這將由 提供 AWS。 -
CONTEXT_LENGTH:模型內容長度。 -
MAX_CONCURRENCY:每次反覆運算的序列數目上限;設定在 GPU 上單一批次內可同時處理個別使用者請求 (提示) 的數量限制。範圍:大於 0 的整數。
設定您的部署
# AWS Configuration REGION = "us-east-1" # Must match region from Step 1 # ECR Account mapping by region ECR_ACCOUNT_MAP = { "us-east-1": "708977205387", "us-west-2": "176779409107" } # Container Image IMAGE = f"{ECR_ACCOUNT_MAP[REGION]}.dkr.ecr.{REGION}.amazonaws.com/nova-inference-repo:SM-Inference-latest" print(f"IMAGE = {IMAGE}") # Required parameters CONTEXT_LENGTH = "8000" # Maximum total context length MAX_CONCURRENCY = "8" # Maximum concurrent sequences # Build environment variables for the container environment = { 'CONTEXT_LENGTH': CONTEXT_LENGTH, 'MAX_CONCURRENCY': MAX_CONCURRENCY, # Optional: add container feature environment variables here. # See "Inference Container Features" for the full list. # Examples: # 'DEFAULT_TEMPERATURE': '0.7', # 'DEFAULT_MAX_NEW_TOKENS': '512', # 'QUANTIZATION_DTYPE': 'fp8', } print("Environment configuration:") for key, value in environment.items(): print(f" {key}: {value}")
設定部署特定的參數
現在為您的 Amazon Nova 模型部署設定特定參數,包括模型成品位置和執行個體類型選擇。
設定部署識別符
# Deployment identifier - use a descriptive name for your use case JOB_NAME = "my-nova-deployment"
指定模型成品位置
提供存放訓練過之 Amazon Nova 模型成品的 Amazon S3 URI。這應該是模型訓練或微調任務的輸出位置。
# S3 location of your trained Nova model artifacts # Replace with your model's S3 URI - must end with / MODEL_S3_LOCATION = "s3://your-bucket-name/path/to/model/artifacts/"
選取模型變體和執行個體類型
# Configure model variant and instance type TESTCASE = { "model": "micro", # Options: micro, lite, lite2 "instance": "ml.g5.12xlarge" # Refer to "Supported models and instances" section } # Generate resource names INSTANCE_TYPE = TESTCASE["instance"] MODEL_NAME = JOB_NAME + "-" + TESTCASE["model"] + "-" + INSTANCE_TYPE.replace(".", "-") ENDPOINT_CONFIG_NAME = MODEL_NAME + "-Config" ENDPOINT_NAME = MODEL_NAME + "-Endpoint" print(f"Model Name: {MODEL_NAME}") print(f"Endpoint Config: {ENDPOINT_CONFIG_NAME}") print(f"Endpoint Name: {ENDPOINT_NAME}")
命名慣例
程式碼會自動為 AWS 資源產生一致的名稱:
-
模型名稱:
{JOB_NAME}-{model}-{instance-type} -
端點組態:
{MODEL_NAME}-Config -
端點名稱:
{MODEL_NAME}-Endpoint
步驟 4:建立 SageMaker 資源並部署端點
SageMaker 提供兩種將模型部署到即時端點的方法。選擇適合您使用案例的方法:
-
推論元件 (建議):將模型部署為端點上的推論元件。此方法可讓您在單一端點上託管多個模型、獨立擴展模型,以及最佳化資源使用率。
-
單一模型端點:使用模型物件和端點組態,將單一模型直接部署至端點。此方法更易於設定,且適合每個端點只需要一個模型的開發、測試或工作負載。
選項 A:使用推論元件建立
使用推論元件,您會先建立端點,然後將模型部署為該端點上的推論元件。這會將模型與端點基礎設施分離,讓您更具彈性。
建立端點組態
建立端點組態來定義基礎設施,而不指定模型。執行個體類型和計數是在端點層級管理:
# Create Endpoint Configuration for inference components INFERENCE_COMPONENT_NAME = MODEL_NAME + "-IC" try: config_response = sagemaker.create_endpoint_config( EndpointConfigName=ENDPOINT_CONFIG_NAME, ProductionVariants=[ { 'VariantName': 'primary', 'InstanceType': INSTANCE_TYPE, 'InitialInstanceCount': 1, 'RoutingConfig': { 'RoutingStrategy': 'LEAST_OUTSTANDING_REQUESTS' } } ], Tags=[ { 'Key': 'sagemaker:nova-inference-component', 'Value': 'true' } ] ) print("Endpoint configuration created successfully!") print(f"Config ARN: {config_response['EndpointConfigArn']}") except sagemaker.exceptions.ClientError as e: print(f"Error creating endpoint configuration: {e}")
建立和部署端點
import time try: endpoint_response = sagemaker.create_endpoint( EndpointName=ENDPOINT_NAME, EndpointConfigName=ENDPOINT_CONFIG_NAME ) print("Endpoint creation initiated successfully!") print(f"Endpoint ARN: {endpoint_response['EndpointArn']}") except Exception as e: print(f"Error creating endpoint: {e}") # Wait for endpoint to be InService print("Waiting for endpoint to be InService...") print("This typically takes 5-10 minutes...\n") while True: try: response = sagemaker.describe_endpoint(EndpointName=ENDPOINT_NAME) status = response['EndpointStatus'] if status == 'Creating': print(f"⏳ Status: {status} - Provisioning infrastructure...") elif status == 'InService': print(f"✅ Status: {status}") print(f"\nEndpoint '{ENDPOINT_NAME}' is ready.") break elif status == 'Failed': print(f"❌ Status: {status}") print(f"Failure Reason: {response.get('FailureReason', 'Unknown')}") break else: print(f"Status: {status}") except Exception as e: print(f"Error checking endpoint status: {e}") break time.sleep(30)
建立推論元件
一旦端點為 InService,請將 Amazon Nova 模型部署為推論元件:
try: ic_response = sagemaker.create_inference_component( InferenceComponentName=INFERENCE_COMPONENT_NAME, EndpointName=ENDPOINT_NAME, VariantName='primary', Specification={ 'Container': { 'Image': IMAGE, 'ArtifactUrl': MODEL_S3_LOCATION, 'Environment': environment }, 'ComputeResourceRequirements': { 'NumberOfCpuCoresRequired': 15, 'NumberOfAcceleratorDevicesRequired': 4, 'MinMemoryRequiredInMb': 25000 } }, RuntimeConfig={ 'CopyCount': 1 } ) print("Inference component creation initiated!") print(f"Inference Component ARN: {ic_response['InferenceComponentArn']}") except sagemaker.exceptions.ClientError as e: print(f"Error creating inference component: {e}")
重要參數:
-
InferenceComponentName:推論元件的唯一識別符 -
EndpointName:要在 上部署元件的端點 -
Image:Amazon Nova 推論的 Docker 容器映像 URI -
ArtifactUrl:模型成品的 Amazon S3 位置 -
Environment:在步驟 3 中設定的環境變數 -
NumberOfCpuCoresRequired:每個模型複製所需的 CPU 核心數量 -
NumberOfAcceleratorDevicesRequired:每個模型複製所需的加速器裝置 (GPUs) 數量 -
MinMemoryRequiredInMb:每個模型複製所需的最低記憶體,以 MB 為單位 -
CopyCount:要部署的模型複本數量
監控推論元件部署
# Wait for inference component to be InService print("Waiting for inference component deployment...") print("This typically takes 10-20 minutes as the model is loaded...\n") while True: try: ic_desc = sagemaker.describe_inference_component( InferenceComponentName=INFERENCE_COMPONENT_NAME ) ic_status = ic_desc['InferenceComponentStatus'] if ic_status == 'Creating': print(f"⏳ Status: {ic_status} - Loading model artifacts...") elif ic_status == 'InService': print(f"✅ Status: {ic_status}") print(f"\nInference component '{INFERENCE_COMPONENT_NAME}' is ready!") break elif ic_status == 'Failed': print(f"❌ Status: {ic_status}") print(f"Failure Reason: {ic_desc.get('FailureReason', 'Unknown')}") break else: print(f"Status: {ic_status}") except Exception as e: print(f"Error checking inference component status: {e}") break time.sleep(30)
注意
在步驟 5 中叫用端點時,您必須在叫用呼叫中包含 InferenceComponentName 參數。如需詳細資訊,請參閱步驟 5。
選項 B:使用單一模型端點建立
使用單一模型端點,您可以建立 SageMaker 模型物件、端點組態,然後部署端點。此方法會將模型直接封裝到端點組態中。
建立 SageMaker 模型
下列程式碼會建立參考 Amazon Nova 模型成品的 SageMaker 模型:
try: model_response = sagemaker.create_model( ModelName=MODEL_NAME, PrimaryContainer={ 'Image': IMAGE, 'ModelDataSource': { 'S3DataSource': { 'S3Uri': MODEL_S3_LOCATION, 'S3DataType': 'S3Prefix', 'CompressionType': 'None' } }, 'Environment': environment }, ExecutionRoleArn=SAGEMAKER_EXECUTION_ROLE_ARN, EnableNetworkIsolation=True ) print("Model created successfully!") print(f"Model ARN: {model_response['ModelArn']}") except sagemaker.exceptions.ClientError as e: print(f"Error creating model: {e}")
重要參數:
-
ModelName:模型的唯一識別符 -
Image:Amazon Nova 推論的 Docker 容器映像 URI -
ModelDataSource:模型成品的 Amazon S3 位置 -
Environment:在步驟 3 中設定的環境變數 -
ExecutionRoleArn:步驟 2 中的 IAM 角色 -
EnableNetworkIsolation:將 設定為 True 以增強安全性 (防止容器進行傳出網路呼叫)
建立端點組態
接著,建立定義部署基礎設施的端點組態:
# Create Endpoint Configuration try: production_variant = { 'VariantName': 'primary', 'ModelName': MODEL_NAME, 'InitialInstanceCount': 1, 'InstanceType': INSTANCE_TYPE, } config_response = sagemaker.create_endpoint_config( EndpointConfigName=ENDPOINT_CONFIG_NAME, ProductionVariants=[production_variant] ) print("Endpoint configuration created successfully!") print(f"Config ARN: {config_response['EndpointConfigArn']}") except sagemaker.exceptions.ClientError as e: print(f"Error creating endpoint configuration: {e}")
重要參數:
-
VariantName:此模型變體的識別符 (單一模型部署使用「主要」) -
ModelName:參考上面建立的模型 -
InitialInstanceCount:要部署的執行個體數量 (從 1 開始,視需要稍後擴展) -
InstanceType:在步驟 3 中選取 ML 執行個體類型
部署端點
import time try: endpoint_response = sagemaker.create_endpoint( EndpointName=ENDPOINT_NAME, EndpointConfigName=ENDPOINT_CONFIG_NAME ) print("Endpoint creation initiated successfully!") print(f"Endpoint ARN: {endpoint_response['EndpointArn']}") except Exception as e: print(f"Error creating endpoint: {e}")
監控端點建立
下列程式碼會輪詢端點狀態,直到部署完成:
# Monitor endpoint creation progress print("Waiting for endpoint creation to complete...") print("This typically takes 15-30 minutes...\n") while True: try: response = sagemaker.describe_endpoint(EndpointName=ENDPOINT_NAME) status = response['EndpointStatus'] if status == 'Creating': print(f"⏳ Status: {status} - Provisioning infrastructure and loading model...") elif status == 'InService': print(f"✅ Status: {status}") print("\nEndpoint creation completed successfully!") print(f"Endpoint Name: {ENDPOINT_NAME}") print(f"Endpoint ARN: {response['EndpointArn']}") break elif status == 'Failed': print(f"❌ Status: {status}") print(f"Failure Reason: {response.get('FailureReason', 'Unknown')}") print("\nFull response:") print(response) break else: print(f"Status: {status}") except Exception as e: print(f"Error checking endpoint status: {e}") break time.sleep(30) # Check every 30 seconds
驗證資源建立
您可以驗證您的資源是否已成功建立:
# Describe the model model_info = sagemaker.describe_model(ModelName=MODEL_NAME) print(f"Model Status: {model_info['ModelName']} created") # Describe the endpoint configuration config_info = sagemaker.describe_endpoint_config(EndpointConfigName=ENDPOINT_CONFIG_NAME) print(f"Endpoint Config Status: {config_info['EndpointConfigName']} created")
確認端點已就緒
無論您選擇哪種方法,都可以驗證端點組態:
# Get detailed endpoint information endpoint_info = sagemaker.describe_endpoint(EndpointName=ENDPOINT_NAME) print("\n=== Endpoint Details ===") print(f"Endpoint Name: {endpoint_info['EndpointName']}") print(f"Endpoint ARN: {endpoint_info['EndpointArn']}") print(f"Status: {endpoint_info['EndpointStatus']}") print(f"Creation Time: {endpoint_info['CreationTime']}") print(f"Last Modified: {endpoint_info['LastModifiedTime']}") # Get endpoint config for instance type details endpoint_config_name = endpoint_info['EndpointConfigName'] endpoint_config = sagemaker.describe_endpoint_config(EndpointConfigName=endpoint_config_name) # Display production variant details for variant in endpoint_info['ProductionVariants']: print(f"\nProduction Variant: {variant['VariantName']}") print(f" Current Instance Count: {variant['CurrentInstanceCount']}") print(f" Desired Instance Count: {variant['DesiredInstanceCount']}") # Get instance type from endpoint config for config_variant in endpoint_config['ProductionVariants']: if config_variant['VariantName'] == variant['VariantName']: print(f" Instance Type: {config_variant['InstanceType']}") break
對端點建立失敗進行故障診斷
常見的失敗原因:
-
容量不足:請求的執行個體類型在您的區域無法使用
-
解決方案:嘗試不同的執行個體類型或請求提高配額
-
-
IAM 許可:執行角色缺少必要的許可
-
解決方案:確認角色可存取 Amazon S3 模型成品和必要的 SageMaker 許可
-
-
找不到模型成品:Amazon S3 URI 不正確或無法存取
-
解決方案:驗證 Amazon S3 URI 並檢查儲存貯體許可,確保您位於正確的區域
-
-
資源限制:端點或執行個體超過帳戶限制
-
解決方案:透過 Service Quotas 或 Support 請求提高服務配額 AWS
-
注意
如果您需要刪除失敗的端點並重新開始:
sagemaker.delete_endpoint(EndpointName=ENDPOINT_NAME)
步驟 5:叫用端點
一旦您的端點為 InService,您就可以傳送推論請求,以從 Amazon Nova 模型產生預測。SageMaker 支援同步端點 (即時串流/非串流模式) 和非同步端點 (以 Amazon S3 為基礎進行批次處理)。
設定執行期用戶端
使用適當的逾時設定建立 SageMaker 執行期用戶端:
import json import boto3 import botocore from botocore.exceptions import ClientError # Configure client with appropriate timeouts config = botocore.config.Config( read_timeout=120, # Maximum time to wait for response connect_timeout=10, # Maximum time to establish connection retries={'max_attempts': 3} # Number of retry attempts ) # Create SageMaker Runtime client runtime_client = boto3.client('sagemaker-runtime', config=config, region_name=REGION)
建立通用推論函數
下列 函數會同時處理串流和非串流請求。它使用步驟 4 中定義的INFERENCE_COMPONENT_NAME變數。如果您使用推論元件 (選項 A) 部署,這會設定為 MODEL_NAME + "-IC"。如果您使用單一模型端點 (選項 B) 部署,則尚未定義,因此請先將其設定為 ,None再執行此步驟:
# Only needed if you followed Option B (single model endpoints) in Step 4: # INFERENCE_COMPONENT_NAME = None def invoke_nova_endpoint(request_body): """ Invoke Nova endpoint with automatic streaming detection. Supports both inference component and single model endpoint deployments. Args: request_body (dict): Request payload containing prompt and parameters Returns: dict: Response from the model (for non-streaming requests) None: For streaming requests (prints output directly) """ body = json.dumps(request_body) is_streaming = request_body.get("stream", False) # Build invoke parameters invoke_params = { 'EndpointName': ENDPOINT_NAME, 'ContentType': 'application/json', 'Body': body } # Add InferenceComponentName if using inference components if INFERENCE_COMPONENT_NAME: invoke_params['InferenceComponentName'] = INFERENCE_COMPONENT_NAME try: print(f"Invoking endpoint ({'streaming' if is_streaming else 'non-streaming'})...") if is_streaming: response = runtime_client.invoke_endpoint_with_response_stream(**invoke_params) event_stream = response['Body'] for event in event_stream: if 'PayloadPart' in event: chunk = event['PayloadPart'] if 'Bytes' in chunk: data = chunk['Bytes'].decode() print("Chunk:", data) else: # Non-streaming inference invoke_params['Accept'] = 'application/json' response = runtime_client.invoke_endpoint(**invoke_params) response_body = response['Body'].read().decode('utf-8') result = json.loads(response_body) print("✅ Response received successfully") return result except ClientError as e: error_code = e.response['Error']['Code'] error_message = e.response['Error']['Message'] print(f"❌ AWS Error: {error_code} - {error_message}") except Exception as e: print(f"❌ Unexpected error: {str(e)}")
範例 1:非串流聊天完成
使用聊天格式進行對話互動:
# Non-streaming chat request chat_request = { "messages": [ {"role": "user", "content": "Hello! How are you?"} ], "max_tokens": 100, "max_completion_tokens": 100, # Alternative to max_tokens "stream": False, "temperature": 0.7, "top_p": 0.9, "top_k": 50, "logprobs": True, "top_logprobs": 3, "allowed_token_ids": None, # List of allowed token IDs "truncate_prompt_tokens": None, # Truncate prompt to this many tokens "stream_options": None } response = invoke_nova_endpoint(chat_request)
回應範例:
{ "id": "chatcmpl-123456", "object": "chat.completion", "created": 1234567890, "model": "default", "choices": [ { "index": 0, "message": { "role": "assistant", "content": "Hello! I'm doing well, thank you for asking. I'm here and ready to help you with any questions or tasks you might have. How can I assist you today?" }, "logprobs": { "content": [ { "token": "Hello", "logprob": -0.123, "top_logprobs": [ {"token": "Hello", "logprob": -0.123}, {"token": "Hi", "logprob": -2.456}, {"token": "Hey", "logprob": -3.789} ] } # Additional tokens... ] }, "finish_reason": "stop" } ], "usage": { "prompt_tokens": 12, "completion_tokens": 28, "total_tokens": 40 } }
範例 2:簡單文字完成
使用完成格式產生簡單的文字:
# Simple completion request completion_request = { "prompt": "The capital of France is", "max_tokens": 50, "stream": False, "temperature": 0.0, "top_p": 1.0, "top_k": -1, # -1 means no limit "logprobs": 3, # Number of log probabilities to return "allowed_token_ids": None, # List of allowed token IDs "truncate_prompt_tokens": None, # Truncate prompt to this many tokens "stream_options": None } response = invoke_nova_endpoint(completion_request)
回應範例:
{ "id": "cmpl-789012", "object": "text_completion", "created": 1234567890, "model": "default", "choices": [ { "text": " Paris.", "index": 0, "logprobs": { "tokens": [" Paris", "."], "token_logprobs": [-0.001, -0.002], "top_logprobs": [ {" Paris": -0.001, " London": -5.234, " Rome": -6.789}, {".": -0.002, ",": -4.567, "!": -7.890} ] }, "finish_reason": "stop" } ], "usage": { "prompt_tokens": 6, "completion_tokens": 2, "total_tokens": 8 } }
範例 3:串流聊天完成
# Streaming chat request streaming_request = { "messages": [ {"role": "user", "content": "Tell me a short story about a robot"} ], "max_tokens": 200, "stream": True, "temperature": 0.7, "top_p": 0.95, "top_k": 40, "logprobs": True, "top_logprobs": 2, "stream_options": {"include_usage": True} } invoke_nova_endpoint(streaming_request)
串流輸出範例:
Chunk: data: {"id":"chatcmpl-029ca032-fa01-4868-80b7-c4cb1af90fb9","object":"chat.completion.chunk","created":1772060532,"model":"default","choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}],"prompt_token_ids":null} Chunk: data: {"id":"chatcmpl-029ca032-fa01-4868-80b7-c4cb1af90fb9","object":"chat.completion.chunk","created":1772060532,"model":"default","choices":[{"index":0,"delta":{"content":" Once"},"logprobs":{"content":[{"token":"\u2581Once","logprob":-0.6078429222106934,"bytes":[226,150,129,79,110,99,101],"top_logprobs":[{"token":"\u2581Once","logprob":-0.6078429222106934,"bytes":[226,150,129,79,110,99,101]},{"token":"\u2581In","logprob":-0.7864127159118652,"bytes":[226,150,129,73,110]}]}]},"finish_reason":null,"token_ids":null}]} Chunk: data: {"id":"chatcmpl-029ca032-fa01-4868-80b7-c4cb1af90fb9","object":"chat.completion.chunk","created":1772060532,"model":"default","choices":[{"index":0,"delta":{"content":" upon"},"logprobs":{"content":[{"token":"\u2581upon","logprob":-0.0012345,"bytes":[226,150,129,117,112,111,110],"top_logprobs":[{"token":"\u2581upon","logprob":-0.0012345,"bytes":[226,150,129,117,112,111,110]},{"token":"\u2581a","logprob":-6.789,"bytes":[226,150,129,97]}]}]},"finish_reason":null,"token_ids":null}]} Chunk: data: {"id":"chatcmpl-029ca032-fa01-4868-80b7-c4cb1af90fb9","object":"chat.completion.chunk","created":1772060532,"model":"default","choices":[{"index":0,"delta":{"content":" a"},"logprobs":{"content":[{"token":"\u2581a","logprob":-0.0001234,"bytes":[226,150,129,97],"top_logprobs":[{"token":"\u2581a","logprob":-0.0001234,"bytes":[226,150,129,97]},{"token":"\u2581time","logprob":-9.123,"bytes":[226,150,129,116,105,109,101]}]}]},"finish_reason":null,"token_ids":null}]} Chunk: data: {"id":"chatcmpl-029ca032-fa01-4868-80b7-c4cb1af90fb9","object":"chat.completion.chunk","created":1772060532,"model":"default","choices":[{"index":0,"delta":{"content":" time"},"logprobs":{"content":[{"token":"\u2581time","logprob":-0.0023456,"bytes":[226,150,129,116,105,109,101],"top_logprobs":[{"token":"\u2581time","logprob":-0.0023456,"bytes":[226,150,129,116,105,109,101]},{"token":",","logprob":-6.012,"bytes":[44]}]}]},"finish_reason":null,"token_ids":null}]} # Additional chunks... Chunk: data: {"id":"chatcmpl-029ca032-fa01-4868-80b7-c4cb1af90fb9","object":"chat.completion.chunk","created":1772060532,"model":"default","choices":[{"index":0,"delta":{},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":15,"completion_tokens":87,"total_tokens":102}} Chunk: data: [DONE]
範例 4:多模式聊天完成
針對影像和文字輸入使用多模式格式:
# Multimodal chat request (if supported by your model) multimodal_request = { "messages": [ { "role": "user", "content": [ {"type": "text", "text": "What's in this image?"}, {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}} ] } ], "max_tokens": 150, "temperature": 0.3, "top_p": 0.8, "stream": False } response = invoke_nova_endpoint(multimodal_request)
回應範例:
{ "id": "chatcmpl-345678", "object": "chat.completion", "created": 1234567890, "model": "default", "choices": [ { "index": 0, "message": { "role": "assistant", "content": "The image shows..." }, "finish_reason": "stop" } ], "usage": { "prompt_tokens": 1250, "completion_tokens": 45, "total_tokens": 1295 } }
步驟 6:清除資源 (選用)
為了避免產生不必要的費用,請刪除您在本教學課程中建立 AWS 的資源。SageMaker 端點會在執行時產生費用,即使您沒有主動提出推論請求。
重要
刪除資源是永久的,無法復原。在繼續之前,請確定您不再需要這些資源。
初始化清除用戶端
import boto3 import time # Initialize SageMaker client sagemaker = boto3.client('sagemaker', region_name=REGION)
刪除推論元件 (如果使用選項 A)
如果您使用推論元件部署,請先刪除推論元件,然後再刪除端點:
# Delete inference component (Option A only) try: print("Deleting inference component...") sagemaker.delete_inference_component(InferenceComponentName=INFERENCE_COMPONENT_NAME) print(f"✅ Inference component '{INFERENCE_COMPONENT_NAME}' deletion initiated") except Exception as e: print(f"❌ Error deleting inference component: {e}") # Wait for inference component to be deleted before proceeding print("Waiting for inference component deletion...") while True: try: sagemaker.describe_inference_component(InferenceComponentName=INFERENCE_COMPONENT_NAME) time.sleep(10) except sagemaker.exceptions.ClientError as e: if e.response['Error']['Code'] == 'ValidationException': print("✅ Inference component successfully deleted") break else: print(f"Error: {e}") break
刪除端點
try: print("Deleting endpoint...") sagemaker.delete_endpoint(EndpointName=ENDPOINT_NAME) print(f"✅ Endpoint '{ENDPOINT_NAME}' deletion initiated") print("Charges will stop once deletion completes (typically 2-5 minutes)") except Exception as e: print(f"❌ Error deleting endpoint: {e}")
注意
端點刪除是非同步的。您可以監控刪除狀態:
import time print("Monitoring endpoint deletion...") while True: try: response = sagemaker.describe_endpoint(EndpointName=ENDPOINT_NAME) status = response['EndpointStatus'] print(f"Status: {status}") time.sleep(10) except sagemaker.exceptions.ClientError as e: if e.response['Error']['Code'] == 'ValidationException': print("✅ Endpoint successfully deleted") break else: print(f"Error: {e}") break
刪除端點組態
刪除端點後,移除端點組態:
try: print("Deleting endpoint configuration...") sagemaker.delete_endpoint_config(EndpointConfigName=ENDPOINT_CONFIG_NAME) print(f"✅ Endpoint configuration '{ENDPOINT_CONFIG_NAME}' deleted") except Exception as e: print(f"❌ Error deleting endpoint configuration: {e}")
刪除模型 (僅限選項 B)
如果您使用單一模型端點,請移除 SageMaker 模型物件:
try: print("Deleting model...") sagemaker.delete_model(ModelName=MODEL_NAME) print(f"✅ Model '{MODEL_NAME}' deleted") except Exception as e: print(f"❌ Error deleting model: {e}")