Meta Llama models

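This section describes the request parameters and response fields for Meta Llama models. Use this information to make inference calls to Meta Llama models with the InvokeModel and InvokeModelWithResponseStream (streaming) operations. This section also includes a Python code example that shows how to call Meta Llama models. To use a model in an inference operation, you need the model ID for the model. To get the model ID, see Supported foundation models in Amazon Bedrock. Some models also work with the Converse API. To check whether the Converse API supports a specific Meta Llama model, see Supported models and model features. For more code examples, see AWS SDK code examples.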
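When a Meta Llama model supports the Converse API, the inference settings go into model-agnostic fields instead of the prompt/max_gen_len body. The sketch below assumes a boto3 "bedrock-runtime" client. build_converse_request and first_text are hypothetical helpers. Their default values are copied from the InvokeModel parameter tables in this section, not from the Converse documentation.

```python
from typing import Any, Dict, Optional


def build_converse_request(
    model_id: str,
    user_text: str,
    system_text: Optional[str] = None,
    max_tokens: int = 512,
    temperature: float = 0.5,
    top_p: float = 0.9,
) -> Dict[str, Any]:
    """Build keyword arguments for a bedrock-runtime converse() call (hypothetical helper)."""
    request: Dict[str, Any] = {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": user_text}]}],
        # Converse uses maxTokens/temperature/topP instead of max_gen_len/temperature/top_p.
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": temperature, "topP": top_p},
    }
    if system_text:
        # Only sent when a system prompt is given.
        request["system"] = [{"text": system_text}]
    return request


def first_text(converse_response: Dict[str, Any]) -> str:
    """Return the first text block of the assistant message in a Converse response."""
    return converse_response["output"]["message"]["content"][0]["text"]
```

Usage would look like `first_text(client.converse(**build_converse_request(model_id, "What can you help me with?")))`. Whether a particular model accepts the system field is listed under Supported models and model features.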

Foundation models in Amazon Bedrock support input and output modalities, which vary from model to model. For the modalities, Amazon Bedrock features, and AWS Regions that Meta Llama models support, see Supported foundation models in Amazon Bedrock.

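When you make inference calls with Meta Llama models, include a prompt for the model. For general information about creating prompts for the models that Amazon Bedrock supports, see Prompt engineering concepts. For prompt information specific to Meta Llama, see the Meta Llama prompt engineering guide.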

Note

Llama 3.2 Instruct and Llama 3.3 Instruct models use geofencing. This means that these models can't be used outside the AWS Regions that are listed for them in the Regions table.

This section describes how to use the following models from Meta.

Llama 3 Instruct
Llama 3.1 Instruct
Llama 3.2 Instruct
Llama 3.3 Instruct

Request and response

The request body is passed in the body field of a request to InvokeModel or InvokeModelWithResponseStream.
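The sample code at the end of this section uses InvokeModel only. InvokeModelWithResponseStream returns the generation in pieces, and the sketch below joins those pieces into one string. It makes the following assumptions:

- It uses a boto3 "bedrock-runtime" client.
- Each stream event has the shape {"chunk": {"bytes": ...}}.
- Each chunk's bytes hold JSON with the generation and stop_reason fields described under Response.

The helper names are hypothetical.

```python
import json
from typing import Any, Iterable, Optional, Tuple


def collect_stream_text(events: Iterable[dict]) -> Tuple[str, Optional[str]]:
    """Join the "generation" pieces of a response stream; return (text, stop_reason)."""
    parts = []
    stop_reason: Optional[str] = None
    for event in events:
        chunk = event.get("chunk")
        if chunk is None:
            # Ignore any event that carries no "chunk" key.
            continue
        payload = json.loads(chunk["bytes"])
        parts.append(payload.get("generation") or "")
        # Assumption: stop_reason is null until the final chunk.
        if payload.get("stop_reason"):
            stop_reason = payload["stop_reason"]
    return "".join(parts), stop_reason


def stream_generation(client: Any, model_id: str, body: str) -> Tuple[str, Optional[str]]:
    """Call InvokeModelWithResponseStream on a boto3 "bedrock-runtime" client."""
    response = client.invoke_model_with_response_stream(body=body, modelId=model_id)
    return collect_stream_text(response["body"])
```

In practice, each piece could be printed as it arrives instead of being collected. Joining the pieces keeps the result comparable with the InvokeModel response.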

Request

Llama 2 Chat, Llama 2, Llama 3 Instruct, Llama 3.1 Instruct, and Llama 3.2 Instruct models have the following inference parameters.


{
    "prompt": string,
    "temperature": float,
    "top_p": float,
    "max_gen_len": int
}

NOTE: Llama 3.2 models add an images field to the request structure. Its value is a list of strings, for example: images: Optional[List[str]]
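The note gives only the type of images. The sketch below makes two assumptions:

- Each string is a base64-encoded image.
- The prompt marks the image position with Llama 3.2's <|image|> token.

Confirm both against the model's documentation. build_llama32_image_body is a hypothetical helper.

```python
import base64
import json
from typing import List


def build_llama32_image_body(prompt: str, images: List[bytes], max_gen_len: int = 512) -> str:
    """Return a JSON request body with each image encoded as a base64 string (assumed format)."""
    # Assumption: the service expects base64 text, not raw bytes, in the images list.
    encoded = [base64.b64encode(data).decode("utf-8") for data in images]
    return json.dumps({"prompt": prompt, "images": encoded, "max_gen_len": max_gen_len})
```

The image bytes would typically come from a file opened in "rb" mode, with a prompt such as "<|image|>Describe this image." placed inside the usual prompt template.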

The following parameter is required.

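prompt – (Required) The prompt that you want to pass to the model. With Llama 3 Instruct models, format the conversation with the following template, which includes a system prompt.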


<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful AI assistant for travel tips and recommendations<|eot_id|><|start_header_id|>user<|end_header_id|>

What can you help me with?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
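The template above can be assembled from plain strings. The sketch below covers a single-turn conversation only. Multi-turn conversations, which would repeat the user and assistant headers, are not handled. format_llama3_prompt is a hypothetical helper.

```python
from typing import Optional


def format_llama3_prompt(user_message: str, system_prompt: Optional[str] = None) -> str:
    """Wrap one user turn (and an optional system prompt) in the Llama 3 template."""
    parts = ["<|begin_of_text|>"]
    if system_prompt:
        parts.append(f"<|start_header_id|>system<|end_header_id|>\n\n{system_prompt}<|eot_id|>")
    parts.append(f"<|start_header_id|>user<|end_header_id|>\n\n{user_message}<|eot_id|>")
    # Leave the assistant header open so that the model writes the reply.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)
```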

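With Llama 2 Chat, the instructions between the <<SYS>> tokens provide the system prompt for the model. The following is an example Llama 2 Chat prompt that includes a system prompt.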


<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

There's a llama in my garden  What should I do? [/INST]
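The Llama 2 Chat example follows a fixed layout: <s>[INST], an optional <<SYS>> block, the user message, and [/INST]. The sketch below reproduces that layout for a single turn. format_llama2_chat_prompt is a hypothetical helper.

```python
from typing import Optional


def format_llama2_chat_prompt(user_message: str, system_prompt: Optional[str] = None) -> str:
    """Wrap one user turn (and an optional system prompt) in the Llama 2 Chat template."""
    if system_prompt:
        # The system prompt sits between <<SYS>> and <</SYS>>, inside the first [INST] block.
        return f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_message} [/INST]"
    return f"<s>[INST] {user_message} [/INST]"
```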


The following parameters are optional.

temperature – Use a lower value to decrease randomness in the response.

Default	Minimum	Maximum
0.5	0	1

top_p – Use a lower value to ignore less probable options. Set to 0 or 1.0 to disable.

Default	Minimum	Maximum
0.9	0	1

max_gen_len – Specify the maximum number of tokens to use in the generated response. The model truncates the response once the generated text exceeds max_gen_len.

Default	Minimum	Maximum
512	1	2048
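The ranges in the three tables above can be checked on the client before a request is sent. The sketch below makes two choices of its own:

- It rejects unknown parameter names.
- It leaves unset parameters out of the body so that the model applies its defaults.

build_request_body and PARAM_RANGES are hypothetical names.

```python
import json
from typing import Any, Dict

# (minimum, maximum) for each optional parameter, copied from the tables above.
PARAM_RANGES = {
    "temperature": (0.0, 1.0),
    "top_p": (0.0, 1.0),
    "max_gen_len": (1, 2048),
}


def build_request_body(prompt: str, **params: Any) -> str:
    """Build an InvokeModel body; unset optional parameters are left to the model defaults."""
    body: Dict[str, Any] = {"prompt": prompt}
    for name, value in params.items():
        if name not in PARAM_RANGES:
            raise ValueError(f"unknown parameter: {name}")
        if name == "max_gen_len" and not isinstance(value, int):
            raise ValueError("max_gen_len must be an int")
        low, high = PARAM_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
        body[name] = value
    return json.dumps(body)
```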

Response

Llama 2 Chat, Llama 2, and Llama 3 Instruct models return the following fields for a text completion inference call.


{
    "generation": "\n\n<response>",
    "prompt_token_count": int,
    "generation_token_count": int,
    "stop_reason" : string
}
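The response body shown above can be loaded into a typed object. In the sketch below, LlamaResponse and parse_response are hypothetical names. The truncated property only checks for the "length" stop reason described below.

```python
import json
from dataclasses import dataclass
from typing import Union


@dataclass
class LlamaResponse:
    generation: str
    prompt_token_count: int
    generation_token_count: int
    stop_reason: str

    @property
    def truncated(self) -> bool:
        # "length" means the output reached max_gen_len and was cut off.
        return self.stop_reason == "length"


def parse_response(raw: Union[str, bytes]) -> LlamaResponse:
    """Parse the JSON body returned by InvokeModel into a LlamaResponse."""
    data = json.loads(raw)
    return LlamaResponse(
        generation=data["generation"],
        prompt_token_count=data["prompt_token_count"],
        generation_token_count=data["generation_token_count"],
        stop_reason=data["stop_reason"],
    )
```

With boto3, the raw value would be response["body"].read() from an invoke_model call.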

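The fields are described below.

generation – The generated text.
prompt_token_count – The number of tokens in the prompt.
generation_token_count – The number of tokens in the generated text.
stop_reason – The reason why the model stopped generating text. Possible values:
- stop – The model has finished generating text for the input prompt.
- length – The length of the tokens for the generated text exceeds the value of max_gen_len in the call to InvokeModel (or InvokeModelWithResponseStream, if you are streaming output). The response is truncated to max_gen_len tokens. Consider increasing the value of max_gen_len and trying again.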
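The advice to increase max_gen_len and retry can be automated. The sketch below makes these assumptions and choices:

- The caller supplies an invoke function that sends a JSON body and returns the decoded response.
- Doubling max_gen_len on each retry is an arbitrary strategy.
- The value is capped at the 2048 maximum from the max_gen_len table.

generate_with_retry is a hypothetical helper.

```python
import json
from typing import Any, Callable, Dict

MAX_GEN_LEN_LIMIT = 2048  # Maximum from the max_gen_len table.


def generate_with_retry(
    invoke: Callable[[str], Dict[str, Any]], prompt: str, max_gen_len: int = 512
) -> Dict[str, Any]:
    """Re-send the request with a doubled max_gen_len while the reply is cut off."""
    while True:
        body = json.dumps({"prompt": prompt, "max_gen_len": max_gen_len})
        response = invoke(body)
        # Stop when the model finished on its own or the limit has been reached.
        if response["stop_reason"] != "length" or max_gen_len >= MAX_GEN_LEN_LIMIT:
            return response
        max_gen_len = min(max_gen_len * 2, MAX_GEN_LEN_LIMIT)
```

With boto3, invoke could be `lambda body: json.loads(client.invoke_model(body=body, modelId=model_id)["body"].read())`. Each retry regenerates the whole response, so the prompt tokens are processed again on every attempt.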

Example code

This example shows how to call the Meta Llama 2 Chat 13B model.


# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to generate text with Meta Llama 2 Chat (on demand).
"""

import json
import logging
import boto3


from botocore.exceptions import ClientError


# Logging is configured in main(); a module-level basicConfig() call here
# would make the format passed in main() have no effect.
logger = logging.getLogger(__name__)


def generate_text(model_id, body):
    """
    Generate text using Meta Llama 2 Chat on demand.
    Args:
        model_id (str): The model ID to use.
        body (str): The JSON request body to use.
    Returns:
        response_body (dict): The text that the model generated, token information, and the
        reason the model stopped generating text.
    """

    logger.info("Generating text with Meta Llama 2 Chat model %s", model_id)

    bedrock = boto3.client(service_name='bedrock-runtime')

    response = bedrock.invoke_model(
        body=body, modelId=model_id)

    response_body = json.loads(response.get('body').read())

    return response_body


def main():
    """
    Entrypoint for Meta Llama 2 Chat example.
    """

    logging.basicConfig(level=logging.INFO,
                        format="%(levelname)s: %(message)s")

    model_id = "meta.llama2-13b-chat-v1"
    prompt = """<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

There's a llama in my garden  What should I do? [/INST]"""
    max_gen_len = 128
    temperature = 0.1
    top_p = 0.9


    # Create request body.
    body = json.dumps({
        "prompt": prompt,
        "max_gen_len": max_gen_len,
        "temperature": temperature,
        "top_p": top_p
    })


    try:

        response = generate_text(model_id, body)

        print(f"Generated Text: {response['generation']}")
        print(f"Prompt Token count:  {response['prompt_token_count']}")
        print(f"Generation Token count:  {response['generation_token_count']}")
        print(f"Stop reason:  {response['stop_reason']}")

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print(f"A client error occurred: {message}")

    else:
        print(
            f"Finished generating text with Meta Llama 2 Chat model {model_id}.")


if __name__ == "__main__":
    main()
