Meta Llama models

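This section describes the request parameters and response fields for Meta Llama models. Use this information to make inference calls to Meta Llama models with the InvokeModel and InvokeModelWithResponseStream (streaming) operations. This section also includes Python code examples that show how to call Meta Llama models. To use a model in an inference operation, you need the model ID for the model. To get the model ID, see Supported foundation models in Amazon Bedrock. Some models also work with the Converse API. To check whether the Converse API supports a specific Meta Llama model, see Supported models and model features. For more code examples, see Code examples for Amazon Bedrock using AWS SDKs.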

Foundation models in Amazon Bedrock support input and output modalities that vary from model to model. To check the modalities and the Amazon Bedrock features that Meta Llama models support, and the AWS Regions where Meta Llama models are available, see Supported foundation models in Amazon Bedrock.

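When you make inference calls with Meta Llama models, you include a prompt for the model. For general information about creating prompts for the models that Amazon Bedrock supports, see Prompt engineering concepts. For prompt information specific to Meta Llama, see the Meta Llama prompt engineering guide.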

Note

Llama 3.2 Instruct and Llama 3.3 Instruct models use geofencing. This means that these models cannot be used outside the AWS Regions listed for them in the Regions table.

This section provides information for using the following models from Meta.

Llama 3 Instruct
Llama 3.1 Instruct
Llama 3.2 Instruct
Llama 3.3 Instruct

Request and response

The request body is passed in the body field of a request to InvokeModel or InvokeModelWithResponseStream.
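As a rough sketch of the streaming path, the following Python reads the event stream returned by InvokeModelWithResponseStream and joins the text fragments. It assumes that each streamed event carries a JSON chunk with a generation field, mirroring the non-streaming response described in this section. The helper names collect_generation and stream_text are hypothetical, and the boto3 import is deferred so that the parsing helper can be exercised without AWS credentials.

```python
import json


def collect_generation(events):
    """Join the text fragments from an iterable of streamed events.

    Assumes each event looks like
    {"chunk": {"bytes": b'{"generation": "..."}'}}.
    """
    parts = []
    for event in events:
        chunk = event.get("chunk")
        if not chunk:
            continue  # Skip events that carry no payload.
        payload = json.loads(chunk["bytes"])
        parts.append(payload.get("generation") or "")
    return "".join(parts)


def stream_text(model_id, body):
    """Call InvokeModelWithResponseStream and return the full generated text."""
    import boto3  # Deferred import: only needed when actually calling AWS.

    client = boto3.client(service_name="bedrock-runtime")
    response = client.invoke_model_with_response_stream(
        body=body, modelId=model_id)
    return collect_generation(response["body"])
```

Separating the parsing from the network call keeps the chunk handling easy to check in isolation, for example by passing a hand-built list of events to collect_generation.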

Request

The Llama 2 Chat, Llama 2, Llama 3 Instruct, Llama 3.1 Instruct, and Llama 3.2 Instruct models have the following inference parameters.


{
    "prompt": string,
    "temperature": float,
    "top_p": float,
    "max_gen_len": int
}

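NOTE: Llama 3.2 models add images to the request structure, which is a list of strings. Example: images: Optional[List[str]]

The following are required parameters.

prompt - (Required) The prompt that you want to pass to the model. Format the conversation with the following template, which uses the Llama 3 Instruct header tokens.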


<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful AI assistant for travel tips and recommendations<|eot_id|><|start_header_id|>user<|end_header_id|>

What can you help me with?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
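The template above can be assembled programmatically. The following is a minimal sketch, not an official helper: the function name format_llama3_prompt is hypothetical, and it reproduces the template exactly as shown here for a single user turn with an optional system prompt.

```python
def format_llama3_prompt(user_message, system_prompt=None):
    """Build a single-turn prompt using the header-token template above."""
    parts = ["<|begin_of_text|>"]
    if system_prompt:
        parts.append(
            "<|start_header_id|>system<|end_header_id|>\n\n"
            f"{system_prompt}<|eot_id|>"
        )
    parts.append(
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
    )
    # End with an open assistant header so the model writes the reply.
    parts.append("<|start_header_id|>assistant<|end_header_id|>")
    return "".join(parts)


prompt = format_llama3_prompt(
    "What can you help me with?",
    system_prompt="You are a helpful AI assistant for travel tips and recommendations",
)
```

The resulting string is what you would pass as the prompt field of the request body.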

With Llama 2 Chat, instructions between the <<SYS>> tokens provide a system prompt for the model. The following is an example prompt that includes a system prompt.


<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

There's a llama in my garden  What should I do? [/INST]
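As an illustration of the Llama 2 Chat layout shown above, the following sketch wraps a user message, and optionally a system prompt, in the [INST] and <<SYS>> tokens. The function name format_llama2_chat_prompt is hypothetical, and the sketch covers a single turn only.

```python
def format_llama2_chat_prompt(user_message, system_prompt=None):
    """Build a single-turn Llama 2 Chat prompt in the layout shown above."""
    if system_prompt:
        return (
            f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
            f"{user_message} [/INST]"
        )
    # Without a system prompt, only the instruction markers are needed.
    return f"<s>[INST] {user_message} [/INST]"


prompt = format_llama2_chat_prompt(
    "There's a llama in my garden  What should I do?",
    system_prompt="You are a helpful, respectful and honest assistant.",
)
```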

For more information, see the following.

The following parameters are optional.

temperature - Use a lower value to decrease randomness in the response.

Default	Minimum	Maximum
0.5	0	1

top_p - Use a lower value to ignore less probable options. Set to 0 or 1.0 to disable.

Default	Minimum	Maximum
0.9	0	1

max_gen_len - Specify the maximum number of tokens to use in the generated response. The model truncates the response once the generated text exceeds max_gen_len.

Default	Minimum	Maximum
512	1	2048
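The defaults and ranges in the tables above can be checked on the client before a request is sent. The following is a minimal sketch under that assumption: the names LIMITS and build_request_body are hypothetical, and the sketch always sends every parameter explicitly, filling in the documented default when the caller gives none.

```python
import json

# (default, minimum, maximum) for each optional parameter, from the tables above.
LIMITS = {
    "temperature": (0.5, 0, 1),
    "top_p": (0.9, 0, 1),
    "max_gen_len": (512, 1, 2048),
}


def build_request_body(prompt, **overrides):
    """Return a JSON request body, rejecting out-of-range parameter values."""
    unknown = set(overrides) - set(LIMITS)
    if unknown:
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")

    body = {"prompt": prompt}
    for name, (default, minimum, maximum) in LIMITS.items():
        value = overrides.get(name, default)
        if not minimum <= value <= maximum:
            raise ValueError(
                f"{name}={value} is outside [{minimum}, {maximum}]")
        body[name] = value
    return json.dumps(body)


body = build_request_body("What can you help me with?", temperature=0.1)
```

Validating locally gives a clearer error than waiting for the service to reject the request, at the cost of keeping the limits in sync with the documentation by hand.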

Response

The Llama 2 Chat, Llama 2, and Llama 3 Instruct models return the following fields for a text completion inference call.


{
    "generation": "\n\n<response>",
    "prompt_token_count": int,
    "generation_token_count": int,
    "stop_reason" : string
}

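More information about each field is provided below.

generation - The generated text.
prompt_token_count - The number of tokens in the prompt.
generation_token_count - The number of tokens in the generated text.
stop_reason - The reason why the response stopped generating text. Possible values are:
- stop - The model has finished generating text for the input prompt.
- length - The length of the tokens for the generated text exceeds the value of max_gen_len in the call to InvokeModel (InvokeModelWithResponseStream, if you are streaming output). The response is truncated to max_gen_len tokens. Consider increasing the value of max_gen_len and trying again.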
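The retry advice for a truncated response can be sketched as a small decision helper. The function name next_max_gen_len and the doubling policy are assumptions made for illustration; the only facts taken from this section are the length stop reason and the documented maximum of 2048.

```python
def next_max_gen_len(response_body, current_max_gen_len, ceiling=2048):
    """Return a larger max_gen_len to retry with, or None if no retry applies."""
    if response_body.get("stop_reason") != "length":
        return None  # The model finished on its own; nothing to retry.
    if current_max_gen_len >= ceiling:
        return None  # Already at the documented maximum.
    # Hypothetical policy: double the budget, capped at the ceiling.
    return min(current_max_gen_len * 2, ceiling)
```

A caller would check the return value after each InvokeModel call and resend the same prompt with the larger limit only when the value is not None.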

Example code

This example shows how to call the Meta Llama 2 Chat 13B model.


# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to generate text with Meta Llama 2 Chat (on demand).
"""

import json
import logging
import boto3


from botocore.exceptions import ClientError


logger = logging.getLogger(__name__)


def generate_text(model_id, body):
    """
    Generate text using Meta Llama 2 Chat on demand.
    Args:
        model_id (str): The model ID to use.
        body (str) : The request body to use.
    Returns:
        response (JSON): The text that the model generated, token information, and the
        reason the model stopped generating text.
    """

    logger.info("Generating text with Meta Llama 2 Chat model %s", model_id)

    bedrock = boto3.client(service_name='bedrock-runtime')

    response = bedrock.invoke_model(
        body=body, modelId=model_id)

    response_body = json.loads(response.get('body').read())

    return response_body


def main():
    """
    Entrypoint for Meta Llama 2 Chat example.
    """

    logging.basicConfig(level=logging.INFO,
                        format="%(levelname)s: %(message)s")

    model_id = "meta.llama2-13b-chat-v1"
    prompt = """<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

There's a llama in my garden  What should I do? [/INST]"""
    max_gen_len = 128
    temperature = 0.1
    top_p = 0.9


    # Create request body.
    body = json.dumps({
        "prompt": prompt,
        "max_gen_len": max_gen_len,
        "temperature": temperature,
        "top_p": top_p
    })


    try:

        response = generate_text(model_id, body)

        print(f"Generated Text: {response['generation']}")
        print(f"Prompt Token count:  {response['prompt_token_count']}")
        print(f"Generation Token count:  {response['generation_token_count']}")
        print(f"Stop reason:  {response['stop_reason']}")

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print(f"A client error occurred: {message}")

    else:
        print(
            f"Finished generating text with Meta Llama 2 Chat model {model_id}.")


if __name__ == "__main__":
    main()
