Meta Llama models
This section describes the request parameters and response fields for Meta Llama models. Use this information to make inference calls to Meta Llama models with the InvokeModel and InvokeModelWithResponseStream (streaming) operations. This section also includes Python code examples that show how to call Meta Llama models. To use a model in an inference operation, you need the model ID for the model. To get the model ID, see Supported foundation models in Amazon Bedrock. Some models also work with the Converse API. To check if the Converse API supports a specific Meta Llama model, see Supported models and model features. For more code examples, see Code examples for Amazon Bedrock using AWS SDKs.
Foundation models in Amazon Bedrock support input and output modalities, which vary from model to model. To check the modalities and Amazon Bedrock features that Meta Llama models support, and the AWS Regions in which they are available, see Supported foundation models in Amazon Bedrock.
When you make inference calls with Meta Llama models, you include a prompt for the model. For general information about creating prompts for the models that Amazon Bedrock supports, see Prompt engineering concepts. For Meta Llama specific prompt information, see the Meta Llama prompt engineering guide.
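For example, Llama 2 Chat models expect an optional system message and the user turn to be wrapped in special tags, as in the prompt used by the code example later in this section. The following template is an illustration only; check the Meta Llama prompt engineering guide for the exact format that each model version expects.

<s>[INST] <<SYS>>
{system message}
<</SYS>>

{user message} [/INST]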
Note
Llama 3.2 Instruct models use geofencing. This means that these models can't be used outside of the AWS Regions listed for these models in the Regions table.
This section provides information for using the following models from Meta.
Llama 2
Llama 2 Chat
Llama 3 Instruct
Llama 3.1 Instruct
Llama 3.2 Instruct
Request and response
The request body is passed in the body field of a request to InvokeModel or InvokeModelWithResponseStream.
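The following snippet is a minimal sketch, not an exhaustive parameter reference. It shows how to build a request body with the fields used in the code example later in this section, and lists the response fields that the example reads back. Consult the inference parameter documentation for defaults and valid ranges.

import json

# Minimal request body for a Meta Llama text generation model. The parameter
# names (prompt, temperature, top_p, max_gen_len) match the example later in
# this section.
body = json.dumps({
    "prompt": "<s>[INST] What is the capital of France? [/INST]",
    "temperature": 0.5,   # Randomness of the output
    "top_p": 0.9,         # Nucleus sampling cutoff
    "max_gen_len": 512    # Maximum number of tokens to generate
})

# The response body is a JSON document. The fields read in the example later
# in this section are:
#   generation             - the text that the model generated
#   prompt_token_count     - number of tokens in the prompt
#   generation_token_count - number of tokens in the generated text
#   stop_reason            - why the model stopped generating text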
Example code
This example shows how to call the Meta Llama 2 Chat 13B model.
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to generate text with Meta Llama 2 Chat (on demand).
"""
import json
import logging

import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)


def generate_text(model_id, body):
    """
    Generate text using Meta Llama 2 Chat on demand.
    Args:
        model_id (str): The model ID to use.
        body (str) : The request body to use.
    Returns:
        response (JSON): The text that the model generated, token information,
        and the reason the model stopped generating text.
    """

    logger.info("Generating text with Meta Llama 2 Chat model %s", model_id)

    bedrock = boto3.client(service_name='bedrock-runtime')

    response = bedrock.invoke_model(
        body=body, modelId=model_id)

    response_body = json.loads(response.get('body').read())

    return response_body


def main():
    """
    Entrypoint for Meta Llama 2 Chat example.
    """
    logging.basicConfig(level=logging.INFO,
                        format="%(levelname)s: %(message)s")

    model_id = "meta.llama2-13b-chat-v1"

    prompt = """<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

There's a llama in my garden. What should I do? [/INST]"""

    max_gen_len = 128
    temperature = 0.1
    top_p = 0.9

    # Create request body.
    body = json.dumps({
        "prompt": prompt,
        "max_gen_len": max_gen_len,
        "temperature": temperature,
        "top_p": top_p
    })

    try:
        response = generate_text(model_id, body)

        print(f"Generated Text: {response['generation']}")
        print(f"Prompt Token count: {response['prompt_token_count']}")
        print(f"Generation Token count: {response['generation_token_count']}")
        print(f"Stop reason: {response['stop_reason']}")

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print("A client error occurred: " + format(message))

    else:
        print(
            f"Finished generating text with Meta Llama 2 Chat model {model_id}.")


if __name__ == "__main__":
    main()
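The same request body also works with the streaming operation. The following sketch is not part of the example above; it assumes the body and model_id values that main() builds, and it relies on the standard boto3 event-stream response shape, in which each event carries a JSON chunk with a partial generation.

import json

import boto3


def generate_text_streaming(model_id, body):
    """
    Stream generated text from a Meta Llama model and print it as it arrives.
    Assumes `body` and `model_id` are built as in the example above.
    """
    bedrock = boto3.client(service_name='bedrock-runtime')

    response = bedrock.invoke_model_with_response_stream(
        body=body, modelId=model_id)

    # Each event in the stream carries a JSON chunk; for Meta Llama models,
    # the chunk's "generation" field holds the next piece of generated text.
    for event in response.get('body'):
        chunk = event.get('chunk')
        if chunk:
            chunk_json = json.loads(chunk.get('bytes').decode())
            print(chunk_json.get('generation', ''), end='')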