Meta Llama models

This section describes the request parameters and response fields for Meta Llama models. Use this information to make inference calls to Meta Llama models with the InvokeModel and InvokeModelWithResponseStream (streaming) operations. This section also includes Python code examples that show how to call Meta Llama models. To use a model in an inference operation, you need its model ID. To get the model ID, see Supported foundation models in Amazon Bedrock. Some models also work with the Converse API. To check whether the Converse API supports a specific Meta Llama model, see Supported models and model features. For more code examples, see Code examples for Amazon Bedrock using AWS SDKs.
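For models that do support the Converse API, the following is a minimal sketch of a single-turn call with the AWS SDK for Python (Boto3). The helper name converse_with_llama is hypothetical, and the client is passed in as a parameter (it is assumed to be a boto3 "bedrock-runtime" client). The default values simply mirror the InvokeModel defaults documented later in this section; Converse uses its own parameter names (maxTokens, topP) rather than max_gen_len and top_p.

```python
from typing import Any


def converse_with_llama(client: Any, model_id: str, user_text: str,
                        max_tokens: int = 512, temperature: float = 0.5,
                        top_p: float = 0.9) -> str:
    """Send one user turn through the Converse API and return the reply text.

    `client` is assumed to be a boto3 "bedrock-runtime" client; it is passed in
    so the function can be tried without creating a client here.
    """
    response = client.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": user_text}]}],
        # Converse takes inference settings in inferenceConfig, with its own names.
        inferenceConfig={
            "maxTokens": max_tokens,
            "temperature": temperature,
            "topP": top_p,
        },
    )
    # The reply is the first content block of the assistant message.
    return response["output"]["message"]["content"][0]["text"]
```

To use it, pass boto3.client("bedrock-runtime") and a model ID taken from Supported foundation models in Amazon Bedrock.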

Foundation models in Amazon Bedrock support input and output modalities, which vary from model to model. To check which modalities Meta Llama models support, which Amazon Bedrock features they support, and which AWS Regions they are available in, see Supported foundation models in Amazon Bedrock.

When you make inference calls with Meta Llama models, you include a prompt for the model. For general information about creating prompts for the models that Amazon Bedrock supports, see Prompt engineering concepts. For prompt information specific to Meta Llama, see the Meta Llama prompt engineering guide.

Note

Llama 3.2 Instruct and Llama 3.3 Instruct models use geofencing. This means that these models can't be used outside the AWS Regions listed for them in the Regions table.

This section provides information for using the following models from Meta.

Llama 3 Instruct
Llama 3.1 Instruct
Llama 3.2 Instruct
Llama 3.3 Instruct

Request and response

The request body is passed in the body field of a request to InvokeModel or InvokeModelWithResponseStream.
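Both operations take the same JSON request body; they differ in how the response comes back. The following minimal sketch reads a streaming response, assuming a boto3 "bedrock-runtime" client is passed in. Each event in the stream carries a JSON chunk, and the generated text is read from that chunk's generation field. The helper name stream_llama_text is hypothetical.

```python
import json
from typing import Any, Iterator


def stream_llama_text(client: Any, model_id: str, body: str) -> Iterator[str]:
    """Yield pieces of generated text from InvokeModelWithResponseStream.

    `client` is assumed to be a boto3 "bedrock-runtime" client and `body`
    the JSON request body described in this section.
    """
    response = client.invoke_model_with_response_stream(modelId=model_id, body=body)
    # response["body"] is an event stream; text arrives in "chunk" events.
    for event in response["body"]:
        chunk_event = event.get("chunk")
        if chunk_event is None:
            continue  # skip events that carry no chunk
        chunk = json.loads(chunk_event["bytes"])
        if "generation" in chunk:
            yield chunk["generation"]
```

For example, "".join(stream_llama_text(client, model_id, body)) collects the full response, while printing each piece as it arrives shows the output incrementally.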

Request

Llama 2 Chat, Llama 2, Llama 3 Instruct, Llama 3.1 Instruct, and Llama 3.2 Instruct models have the following inference parameters.


{
    "prompt": string,
    "temperature": float,
    "top_p": float,
    "max_gen_len": int
}

NOTE: Llama 3.2 models add an images field to the request structure. Its value is a list of strings. Example: images: Optional[List[str]]
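A minimal sketch of adding the images field to a request body follows. This section only states that images is a list of strings; the base64 encoding used here is an assumption, and build_llama32_body is a hypothetical helper name.

```python
import base64
import json
from typing import List, Optional


def build_llama32_body(prompt: str, image_bytes: Optional[List[bytes]] = None) -> str:
    """Build a request body with the optional Llama 3.2 `images` list of strings."""
    body = {"prompt": prompt}
    if image_bytes:
        # ASSUMPTION: each image is sent as a base64-encoded string.
        body["images"] = [base64.b64encode(data).decode("ascii") for data in image_bytes]
    return json.dumps(body)
```

When no images are given, the field is left out and the body matches the text-only structure.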

The following parameter is required.

prompt - (Required) The prompt that you want to pass to the model. For Llama 3 Instruct and later models, format the conversation with the following template.


<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful AI assistant for travel tips and recommendations<|eot_id|><|start_header_id|>user<|end_header_id|>

What can you help me with?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
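The template can also be filled in programmatically. This sketch reproduces the layout above for one system prompt and one user turn; it assumes that the blank lines in the template correspond to "\n\n" after each <|end_header_id|> token. The function name format_llama3_prompt is hypothetical.

```python
def format_llama3_prompt(system_prompt: str, user_message: str) -> str:
    """Wrap a system prompt and one user turn in the Llama 3 chat template."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_prompt}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        # End with an open assistant header so the model writes the reply next.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )
```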

With Llama 2 Chat, the instructions between the <<SYS>> tokens provide a system prompt for the model. The following is an example prompt that includes a system prompt.


<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

There's a llama in my garden  What should I do? [/INST]
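A minimal sketch that builds a prompt with the same Llama 2 Chat layout as the example above follows, for one system prompt and one user message. The function name format_llama2_prompt is hypothetical.

```python
def format_llama2_prompt(system_prompt: str, user_message: str) -> str:
    """Wrap a system prompt and one user message in the Llama 2 Chat format."""
    # The system prompt sits between <<SYS>> and <</SYS>>, and the whole
    # turn is enclosed in [INST] ... [/INST].
    return f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_message} [/INST]"
```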


The following parameters are optional.

temperature - Use a lower value to decrease randomness in the response.

Default	Minimum	Maximum
0.5	0	1

top_p - Use a lower value to ignore less probable options. Set to 0 or 1.0 to disable.

Default	Minimum	Maximum
0.9	0	1

max_gen_len - Specify the maximum number of tokens to use in the generated response. The model truncates the response once the generated text exceeds max_gen_len.

Default	Minimum	Maximum
512	1	2048
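A minimal sketch that builds a request body and rejects values outside the ranges listed in the tables above follows. The names build_request_body and PARAM_RANGES are hypothetical, and the sketch assumes these ranges apply to the model you call; parameters left unset are omitted so that the model applies its defaults.

```python
import json
from typing import Optional

# (minimum, maximum) for each optional parameter, copied from the tables above.
# Omitted parameters fall back to the documented defaults:
# temperature 0.5, top_p 0.9, max_gen_len 512.
PARAM_RANGES = {
    "temperature": (0.0, 1.0),
    "top_p": (0.0, 1.0),
    "max_gen_len": (1, 2048),
}


def build_request_body(prompt: str, temperature: Optional[float] = None,
                       top_p: Optional[float] = None,
                       max_gen_len: Optional[int] = None) -> str:
    """Return a JSON request body after checking each optional value's range."""
    body = {"prompt": prompt}
    optional = {"temperature": temperature, "top_p": top_p, "max_gen_len": max_gen_len}
    for name, value in optional.items():
        if value is None:
            continue  # omitted: the model uses its default
        low, high = PARAM_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
        body[name] = value
    return json.dumps(body)
```

For example, build_request_body(prompt, temperature=0.1, top_p=0.9, max_gen_len=128) produces a body equivalent to the one built in the code example in this section.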

Response

Llama 2 Chat, Llama 2, and Llama 3 Instruct models return the following fields for a text completion inference call.


{
    "generation": "\n\n<response>",
    "prompt_token_count": int,
    "generation_token_count": int,
    "stop_reason" : string
}

Each field is described below.

generation - The generated text.
prompt_token_count - The number of tokens in the prompt.
generation_token_count - The number of tokens in the generated text.
stop_reason - The reason why the response stopped generating text. Possible values are:
- stop - The model has finished generating text for the input prompt.
- length - The number of tokens in the generated text exceeds the value of max_gen_len in the call to InvokeModel (or InvokeModelWithResponseStream, if you are streaming output). The response is truncated to max_gen_len tokens. Consider increasing the value of max_gen_len and trying again.
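The following minimal sketch parses a response body with the fields listed above and flags output that was cut off (stop_reason is length). The retry policy in next_max_gen_len (double the limit, capped at the documented maximum of 2048) is a hypothetical choice, not a documented recommendation; both function names are hypothetical.

```python
import json
from typing import Any, Dict


def read_completion(raw_body: bytes) -> Dict[str, Any]:
    """Parse an InvokeModel response body and flag truncated output."""
    result = json.loads(raw_body)
    # "length" means generation stopped because it reached max_gen_len.
    result["truncated"] = result.get("stop_reason") == "length"
    return result


def next_max_gen_len(current: int, ceiling: int = 2048) -> int:
    """Hypothetical retry policy: double max_gen_len, capped at the maximum."""
    return min(current * 2, ceiling)
```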

Code example

This example shows how to call the Meta Llama 2 Chat 13B model.


# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to generate text with Meta Llama 2 Chat (on demand).
"""

import json
import logging
import boto3


from botocore.exceptions import ClientError


logger = logging.getLogger(__name__)


def generate_text(model_id, body):
    """
    Generate text using Meta Llama 2 Chat on demand.
    Args:
        model_id (str): The model ID to use.
        body (str): The JSON request body to use.
    Returns:
        response_body (dict): The text that the model generated, token information, and the
        reason the model stopped generating text.
    """

    logger.info("Generating text with Meta Llama 2 Chat model %s", model_id)

    bedrock = boto3.client(service_name='bedrock-runtime')

    response = bedrock.invoke_model(
        body=body, modelId=model_id)

    response_body = json.loads(response.get('body').read())

    return response_body


def main():
    """
    Entrypoint for Meta Llama 2 Chat example.
    """

    logging.basicConfig(level=logging.INFO,
                        format="%(levelname)s: %(message)s")

    model_id = "meta.llama2-13b-chat-v1"
    prompt = """<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

There's a llama in my garden  What should I do? [/INST]"""
    max_gen_len = 128
    temperature = 0.1
    top_p = 0.9


    # Create request body.
    body = json.dumps({
        "prompt": prompt,
        "max_gen_len": max_gen_len,
        "temperature": temperature,
        "top_p": top_p
    })


    try:

        response = generate_text(model_id, body)

        print(f"Generated Text: {response['generation']}")
        print(f"Prompt Token count:  {response['prompt_token_count']}")
        print(f"Generation Token count:  {response['generation_token_count']}")
        print(f"Stop reason:  {response['stop_reason']}")

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print(f"A client error occurred: {message}")

    else:
        print(
            f"Finished generating text with Meta Llama 2 Chat model {model_id}.")


if __name__ == "__main__":
    main()
