AI21 Labs Jamba models

This section provides inference parameters and a code example for using AI21 Labs Jamba models.

Required fields

The AI21 Labs Jamba models support the following required fields (a minimal messages example follows the list):

  • Messages (messages) – The previous messages in this chat, from oldest (index 0) to newest. Must have at least one user or assistant message in the list. Include both user inputs and system responses. Maximum total size for the list is about 256K tokens. Each message includes the following members:

    • Role (role) – The role of the message author. One of the following values:

      • User (user) – Input provided by the user. Any instructions given here that conflict with instructions given in the system prompt take precedence over the system prompt instructions.

      • Assistant (assistant) – Response generated by the model.

      • System (system) – Initial instructions provided to the system to provide general guidance on the tone and voice of the generated message. An initial system message is optional but recommended to provide guidance on the tone of the chat. For example, "You are a helpful chatbot with a background in earth sciences and a charming French accent."

    • Content (content) – The content of the message.
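The following is a minimal sketch of a messages list that satisfies these fields, shown in Python since the later code examples use boto3; the system and user text are placeholders.

import json

# A minimal, valid messages list: an optional system message for tone,
# followed by at least one user message. Each entry has a role and content.
messages = [
    {
        "role": "system",
        "content": "You are a helpful chatbot with a background in earth sciences."
    },
    {
        "role": "user",
        "content": "Why is the sky blue?"
    }
]

# The list is sent as the messages field of the JSON request body.
body = json.dumps({"messages": messages})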

Inference parameters

The AI21 Labs Jamba models support the following inference parameters.

Randomness and Diversity

The AI21 Labs Jamba models support the following parameters to control randomness and diversity in the response; a short request-body sketch follows the list.

  • Temperature (temperature) – How much variation to provide in each answer. Setting this value to 0 guarantees the same response to the same question every time. Setting a higher value encourages more variation. Modifies the distribution from which tokens are sampled. Default: 1.0, Range: 0.0 – 2.0

  • Top P (top_p) – Limit the pool of next tokens in each step to the top N percentile of possible tokens, where 1.0 means the pool of all possible tokens, and 0.01 means the pool of only the most likely next tokens.
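Both parameters are passed as top-level fields of the request body alongside messages. The following is a minimal sketch; the values are arbitrary illustrations, not recommendations.

import json

# Lower temperature and a tighter top_p nucleus make output more deterministic.
body = json.dumps({
    "messages": [{"role": "user", "content": "Name three rock types."}],
    "temperature": 0.2,  # low variation between runs (range 0.0 - 2.0)
    "top_p": 0.8         # sample only from the top 80% of probability mass
})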

Length

The AI21 Labs Jamba models support the following parameters to control the length of the generated response; a combined sketch follows the list.

  • Max completion length (max_tokens) – The maximum number of tokens to allow for each generated response message. Typically the best way to limit output length is by providing a length limit in the system prompt (for example, "limit your answers to three sentences"). Default: 4096, Range: 0 – 4096.

  • Stop sequences (stop) – End the message when the model generates one of these strings. The stop sequence is not included in the generated message. Each sequence can be up to 64K long, and can contain newlines as \n characters.

    Examples:

    • Single stop string with a word and a period: "monkeys."

    • Multiple stop strings and a newline: ["cat", "dog", " .", "####", "\n"]

  • Number of responses (n) – How many chat responses to generate. Note that n must be 1 for streaming responses. If n is set to larger than 1, setting temperature=0 will always fail because all answers are guaranteed to be duplicates. Default: 1, Range: 1 – 16
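The following sketch combines these length controls in one request body; the specific stop strings and token cap are illustrative values only.

import json

# Cap output at 200 tokens, stop early on "###", and request one answer.
body = json.dumps({
    "messages": [{"role": "user", "content": "Summarize plate tectonics."}],
    "max_tokens": 200,        # hard cap on generated tokens (range 0 - 4096)
    "stop": ["###", "\n\n"],  # generation ends if either string is produced
    "n": 1                    # single response; required for streaming
})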

Repetitions

The AI21 Labs Jamba models support the following parameters to control repetition in the generated response; a short sketch follows the list.

  • Frequency Penalty (frequency_penalty) – Reduce the frequency of repeated words within a single response message by increasing this number. This penalty gradually increases the more times a word appears during response generation. Setting it to 2.0 will produce a string with few, if any, repeated words.

  • Presence Penalty (presence_penalty) – Reduce the frequency of repeated words within a single message by increasing this number. Unlike frequency penalty, presence penalty is the same no matter how many times a word appears.
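A minimal sketch showing both penalties in a request body follows; the values are arbitrary and chosen only to contrast the two fields.

import json

# frequency_penalty grows with each repeat of a word; presence_penalty is a
# flat penalty applied once a word has appeared at all.
body = json.dumps({
    "messages": [{"role": "user", "content": "Write a short poem about rain."}],
    "frequency_penalty": 1.0,  # penalize words in proportion to their count
    "presence_penalty": 0.5    # penalize any word that has already appeared
})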

Model invocation request body field

When you make an InvokeModel or InvokeModelWithResponseStream call using an AI21 Labs model, fill the body field with a JSON object that conforms to the following example. Enter the conversation in the messages field.

{ "messages": [ { "role":"system", // Non-printing contextual information for the model "content":"You are a helpful history teacher. You are kind and you respond with helpful content in a professional manner. Limit your answers to three sentences. Your listener is a high school student." }, { "role":"user", // The question we want answered. "content":"Who was the first emperor of rome?" } ], "n":1 // Limit response to one answer }

Model invocation response body field

For information about the format of the body field in the response, see https://docs.ai21.com/reference/jamba-instruct-api#response-details.
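As a rough sketch, a non-streaming response body can be parsed as shown below. The choices[0].message.content path is an assumption based on the linked AI21 reference; confirm the exact field names against that page.

import json

def first_answer(response_body: bytes) -> str:
    # Assumed response shape: {"choices": [{"message": {"content": ...}}, ...]}.
    payload = json.loads(response_body)
    return payload["choices"][0]["message"]["content"]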

Code example

This example shows how to call the AI21 Labs Jamba-Instruct model.

invoke_model

import boto3
import json

# Create a Bedrock Runtime client in the US East (N. Virginia) Region.
bedrock = boto3.client('bedrock-runtime', region_name='us-east-1')

# Invoke Jamba-Instruct with a single user message.
response = bedrock.invoke_model(
    modelId='ai21.jamba-instruct-v1:0',
    body=json.dumps({
        'messages': [
            {
                'role': 'user',
                'content': 'which llm are you?'
            }
        ],
    })
)

# The response body is a stream; read it before parsing the JSON payload.
print(json.dumps(json.loads(response['body'].read()), indent=4))

converse

import boto3
import json

# Create a Bedrock Runtime client in the US East (N. Virginia) Region.
bedrock = boto3.client('bedrock-runtime', region_name='us-east-1')

# The Converse API takes structured messages and returns a plain dict.
response = bedrock.converse(
    modelId='ai21.jamba-instruct-v1:0',
    messages=[
        {
            'role': 'user',
            'content': [
                {
                    'text': 'which llm are you?'
                }
            ]
        }
    ]
)

# Pretty-print the model's reply from the Converse response structure.
print(json.dumps(response['output'], indent=4))
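For streaming, n must be 1 (see the Length parameters above). The following is a minimal sketch using boto3's invoke_model_with_response_stream; the chunks are printed as raw JSON because their exact shape should be checked against the AI21 response reference.

invoke_model_with_response_stream

import boto3
import json

bedrock = boto3.client('bedrock-runtime', region_name='us-east-1')

response = bedrock.invoke_model_with_response_stream(
    modelId='ai21.jamba-instruct-v1:0',
    body=json.dumps({
        'messages': [
            {
                'role': 'user',
                'content': 'which llm are you?'
            }
        ],
        'n': 1  # streaming requires a single response
    })
)

# Each event in the stream carries one JSON-encoded chunk of the response.
for event in response['body']:
    if 'chunk' in event:
        print(json.loads(event['chunk']['bytes']))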

Code example for Jamba 1.5 Mini

This example shows how to call the AI21 Labs Jamba 1.5 Mini model.

invoke_model

POST https://bedrock-runtime.us-east-1.amazonaws.com/model/ai21.jamba-1-5-mini-v1:0/invoke HTTP/1.1

{
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful chatbot with a background in earth sciences and a charming French accent."
        },
        {
            "role": "user",
            "content": "What are the main causes of earthquakes?"
        }
    ],
    "max_tokens": 512,
    "temperature": 0.7,
    "top_p": 0.9,
    "stop": ["###"],
    "n": 1
}
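The same request can be sent with boto3 rather than raw HTTP; this sketch mirrors the request body above.

import boto3
import json

bedrock = boto3.client('bedrock-runtime', region_name='us-east-1')

response = bedrock.invoke_model(
    modelId='ai21.jamba-1-5-mini-v1:0',
    body=json.dumps({
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful chatbot with a background in earth sciences and a charming French accent."
            },
            {
                "role": "user",
                "content": "What are the main causes of earthquakes?"
            }
        ],
        "max_tokens": 512,
        "temperature": 0.7,
        "top_p": 0.9,
        "stop": ["###"],
        "n": 1
    })
)

# Read the streamed body and pretty-print the JSON payload.
print(json.dumps(json.loads(response['body'].read()), indent=4))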