
Import a customized model into Amazon Bedrock

You can create a custom model in Amazon Bedrock by using the Amazon Bedrock Custom Model Import feature to import foundation models that you have customized in other environments, such as Amazon SageMaker. For example, you might have a model that you created in Amazon SageMaker that has proprietary model weights. You can import that model into Amazon Bedrock and then use Amazon Bedrock features to make inference calls to it.

You can use a model that you import with On-Demand throughput. To make inference calls to the model, use the InvokeModel or InvokeModelWithResponseStream operations. For more information, see Submit a single prompt with InvokeModel.
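The following is a minimal sketch of an InvokeModel call with the AWS SDK for Python (Boto3). The model ARN is a placeholder, and the request body is an assumption; an imported model expects the inference parameters native to its architecture, so adjust the body to match your model.

import json

import boto3

# Runtime client for inference calls (not the "bedrock" control-plane client).
client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Placeholder ARN; use the ARN returned when your import job completes.
model_arn = "arn:aws:bedrock:us-east-1:111122223333:imported-model/abcd1234"

# Assumed body schema; imported models take their native inference parameters.
body = json.dumps({
    "prompt": "Explain Amazon Bedrock in one sentence.",
    "max_tokens": 256,
    "temperature": 0.5,
})

response = client.invoke_model(
    modelId=model_arn,
    body=body,
    contentType="application/json",
    accept="application/json",
)
print(json.loads(response["body"].read()))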

Amazon Bedrock Custom Model Import is supported in the following Regions (for more information about Regions supported in Amazon Bedrock, see Amazon Bedrock endpoints and quotas):

  • US East (N. Virginia)

  • US West (Oregon)

Note

Make sure that your import and use of the models in Amazon Bedrock complies with the terms or licenses applicable to the models.

You can't use Custom Model Import with the following Amazon Bedrock features.

  • Batch inference

  • AWS CloudFormation

With Custom Model Import, you can create a custom model that supports the following patterns.

  • Fine-tuned or Continued Pre-training model — You can customize the model weights using proprietary data, but retain the configuration of the base model.

  • Adaptation — You can customize the model to your domain for use cases where the model doesn't generalize well. Domain adaptation modifies a model to generalize for a target domain and to handle discrepancies across domains, such as a company in the financial industry wanting to create a model that generalizes well on pricing data. Another example is language adaptation. For example, you could customize a model to generate responses in Portuguese or Tamil. Most often, this involves changes to the vocabulary of the model that you are using.

  • Pretrained from scratch — In addition to customizing the weights and vocabulary of the model, you can also change model configuration parameters such as the number of attention heads, hidden layers, or context length.

Supported architectures

The model you import must use one of the following architectures.

  • Mistral — A decoder-only Transformer based architecture with Sliding Window Attention (SWA) and options for Grouped Query Attention (GQA). For more information, see Mistral in the Hugging Face documentation.

  • Mixtral — A decoder-only transformer model that uses a sparse Mixture of Experts (MoE) technique. For more information, see Mixtral in the Hugging Face documentation.

  • Flan — An enhanced version of the T5 architecture, an encoder-decoder based transformer model. For more information, see Flan T5 in the Hugging Face documentation.

  • Llama 2, Llama 3, Llama 3.1, and Llama 3.2 — Improved versions of Llama with Grouped Query Attention (GQA). For more information, see Llama 2, Llama 3, Llama 3.1, and Llama 3.2 in the Hugging Face documentation.

Note
  • The size of the imported model weights must be less than 100 GB for multimodal models and 200 GB for text models.

  • Amazon Bedrock supports transformers version 4.45.2. Make sure that you use transformers version 4.45.2 when you fine-tune your model.
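If you fine-tune in Python, a quick guard such as the following (a minimal sketch) can catch a mismatched transformers version before training starts:

import transformers

# The import job expects checkpoints produced with transformers 4.45.2.
assert transformers.__version__ == "4.45.2", (
    f"Expected transformers 4.45.2, found {transformers.__version__}"
)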

Import source

You import a model into Amazon Bedrock by creating a model import job in the Amazon Bedrock console or API. In the job, you specify the Amazon S3 URI for the source of the model files. Alternatively, if you created the model in Amazon SageMaker, you can specify the SageMaker model. The import job automatically detects your model's architecture.
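As a sketch of the API path, the following creates a model import job with Boto3 from an Amazon S3 source. The job name, model name, role ARN, and bucket are placeholders; the IAM role must allow Amazon Bedrock to read the model files in the bucket.

import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Placeholder names and ARNs; the role must grant Amazon Bedrock read
# access to the S3 location that holds the model files.
response = bedrock.create_model_import_job(
    jobName="my-import-job",
    importedModelName="my-imported-model",
    roleArn="arn:aws:iam::111122223333:role/MyBedrockImportRole",
    modelDataSource={"s3DataSource": {"s3Uri": "s3://amzn-s3-demo-bucket/model/"}},
)
print(response["jobArn"])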

If you import from an Amazon S3 bucket, you need to supply the model files in the Hugging Face weights format. You can create the files by using the Hugging Face transformers library. To create model files for a Llama model, see convert_llama_weights_to_hf.py. To create the files for a Mistral AI model, see convert_mistral_weights_to_hf.py.
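Those conversion scripts target raw reference checkpoints. If your fine-tuned model already loads with the transformers library, save_pretrained writes the same layout directly. A minimal sketch, assuming placeholder local paths:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load your fine-tuned checkpoint, then write it out in the Hugging Face
# layout (config.json, tokenizer files, *.safetensors).
model = AutoModelForCausalLM.from_pretrained("./my-finetuned-llama")
tokenizer = AutoTokenizer.from_pretrained("./my-finetuned-llama")

# safe_serialization=True stores the weights in Safetensors format.
model.save_pretrained("./export-for-bedrock", safe_serialization=True)
tokenizer.save_pretrained("./export-for-bedrock")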

To import the model from Amazon S3, you minimally need the following files, which the Hugging Face transformers library creates.

  • .safetensors — the model weights in Safetensors format. Safetensors is a format created by Hugging Face that stores model weights as tensors. You must store the tensors for your model in a file with the extension .safetensors. For more information, see Safetensors. For information about converting model weights to Safetensors format, see Convert weights to safetensors.

    Note
    • Currently, Amazon Bedrock only supports model weights with FP32, FP16, and BF16 precision. Amazon Bedrock rejects model weights that you supply in any other precision. Internally, Amazon Bedrock converts FP32 models to BF16 precision. For a quick way to check the precision of a weights file, see the sketch after this list.

    • Amazon Bedrock doesn't support the import of quantized models.

  • config.json — For examples, see LlamaConfig and MistralConfig.

    Note

    Amazon Bedrock overrides the Llama 3 rope_scaling value with the following values:

    • original_max_position_embeddings=8192

    • high_freq_factor=4

    • low_freq_factor=1

    • factor=8

  • tokenizer_config.json — For an example, see LlamaTokenizer.

  • tokenizer.json

  • tokenizer.model
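As noted for the weights files above, Amazon Bedrock only accepts FP32, FP16, and BF16 tensors. A minimal sketch, assuming the safetensors Python package and a placeholder file name, that inspects a weights file without loading the tensors into memory:

from safetensors import safe_open

ALLOWED = {"F32", "F16", "BF16"}  # precisions Amazon Bedrock accepts

# Placeholder path; repeat for each shard of your model weights.
with safe_open("model.safetensors", framework="pt") as f:
    for name in f.keys():
        dtype = f.get_slice(name).get_dtype()  # dtype string, for example "BF16"
        if dtype not in ALLOWED:
            print(f"{name}: unsupported precision {dtype}")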

Supported tokenizers

Amazon Bedrock Custom Model Import supports the following tokenizers. You can use these tokenizers with any model.

  • T5Tokenizer

  • T5TokenizerFast

  • LlamaTokenizer

  • LlamaTokenizerFast

  • CodeLlamaTokenizer

  • CodeLlamaTokenizerFast

  • GPT2Tokenizer

  • GPT2TokenizerFast

  • GPTNeoXTokenizer

  • GPTNeoXTokenizerFast

  • PreTrainedTokenizer

  • PreTrainedTokenizerFast
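To confirm which tokenizer class your model files resolve to, you can load them with the transformers library. A minimal sketch, assuming a placeholder local directory that holds tokenizer.json, tokenizer_config.json, and tokenizer.model:

from transformers import AutoTokenizer

# Placeholder path to the directory with the tokenizer files.
tokenizer = AutoTokenizer.from_pretrained("./export-for-bedrock")

# The class name should match one of the tokenizers listed above.
print(type(tokenizer).__name__)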