Foundation models and hyperparameters for fine-tuning
Foundation models are computationally expensive to train and are pre-trained on large, unlabeled corpora. Fine-tuning a pre-trained foundation model is an affordable way to take advantage of its broad capabilities while customizing a model on your own small corpus. Fine-tuning is a customization method that involves further training and does change the weights of your model.
Fine-tuning might be useful to you if you need:

- to customize your model to specific business needs
- your model to successfully work with domain-specific language, such as industry jargon, technical terms, or other specialized vocabulary
- enhanced performance for specific tasks
- accurate, relevant, and context-aware responses in applications
- responses that are more factual, less toxic, and better aligned to specific requirements
There are two main approaches that you can take for fine-tuning, depending on your use case and chosen foundation model.

- If you're interested in fine-tuning your model on domain-specific data, see Fine-tune a large language model (LLM) using domain adaptation.
- If you're interested in instruction-based fine-tuning using prompt and response examples (illustrated in the sketch after this list), see Fine-tune a large language model (LLM) using prompt instructions.
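
To make the instruction-based approach concrete, the following is a minimal sketch of an instruction-tuning dataset in JSON Lines format. The field names used here ("instruction", "context", and "response") are illustrative assumptions; the exact schema and any accompanying prompt template depend on the foundation model that you choose.

```python
# Minimal sketch: writing an instruction-tuning dataset as JSON Lines.
# The field names below are illustrative assumptions; check the dataset
# format that your chosen foundation model expects.
import json

examples = [
    {
        "instruction": "Summarize the support ticket in one sentence.",
        "context": "Customer reports intermittent 504 errors when uploading files...",
        "response": "The customer sees intermittent gateway timeouts during file uploads.",
    },
]

# Write one JSON object per line, the format commonly expected for a
# training channel uploaded to Amazon S3.
with open("train.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```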
Foundation models available for fine-tuning
You can fine-tune any of the following JumpStart foundation models:
- Bloom 3B
- Bloom 7B1
- BloomZ 3B FP16
- BloomZ 7B1 FP16
- Code Llama 13B
- Code Llama 13B Python
- Code Llama 34B
- Code Llama 34B Python
- Code Llama 70B
- Code Llama 70B Python
- Code Llama 7B
- Code Llama 7B Python
- CyberAgentLM2-7B-Chat (CALM2-7B-Chat)
- Falcon 40B BF16
- Falcon 40B Instruct BF16
- Falcon 7B BF16
- Falcon 7B Instruct BF16
- Flan-T5 Base
- Flan-T5 Large
- Flan-T5 Small
- Flan-T5 XL
- Flan-T5 XXL
- Gemma 2B
- Gemma 2B Instruct
- Gemma 7B
- Gemma 7B Instruct
- GPT-2 XL
- GPT-J 6B
- GPT-Neo 1.3B
- GPT-Neo 125M
- GPT-Neo 2.7B
- LightGPT Instruct 6B
- Llama 2 13B
- Llama 2 13B Chat
- Llama 2 13B Neuron
- Llama 2 70B
- Llama 2 70B Chat
- Llama 2 7B
- Llama 2 7B Chat
- Llama 2 7B Neuron
- Mistral 7B
- Mixtral 8x7B
- Mixtral 8x7B Instruct
- RedPajama INCITE Base 3B V1
- RedPajama INCITE Base 7B V1
- RedPajama INCITE Chat 3B V1
- RedPajama INCITE Chat 7B V1
- RedPajama INCITE Instruct 3B V1
- RedPajama INCITE Instruct 7B V1
- Stable Diffusion 2.1
Commonly supported fine-tuning hyperparameters
Different foundation models support different hyperparameters when fine-tuning. The following are commonly supported hyperparameters that you can adjust to further customize your model during training:
Hyperparameter | Description
---|---
`epoch` | The number of passes that the model takes through the fine-tuning dataset during training. Must be an integer greater than 1.
`learning_rate` | The rate at which the model weights are updated after working through each batch of fine-tuning training examples. Must be a positive float greater than 0.
`instruction_tuned` | Whether to instruction-train the model or not. Must be `'True'` or `'False'`.
`per_device_train_batch_size` | The batch size per GPU core or CPU for training. Must be a positive integer.
`per_device_eval_batch_size` | The batch size per GPU core or CPU for evaluation. Must be a positive integer.
`max_train_samples` | For debugging purposes or quicker training, truncate the number of training examples to this value. A value of -1 means that the model uses all of the training samples. Must be a positive integer or -1.
`max_val_samples` | For debugging purposes or quicker training, truncate the number of validation examples to this value. A value of -1 means that the model uses all of the validation samples. Must be a positive integer or -1.
`max_input_length` | Maximum total input sequence length after tokenization. Sequences longer than this are truncated. If -1, `max_input_length` is set to the minimum of 1024 and the maximum model length defined by the tokenizer. If set to a positive value, `max_input_length` is set to the minimum of the provided value and the `model_max_length` defined by the tokenizer. Must be a positive integer or -1.
`validation_split_ratio` | If there is no validation channel, the ratio of the train-validation split from the training data. Must be between 0 and 1.
`train_data_split_seed` | If validation data is not present, this fixes the random splitting of the input training data into the training and validation data used by the model. Must be an integer.
`preprocessing_num_workers` | The number of processes to use for preprocessing. If `None`, the main process is used for preprocessing. Must be a positive integer or `None`.
`lora_r` | Low-rank adaptation (LoRA) r value, which acts as the scaling factor for weight updates. Must be a positive integer.
`lora_alpha` | Low-rank adaptation (LoRA) alpha value, which acts as the scaling factor for weight updates. Generally 2 to 4 times the size of `lora_r`. Must be a positive integer.
`lora_dropout` | Dropout value for low-rank adaptation (LoRA) layers. Must be a positive float between 0 and 1.
`int8_quantization` | If `True`, the model is loaded with 8-bit precision for training. Must be `'True'` or `'False'`.
`enable_fp16` | If `True`, fp16 training is enabled. Must be `'True'` or `'False'`.
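
Before you launch a training job, you can retrieve the default hyperparameter values for a given model with the SageMaker Python SDK and override only the ones you need. A minimal sketch, assuming Llama 2 7B as an example model ID:

```python
# Minimal sketch: retrieving the default fine-tuning hyperparameters for a
# JumpStart model. The model ID is an example; substitute your chosen model.
from sagemaker import hyperparameters

my_hyperparameters = hyperparameters.retrieve_default(
    model_id="meta-textgeneration-llama-2-7b",
    model_version="*",  # "*" resolves to the latest model version
)
print(my_hyperparameters)

# Override selected values before passing them to a training job.
my_hyperparameters["epoch"] = "3"
my_hyperparameters["per_device_train_batch_size"] = "4"
```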
You can specify hyperparameter values when you fine-tune your model in Studio. For more information, see Fine-tune a model in Studio.
You can also override default hyperparameter values when fine-tuning your model using the SageMaker Python SDK. For more information, see Fine-tune publicly available foundation models with the JumpStartEstimator class.
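
For example, the following is a minimal sketch of launching a fine-tuning job with overridden hyperparameters using the JumpStartEstimator class. The model ID, hyperparameter values, and S3 URI are example assumptions; gated models such as Llama 2 also require that you accept the end-user license agreement (EULA).

```python
# Minimal sketch: fine-tuning a JumpStart foundation model with overridden
# hyperparameters. The model ID and S3 URI are placeholders; this assumes an
# environment with AWS credentials and a SageMaker execution role available.
from sagemaker.jumpstart.estimator import JumpStartEstimator

estimator = JumpStartEstimator(
    model_id="meta-textgeneration-llama-2-7b",
    environment={"accept_eula": "true"},  # required for gated models such as Llama 2
    hyperparameters={
        "instruction_tuned": "True",
        "epoch": "5",
        "max_input_length": "1024",
    },
)

# Point the training channel at the fine-tuning dataset in Amazon S3.
estimator.fit({"training": "s3://amzn-s3-demo-bucket/path/to/training/data"})
```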