Foundation models and hyperparameters for fine-tuning
Foundation models are computationally expensive to train and are pre-trained on large, unlabeled corpora. Fine-tuning a pre-trained foundation model is an affordable way to take advantage of its broad capabilities while customizing a model on your own small corpus. Fine-tuning is a customization method that involves further training and does change the weights of your model.
Fine-tuning might be useful to you if you need:

- to customize your model to specific business needs
- your model to successfully work with domain-specific language, such as industry jargon, technical terms, or other specialized vocabulary
- enhanced performance for specific tasks
- accurate, relevant, and context-aware responses in applications
- responses that are more factual, less toxic, and better aligned to specific requirements
There are two main approaches that you can take for fine-tuning, depending on your use case and chosen foundation model.

- If you're interested in fine-tuning your model on domain-specific data, see Fine-tune a large language model (LLM) using domain adaptation.
- If you're interested in instruction-based fine-tuning using prompt and response examples (illustrated in the sketch after this list), see Fine-tune a large language model (LLM) using prompt instructions.
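
To make the instruction-based approach concrete, the following is a minimal sketch of an instruction-tuning dataset in JSON Lines format. The field names used here ("instruction", "context", and "response") are illustrative assumptions; the exact schema and any accompanying prompt template depend on the foundation model that you choose.

```python
# Minimal sketch: writing an instruction-tuning dataset as JSON Lines.
# The field names below are illustrative assumptions; check the dataset
# format that your chosen foundation model expects.
import json

examples = [
    {
        "instruction": "Summarize the support ticket in one sentence.",
        "context": "Customer reports intermittent 504 errors when uploading files...",
        "response": "The customer sees intermittent gateway timeouts during file uploads.",
    },
]

# Write one JSON object per line, the format commonly expected for a
# training channel uploaded to Amazon S3.
with open("train.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```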
Foundation models available for fine-tuning
You can fine-tune any of the following JumpStart foundation models:
- Bloom 3B
- Bloom 7B1
- BloomZ 3B FP16
- BloomZ 7B1 FP16
- Code Llama 13B
- Code Llama 13B Python
- Code Llama 34B
- Code Llama 34B Python
- Code Llama 70B
- Code Llama 70B Python
- Code Llama 7B
- Code Llama 7B Python
- CyberAgentLM2-7B-Chat (CALM2-7B-Chat)
- Falcon 40B BF16
- Falcon 40B Instruct BF16
- Falcon 7B BF16
- Falcon 7B Instruct BF16
- Flan-T5 Base
- Flan-T5 Large
- Flan-T5 Small
- Flan-T5 XL
- Flan-T5 XXL
- Gemma 2B
- Gemma 2B Instruct
- Gemma 7B
- Gemma 7B Instruct
- GPT-2 XL
- GPT-J 6B
- GPT-Neo 1.3B
- GPT-Neo 125M
- GPT-Neo 2.7B
- LightGPT Instruct 6B
- Llama 2 13B
- Llama 2 13B Chat
- Llama 2 13B Neuron
- Llama 2 70B
- Llama 2 70B Chat
- Llama 2 7B
- Llama 2 7B Chat
- Llama 2 7B Neuron
- Mistral 7B
- Mixtral 8x7B
- Mixtral 8x7B Instruct
- RedPajama INCITE Base 3B V1
- RedPajama INCITE Base 7B V1
- RedPajama INCITE Chat 3B V1
- RedPajama INCITE Chat 7B V1
- RedPajama INCITE Instruct 3B V1
- RedPajama INCITE Instruct 7B V1
- Stable Diffusion 2.1
Commonly supported fine-tuning hyperparameters
Different foundation models support different hyperparameters when fine-tuning. The following are commonly supported hyperparameters that you can adjust to further customize your model during training:
Hyperparameter | Description
---|---
`epoch` | The number of passes that the model takes through the fine-tuning dataset during training. Must be an integer greater than 1.
`learning_rate` | The rate at which the model weights are updated after working through each batch of fine-tuning training examples. Must be a positive float greater than 0.
`instruction_tuned` | Whether to instruction-train the model or not. Must be `'True'` or `'False'`.
`per_device_train_batch_size` | The batch size per GPU core or CPU for training. Must be a positive integer.
`per_device_eval_batch_size` | The batch size per GPU core or CPU for evaluation. Must be a positive integer.
`max_train_samples` | For debugging purposes or quicker training, truncate the number of training examples to this value. A value of -1 means that the model uses all of the training samples. Must be a positive integer or -1.
`max_val_samples` | For debugging purposes or quicker training, truncate the number of validation examples to this value. A value of -1 means that the model uses all of the validation samples. Must be a positive integer or -1.
`max_input_length` | Maximum total input sequence length after tokenization. Sequences longer than this are truncated. If -1, `max_input_length` is set to the minimum of 1024 and the maximum model length defined by the tokenizer. If set to a positive value, `max_input_length` is set to the minimum of the provided value and the `model_max_length` defined by the tokenizer. Must be a positive integer or -1.
`validation_split_ratio` | If there is no validation channel, the ratio of the train-validation split from the training data. Must be between 0 and 1.
`train_data_split_seed` | If validation data is not present, this fixes the random splitting of the input training data into the training and validation data used by the model. Must be an integer.
`preprocessing_num_workers` | The number of processes to use for preprocessing. If `None`, the main process is used for preprocessing. Must be a positive integer or `None`.
`lora_r` | Low-rank adaptation (LoRA) r value, which acts as the scaling factor for weight updates. Must be a positive integer.
`lora_alpha` | Low-rank adaptation (LoRA) alpha value, which acts as the scaling factor for weight updates. Generally 2 to 4 times the size of `lora_r`. Must be a positive integer.
`lora_dropout` | Dropout value for low-rank adaptation (LoRA) layers. Must be a positive float between 0 and 1.
`int8_quantization` | If `True`, the model is loaded with 8-bit precision for training. Must be `'True'` or `'False'`.
`enable_fp16` | If `True`, fp16 training is enabled. Must be `'True'` or `'False'`.
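
Before you launch a training job, you can retrieve the default hyperparameter values for a given model with the SageMaker Python SDK and override only the ones you need. A minimal sketch, assuming Llama 2 7B as an example model ID:

```python
# Minimal sketch: retrieving the default fine-tuning hyperparameters for a
# JumpStart model. The model ID is an example; substitute your chosen model.
from sagemaker import hyperparameters

my_hyperparameters = hyperparameters.retrieve_default(
    model_id="meta-textgeneration-llama-2-7b",
    model_version="*",  # "*" resolves to the latest model version
)
print(my_hyperparameters)

# Override selected values before passing them to a training job.
my_hyperparameters["epoch"] = "3"
my_hyperparameters["per_device_train_batch_size"] = "4"
```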
You can specify hyperparameter values when you fine-tune your model in Studio. For more information, see Fine-tune a model in Studio.
You can also override default hyperparameter values when fine-tuning your model using the SageMaker Python SDK. For more information, see Fine-tune publicly available foundation models with the JumpStartEstimator class.
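
For example, the following is a minimal sketch of launching a fine-tuning job with overridden hyperparameters using the JumpStartEstimator class. The model ID, hyperparameter values, and S3 URI are example assumptions; gated models such as Llama 2 also require that you accept the end-user license agreement (EULA).

```python
# Minimal sketch: fine-tuning a JumpStart foundation model with overridden
# hyperparameters. The model ID and S3 URI are placeholders; this assumes an
# environment with AWS credentials and a SageMaker execution role available.
from sagemaker.jumpstart.estimator import JumpStartEstimator

estimator = JumpStartEstimator(
    model_id="meta-textgeneration-llama-2-7b",
    environment={"accept_eula": "true"},  # required for gated models such as Llama 2
    hyperparameters={
        "instruction_tuned": "True",
        "epoch": "5",
        "max_input_length": "1024",
    },
)

# Point the training channel at the fine-tuning dataset in Amazon S3.
estimator.fit({"training": "s3://amzn-s3-demo-bucket/path/to/training/data"})
```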