Supported large language models for fine-tuning
Using the Autopilot API, you can fine-tune large language models (LLMs) that are powered by Amazon SageMaker JumpStart.
Note
For models that require the acceptance of an end-user license agreement (EULA), you must explicitly declare EULA acceptance when creating your AutoML job. Note that fine-tuning a pretrained model changes the weights of the original model, so you do not need to accept a EULA again when you later deploy the fine-tuned model.
For information on how to accept the EULA when creating a fine-tuning job using the AutoML API, see How to set the EULA acceptance when fine-tuning a model using the AutoML API.
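As a minimal sketch of how that acceptance can be declared, the following builds a `CreateAutoMLJobV2` request payload for a text-generation fine-tuning job. Field names follow the SageMaker AutoML V2 API shape; the job name, role ARN, and S3 paths are placeholder values, and the payload would typically be passed to the SageMaker client's `create_auto_ml_job_v2` call.

```python
def build_fine_tuning_request(job_name: str, role_arn: str,
                              train_s3_uri: str, output_s3_uri: str) -> dict:
    """Return a CreateAutoMLJobV2 payload with EULA acceptance declared."""
    return {
        "AutoMLJobName": job_name,
        "AutoMLJobInputDataConfig": [{
            "ChannelType": "training",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "AutoMLProblemTypeConfig": {"TextGenerationJobConfig": {
            # BaseModelName values come from the table below.
            "BaseModelName": "Llama2-7B",
            # Required for gated models: explicitly accept the EULA.
            "ModelAccessConfig": {"AcceptEula": True},
        }},
        "RoleArn": role_arn,
    }

# Placeholder account, role, and bucket values for illustration only.
request = build_fine_tuning_request(
    "my-fine-tuning-job",
    "arn:aws:iam::111122223333:role/service-role/MySageMakerRole",
    "s3://amzn-s3-demo-bucket/train/",
    "s3://amzn-s3-demo-bucket/output/",
)
```

Omitting `ModelAccessConfig.AcceptEula` (or setting it to `False`) causes job creation to fail for gated models, so the flag belongs in the job configuration rather than in any later deployment step.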
You can find the full details of each model by searching for its JumpStart Model ID in the following model table.
The following table lists the supported JumpStart models that you can fine-tune with an AutoML job.
JumpStart Model ID | BaseModelName in API request | Description
---|---|---
huggingface-textgeneration-dolly-v2-3b-bf16 | Dolly3B | Dolly 3B is a 2.8 billion parameter instruction-following large language model based on pythia-2.8b.
huggingface-textgeneration-dolly-v2-7b-bf16 | Dolly7B | Dolly 7B is a 6.9 billion parameter instruction-following large language model based on pythia-6.9b.
huggingface-textgeneration-dolly-v2-12b-bf16 | Dolly12B | Dolly 12B is a 12 billion parameter instruction-following large language model based on pythia-12b.
huggingface-llm-falcon-7b-bf16 | Falcon7B | Falcon 7B is a 7 billion parameter causal large language model trained on 1,500 billion tokens, enhanced with curated corpora. It is trained on English and French data only, and does not generalize appropriately to other languages. Because the model was trained on large amounts of web data, it carries the stereotypes and biases commonly found online.
huggingface-llm-falcon-7b-instruct-bf16 | Falcon7BInstruct | Falcon 7B Instruct is a 7 billion parameter causal large language model built on Falcon 7B and fine-tuned on a 250 million token mixture of chat and instruct datasets. It is mostly trained on English data, and does not generalize appropriately to other languages. Furthermore, as it is trained on large-scale corpora representative of the web, it carries the stereotypes and biases commonly encountered online.
huggingface-llm-falcon-40b-bf16 | Falcon40B | Falcon 40B is a 40 billion parameter causal large language model trained on 1,000 billion tokens, enhanced with curated corpora. It is trained mostly on English, German, Spanish, and French, with limited capabilities in Italian, Portuguese, Polish, Dutch, Romanian, Czech, and Swedish. It does not generalize appropriately to other languages. Furthermore, as it is trained on large-scale corpora representative of the web, it carries the stereotypes and biases commonly encountered online.
huggingface-llm-falcon-40b-instruct-bf16 | Falcon40BInstruct | Falcon 40B Instruct is a 40 billion parameter causal large language model built on Falcon 40B and fine-tuned on a mixture of Baize data. It is mostly trained on English and French data, and does not generalize appropriately to other languages. Furthermore, as it is trained on large-scale corpora representative of the web, it carries the stereotypes and biases commonly encountered online.
huggingface-text2text-flan-t5-large | FlanT5L | Flan-T5 Large is an instruction fine-tuned version of the T5 text-to-text model with approximately 780 million parameters.
huggingface-text2text-flan-t5-xl | FlanT5XL | Flan-T5 XL is an instruction fine-tuned version of the T5 text-to-text model with approximately 3 billion parameters.
huggingface-text2text-flan-t5-xxl | FlanT5XXL | Flan-T5 XXL is an instruction fine-tuned version of the T5 text-to-text model with approximately 11 billion parameters.
meta-textgeneration-llama-2-7b | Llama2-7B | Llama 2 is a collection of pretrained and fine-tuned generative text models, ranging in scale from 7 billion to 70 billion parameters. Llama2-7B is the 7 billion parameter model that is intended for English use and can be adapted for a variety of natural language generation tasks.
meta-textgeneration-llama-2-7b-f | Llama2-7BChat | Llama 2 is a collection of pretrained and fine-tuned generative text models, ranging in scale from 7 billion to 70 billion parameters. Llama2-7BChat is the 7 billion parameter chat model that is optimized for dialogue use cases.
meta-textgeneration-llama-2-13b | Llama2-13B | Llama 2 is a collection of pretrained and fine-tuned generative text models, ranging in scale from 7 billion to 70 billion parameters. Llama2-13B is the 13 billion parameter model that is intended for English use and can be adapted for a variety of natural language generation tasks.
meta-textgeneration-llama-2-13b-f | Llama2-13BChat | Llama 2 is a collection of pretrained and fine-tuned generative text models, ranging in scale from 7 billion to 70 billion parameters. Llama2-13BChat is the 13 billion parameter chat model that is optimized for dialogue use cases.
huggingface-llm-mistral-7b | Mistral7B | Mistral 7B is a 7 billion parameter code and general-purpose English text generation model. It can be used in a variety of use cases, including text summarization, classification, text completion, and code completion.
huggingface-llm-mistral-7b-instruct | Mistral7BInstruct | Mistral 7B Instruct is a version of Mistral 7B fine-tuned for conversational use cases. It was specialized using a variety of publicly available conversation datasets in English.
huggingface-textgeneration1-mpt-7b-bf16 | MPT7B | MPT 7B is a decoder-style transformer large language model with 6.7 billion parameters, pretrained from scratch on 1 trillion tokens of English text and code. It is designed to handle long context lengths.
huggingface-textgeneration1-mpt-7b-instruct-bf16 | MPT7BInstruct | MPT 7B Instruct is a model for short-form instruction-following tasks, built by fine-tuning MPT 7B on a dataset derived from databricks-dolly-15k.