Supported large language models for fine-tuning
Using the Autopilot API, you can fine-tune large language models (LLMs) that are powered by Amazon SageMaker JumpStart.
Note
For models that require the acceptance of an end-user license agreement (EULA), you must explicitly declare EULA acceptance when creating your AutoML job. Note that fine-tuning a pretrained model changes the weights of the original model, so you do not need to accept a EULA again when you later deploy the fine-tuned model.
For information on how to accept the EULA when creating a fine-tuning job using the AutoML API, see How to set the EULA acceptance when fine-tuning a model using the AutoML API.
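As a minimal sketch of how that acceptance can be declared, the following builds a `CreateAutoMLJobV2` request payload for a text-generation fine-tuning job. Field names follow the SageMaker AutoML V2 API shape; the job name, role ARN, and S3 paths are placeholder values, and the payload would typically be passed to the SageMaker client's `create_auto_ml_job_v2` call.

```python
def build_fine_tuning_request(job_name: str, role_arn: str,
                              train_s3_uri: str, output_s3_uri: str) -> dict:
    """Return a CreateAutoMLJobV2 payload with EULA acceptance declared."""
    return {
        "AutoMLJobName": job_name,
        "AutoMLJobInputDataConfig": [{
            "ChannelType": "training",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "AutoMLProblemTypeConfig": {"TextGenerationJobConfig": {
            # BaseModelName values come from the table below.
            "BaseModelName": "Llama2-7B",
            # Required for gated models: explicitly accept the EULA.
            "ModelAccessConfig": {"AcceptEula": True},
        }},
        "RoleArn": role_arn,
    }

# Placeholder account, role, and bucket values for illustration only.
request = build_fine_tuning_request(
    "my-fine-tuning-job",
    "arn:aws:iam::111122223333:role/service-role/MySageMakerRole",
    "s3://amzn-s3-demo-bucket/train/",
    "s3://amzn-s3-demo-bucket/output/",
)
```

Omitting `ModelAccessConfig.AcceptEula` (or setting it to `False`) causes job creation to fail for gated models, so the flag belongs in the job configuration rather than in any later deployment step.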
You can find the full details of each model by searching for its JumpStart Model ID in the following model table.
The following table lists the supported JumpStart models that you can fine-tune with an AutoML job.
JumpStart Model ID | BaseModelName in API request | Description
---|---|---
huggingface-textgeneration-dolly-v2-3b-bf16 | Dolly3B | Dolly 3B is a 2.8 billion parameter instruction-following large language model based on pythia-2.8b.
huggingface-textgeneration-dolly-v2-7b-bf16 | Dolly7B | Dolly 7B is a 6.9 billion parameter instruction-following large language model based on pythia-6.9b.
huggingface-textgeneration-dolly-v2-12b-bf16 | Dolly12B | Dolly 12B is a 12 billion parameter instruction-following large language model based on pythia-12b.
huggingface-llm-falcon-7b-bf16 | Falcon7B | Falcon 7B is a 7 billion parameter causal large language model trained on 1,500 billion tokens, enhanced with curated corpora. It is trained on English and French data only, and does not generalize appropriately to other languages. Because the model was trained on large amounts of web data, it carries the stereotypes and biases commonly found online.
huggingface-llm-falcon-7b-instruct-bf16 | Falcon7BInstruct | Falcon 7B Instruct is a 7 billion parameter causal large language model built on Falcon 7B and fine-tuned on a 250 million token mixture of chat and instruct datasets. It is mostly trained on English data, and does not generalize appropriately to other languages. Furthermore, as it is trained on large-scale corpora representative of the web, it carries the stereotypes and biases commonly encountered online.
huggingface-llm-falcon-40b-bf16 | Falcon40B | Falcon 40B is a 40 billion parameter causal large language model trained on 1,000 billion tokens, enhanced with curated corpora. It is trained mostly on English, German, Spanish, and French, with limited capabilities in Italian, Portuguese, Polish, Dutch, Romanian, Czech, and Swedish. It does not generalize appropriately to other languages. Furthermore, as it is trained on large-scale corpora representative of the web, it carries the stereotypes and biases commonly encountered online.
huggingface-llm-falcon-40b-instruct-bf16 | Falcon40BInstruct | Falcon 40B Instruct is a 40 billion parameter causal large language model built on Falcon 40B and fine-tuned on a mixture of Baize data. It is mostly trained on English and French data, and does not generalize appropriately to other languages. Furthermore, as it is trained on large-scale corpora representative of the web, it carries the stereotypes and biases commonly encountered online.
huggingface-text2text-flan-t5-large | FlanT5L | Flan-T5 Large is an instruction fine-tuned version of the T5 text-to-text model with approximately 780 million parameters.
huggingface-text2text-flan-t5-xl | FlanT5XL | Flan-T5 XL is an instruction fine-tuned version of the T5 text-to-text model with approximately 3 billion parameters.
huggingface-text2text-flan-t5-xxl | FlanT5XXL | Flan-T5 XXL is an instruction fine-tuned version of the T5 text-to-text model with approximately 11 billion parameters.
meta-textgeneration-llama-2-7b | Llama2-7B | Llama 2 is a collection of pretrained and fine-tuned generative text models, ranging in scale from 7 billion to 70 billion parameters. Llama2-7B is the 7 billion parameter model that is intended for English use and can be adapted for a variety of natural language generation tasks.
meta-textgeneration-llama-2-7b-f | Llama2-7BChat | Llama 2 is a collection of pretrained and fine-tuned generative text models, ranging in scale from 7 billion to 70 billion parameters. Llama2-7BChat is the 7 billion parameter chat model that is optimized for dialogue use cases.
meta-textgeneration-llama-2-13b | Llama2-13B | Llama 2 is a collection of pretrained and fine-tuned generative text models, ranging in scale from 7 billion to 70 billion parameters. Llama2-13B is the 13 billion parameter model that is intended for English use and can be adapted for a variety of natural language generation tasks.
meta-textgeneration-llama-2-13b-f | Llama2-13BChat | Llama 2 is a collection of pretrained and fine-tuned generative text models, ranging in scale from 7 billion to 70 billion parameters. Llama2-13BChat is the 13 billion parameter chat model that is optimized for dialogue use cases.
huggingface-llm-mistral-7b | Mistral7B | Mistral 7B is a 7 billion parameter code and general-purpose English text generation model. It can be used in a variety of use cases, including text summarization, classification, text completion, and code completion.
huggingface-llm-mistral-7b-instruct | Mistral7BInstruct | Mistral 7B Instruct is a version of Mistral 7B fine-tuned for conversational use cases. It was specialized using a variety of publicly available conversation datasets in English.
huggingface-textgeneration1-mpt-7b-bf16 | MPT7B | MPT 7B is a decoder-style transformer large language model with 6.7 billion parameters, pretrained from scratch on 1 trillion tokens of English text and code. It is designed to handle long context lengths.
huggingface-textgeneration1-mpt-7b-instruct-bf16 | MPT7BInstruct | MPT 7B Instruct is a model for short-form instruction-following tasks, built by fine-tuning MPT 7B on a dataset derived from databricks-dolly-15k.