
Build your own RAG


When you build your own retrieval augmented generation (RAG) system, you combine a retriever system and a generator system. The retriever can be an embedding model that identifies the relevant chunks in a vector database based on similarity scores. The generator can be a large language model (LLM) that answers questions based on the retrieved results (also known as chunks). The following sections provide tips on how to optimize the prompts for your RAG system.
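As a minimal sketch of this two-stage flow, the following Python code embeds a query and a set of chunks with an Amazon Titan text embeddings model and ranks the chunks by cosine similarity. The model ID, Region, and function names are illustrative assumptions rather than values prescribed by this guide, and in practice the chunk embeddings would come from a vector database rather than being computed per query.

import json
import math

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

def embed(text):
    # Example embeddings call; the model ID is an assumption and may differ
    # in your account or Region.
    response = bedrock_runtime.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(response["body"].read())["embedding"]

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieve(query, chunks, top_k=3):
    # Rank the chunks by similarity to the query and keep the top_k results.
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:top_k]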

Tip

Leverage the system prompt: As with other functionalities, enhancing the system prompt can be beneficial. You can define the RAG system's description in the system prompt, outlining the desired persona and behavior for the model.
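For example, with the Amazon Bedrock Converse API, the system prompt is passed as a list containing a text block. The persona wording below is a placeholder, not a prescribed description.

# Example system prompt describing the RAG system's persona and behavior.
# The wording is an illustrative placeholder.
system_prompt = (
    "You are a question answering assistant for an internal knowledge base. "
    "You answer user questions using only the search results provided to you."
)

# Later passed to the Converse API as the system parameter.
system = [{"text": system_prompt}]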

Tip

Use Model Instructions: Additionally, you can include a dedicated "Model Instructions:" section within the system prompt, where you can provide specific guidelines for the model to follow.

For instance, you can list instructions such as: "In this example session, the model has access to search results and a user's question. Its job is to answer the user's question using only information from the search results."

Model Instructions:
- You should provide concise answers to simple questions when the answer is directly contained in the search results, but when it comes to yes/no questions, provide some details.
- If the question requires multi-hop reasoning, you should find relevant information from the search results and summarize the answer based on that information with logical reasoning.
- If the search results do not contain information that can answer the question, state that you could not find an exact answer to the question. If the search results are completely irrelevant, say that you could not find an exact answer, then summarize the search results.
- Remember to add citations to your response using markers like %[1]%, %[2]%, %[3]%, etc., where each marker corresponds to the passage that supports the response.
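One way to assemble such a prompt in code is to append a "Model Instructions:" section, with one line per guideline, to the system description. The helper below is a sketch; its name and structure are assumptions, and the instruction list simply reuses the guidelines above.

def build_system_prompt(description, instructions):
    # Append a "Model Instructions:" section, one guideline per line.
    lines = [description, "", "Model Instructions:"]
    lines.extend("- " + instruction for instruction in instructions)
    return "\n".join(lines)

system_prompt = build_system_prompt(
    "In this session, the model has access to search results and a user's "
    "question. Its job is to answer the user's question using only "
    "information from the search results.",
    [
        "Provide concise answers to simple questions when the answer is "
        "directly contained in the search results.",
        "If the search results do not contain information that can answer "
        "the question, state that you could not find an exact answer.",
        "Add citations to your response using markers like %[1]%, %[2]%, %[3]%.",
    ],
)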
Tip

Avoid hallucination by restricting the instructions: Bring more focus to the instructions by clearly stating "DO NOT USE INFORMATION THAT IS NOT IN SEARCH RESULTS!" as a model instruction so that answers are grounded in the provided context.

- DO NOT USE INFORMATION THAT IS NOT IN SEARCH RESULTS!
Tip

Provide the input query followed by search results: Pass the input query, followed by the retriever's search results or contextual chunks. The model works best when the chunk results are provided after "Resource: Search Results:", as in the following template.

{query}
Resource: Search Results:
{rag_chunks_retriever_results}
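A sketch of building the user turn in this format, assuming the retrieved chunks are already available as a list of strings; numbering the chunks so they can be cited is an illustrative choice, not a documented requirement.

def build_user_message(query, chunks):
    # Place the query first, then the chunks after "Resource: Search Results:".
    numbered = "\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(chunks))
    return f"{query}\nResource: Search Results:\n{numbered}"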

You can combine all of the previous recommendations in the following prompt template. This template generates answers based only on the retrieved chunks.

In this session, the model has access to search results and a user's question. Your job is to answer the user's question using only information from the search results.

Model Instructions:
- You should provide concise answers to simple questions when the answer is directly contained in the search results, but when it comes to yes/no questions, provide some details.
- If the question requires multi-hop reasoning, you should find relevant information from the search results and summarize the answer based on that information with logical reasoning.
- If the search results do not contain information that can answer the question, state that you could not find an exact answer to the question. If the search results are completely irrelevant, say that you could not find an exact answer, then summarize the search results.
- Remember to add a citation to the end of your response using markers like %[1]%, %[2]%, %[3]%, etc., where each marker corresponds to the passage that supports the response.
- DO NOT USE INFORMATION THAT IS NOT IN SEARCH RESULTS!

{Query}
Resource: {search_results}
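Putting the pieces together, the following sketch sends this template to an Amazon Nova model through the Amazon Bedrock Converse API. The model ID, Region, and inference settings are example values that you should adjust for your own setup.

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

def answer(system_prompt, query, search_results):
    # system_prompt is the combined template above; the query and retrieved
    # search results fill in {Query} and {search_results}.
    response = bedrock_runtime.converse(
        modelId="us.amazon.nova-lite-v1:0",  # example Amazon Nova model ID
        system=[{"text": system_prompt}],
        messages=[
            {
                "role": "user",
                "content": [{"text": f"{query}\nResource: {search_results}"}],
            }
        ],
        inferenceConfig={"temperature": 0.0, "maxTokens": 1024},
    )
    return response["output"]["message"]["content"][0]["text"]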

Multimodal RAG

When you create a multimodal RAG, there are a few additional best practices you should observe.

  • Use images directly if they are not text-heavy (that is, natural scenes, text-sparse slides, infographics, and so on). Amazon Nova has been optimized to handle images that are not text-heavy, so you do not need to pass an additional text summary for these images during grounded generation.

  • Enhance text-heavy images with text summaries (for example, PDF reports and papers). For text-heavy PDFs, the best approach is to retrieve both the images (PDFs) and their corresponding text summaries. The text summaries help the model identify the relevant information in the large amount of text in the original image.

  • Let the model know that you are passing images. In the instructions, you can add a sentence like "You will be provided with images and text from search results," as shown in the sketch after this list.
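As an example of these practices, the sketch below builds a multimodal user turn for the Converse API that pairs a retrieved page image with its text summary. The file path, summary, and question are placeholders, and the sentence about receiving images would go in the system prompt's model instructions.

# Sketch of a multimodal user message: a retrieved page image plus its text
# summary and the user's question. All literal values are placeholders.
with open("retrieved_page.png", "rb") as f:
    page_bytes = f.read()

messages = [
    {
        "role": "user",
        "content": [
            {"image": {"format": "png", "source": {"bytes": page_bytes}}},
            {"text": "Summary of the page: ..."},
            {"text": "Question: ..."},
        ],
    }
]
# Passed to bedrock_runtime.converse(modelId=..., system=..., messages=messages)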

