Stop harmful content in models using Amazon Bedrock Guardrails

Amazon Bedrock Guardrails can implement safeguards for your generative AI applications based on your use cases and responsible AI policies. You can create multiple guardrails tailored to different use cases and apply them across multiple foundation models (FMs), providing a consistent user experience and standardizing safety and privacy controls across generative AI applications. Guardrails can be applied to both user inputs and model responses in natural language.

Guardrails can be used in multiple ways to help safeguard generative AI applications. For example:

  • A chatbot application can use guardrails to help filter harmful user inputs and toxic model responses.

  • A banking application can use guardrails to help block user queries or model responses associated with seeking or providing investment advice.

  • A call center application that summarizes conversation transcripts between users and agents can use guardrails to redact users’ personally identifiable information (PII) to protect user privacy.

Amazon Bedrock Guardrails supports the following policies:

  • Content filters – Adjust filter strengths to help block input prompts or model responses containing harmful content. Filtering is based on detection of predefined harmful content categories: Hate, Insults, Sexual, Violence, Misconduct, and Prompt Attack.

  • Denied topics – Define a set of topics that are undesirable in the context of your application. The filter will help block them if detected in user queries or model responses.

  • Word filters – Configure filters to help block undesirable words, phrases, and profanity (exact match). Such words can include offensive terms, competitor names, etc.

  • Sensitive information filters – Configure filters to help block or mask sensitive information, such as personally identifiable information (PII), in user inputs and model responses. Blocking or masking is based on probabilistic detection of sensitive information in standard formats, such as Social Security numbers, dates of birth, and addresses. You can also configure regular-expression-based detection of custom identifier patterns.

  • Contextual grounding check – Help detect and filter hallucinations in model responses based on grounding in a source and relevance to the user query.

  • Image content filter – Help detect and filter inappropriate or toxic image content. You can set filters for specific categories and adjust filter strength.

In addition to the above policies, you can also configure the messages returned to the user when a user input or model response violates the policies defined in the guardrail, as in the sketch below.
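
As a concrete illustration, here is a minimal sketch of creating a guardrail with the AWS SDK for Python (boto3). The name, policy values, and blocked messages shown are hypothetical placeholders, not recommendations:

    import boto3

    bedrock = boto3.client("bedrock")  # control-plane client, not bedrock-runtime

    response = bedrock.create_guardrail(
        name="my-example-guardrail",  # hypothetical name
        description="Blocks harmful content and investment advice.",
        contentPolicyConfig={
            "filtersConfig": [
                {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
                {"type": "VIOLENCE", "inputStrength": "MEDIUM", "outputStrength": "MEDIUM"},
            ]
        },
        topicPolicyConfig={
            "topicsConfig": [{
                "name": "Investment advice",
                "definition": "Guidance on investing money or buying financial products.",
                "type": "DENY",
            }]
        },
        sensitiveInformationPolicyConfig={
            "piiEntitiesConfig": [
                {"type": "US_SOCIAL_SECURITY_NUMBER", "action": "ANONYMIZE"},
            ]
        },
        # Messages returned to the user when the guardrail intervenes:
        blockedInputMessaging="Sorry, I can't help with that request.",
        blockedOutputsMessaging="Sorry, I can't provide that response.",
    )

    guardrail_id = response["guardrailId"]  # pair with version "DRAFT" until a version is created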

When you create a guardrail, a working draft is automatically available for you to modify iteratively. Experiment with different configurations and use the built-in test window to check whether the results meet your use-case requirements. When you are satisfied with a set of configurations, you can create a version of the guardrail and use it with supported foundation models.
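
A version can also be created programmatically from the working draft; a short boto3 sketch continuing the example above (the description string is illustrative):

    # Snapshot the current working draft as an immutable, numbered version.
    version_response = bedrock.create_guardrail_version(
        guardrailIdentifier=guardrail_id,
        description="First configuration that passed testing",
    )
    guardrail_version = version_response["version"]  # for example, "1"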

Guardrails can be used directly with FMs during inference by specifying the guardrail ID and version in the API invocation. Guardrails can also be used on their own through the ApplyGuardrail API, without invoking a foundation model. When a guardrail is used, it evaluates the input prompts and the FM completions against the defined policies.
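
The following sketch, continuing the boto3 example, shows both paths. The model ID and text payloads are placeholders, and the invoke_model body format varies by model provider:

    import json

    runtime = boto3.client("bedrock-runtime")

    # Path 1: attach the guardrail to a model invocation by ID and version.
    result = runtime.invoke_model(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # placeholder model ID
        guardrailIdentifier=guardrail_id,
        guardrailVersion=guardrail_version,
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 256,
            "messages": [{"role": "user", "content": "Should I buy this stock?"}],
        }),
    )

    # Path 2: evaluate text against the guardrail without invoking any model.
    check = runtime.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=guardrail_version,
        source="INPUT",  # use "OUTPUT" to evaluate a model response instead
        content=[{"text": {"text": "Should I buy this stock?"}}],
    )
    print(check["action"])  # "GUARDRAIL_INTERVENED" or "NONE"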

For retrieval augmented generation (RAG) or conversational applications, you may need to evaluate only the user input in the input prompt while discarding system instructions, search results, conversation history, or few-shot examples. To selectively evaluate a section of the input prompt, see Apply tags to user input to filter content.
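
As one illustration, the Converse API supports this kind of selective evaluation through guardContent blocks: when a message contains them, the guardrail evaluates only the tagged content and passes the rest through. A sketch with placeholder text values, continuing the example above:

    # Only the content wrapped in a guardContent block is evaluated by the guardrail.
    reply = runtime.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # placeholder model ID
        messages=[{
            "role": "user",
            "content": [
                {"text": "Retrieved search results go here."},  # not evaluated
                {"guardContent": {"text": {"text": "What is my account balance?"}}},
            ],
        }],
        guardrailConfig={
            "guardrailIdentifier": guardrail_id,
            "guardrailVersion": guardrail_version,
        },
    )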

Important
  • Amazon Bedrock Guardrails supports natural language content in English, French, and Spanish. Guardrails will be ineffective in any other language.
