
Block harmful words and conversations with content filters


Amazon Bedrock Guardrails supports content filters to help detect and filter harmful user inputs and model-generated outputs in natural language. Content filters are supported across the following categories:

Hate

  • Text content — Describes input prompts and model responses that discriminate, criticize, insult, denounce, or dehumanize a person or group on the basis of an identity (such as race, ethnicity, gender, religion, sexual orientation, ability, or national origin).

  • Image content (in preview) — Describes input prompts and model responses that include graphic, real-life visual content displaying hate symbols and imagery associated with organizations that promote discrimination, racism, and intolerance.

Insults

  • Text content — Describes input prompts and model responses that include demeaning, humiliating, mocking, insulting, or belittling language. This type of language is also labeled as bullying.

  • Image content (in preview) — Describes input prompts and model responses that include rude, disrespectful, or offensive gestures intended to express contempt, anger, or disapproval.

Sexual

  • Text content — Describes input prompts and model responses that indicate sexual interest, activity, or arousal using direct or indirect references to body parts, physical traits, or sex.

  • Image content (in preview) — Describes input prompts and model responses that display private body parts or sexual activity. This category also encompasses cartoons, anime, drawings, sketches, and other illustrated content with sexual themes.

Violence

  • Text content — Describes input prompts and model responses that include glorification of, or threats to inflict, physical pain, hurt, or injury toward a person, group, or thing.

  • Image content (in preview) — Describes input prompts and model responses that include self-harm practices, violent physical assaults, and depictions of people or animals being hurt, often accompanied by prominent blood or bodily injuries.

Misconduct

  • Text content only — Describes input prompts and model responses that seek or provide information about engaging in criminal activity, or about harming, defrauding, or taking advantage of a person, group, or institution.

Prompt Attack

  • Text content only; applies only to prompts with input tagging — Describes user prompts intended to bypass the safety and moderation capabilities of a foundation model in order to generate harmful content (known as a jailbreak), or to ignore and override instructions specified by the developer (known as prompt injection). Prompt attack detection requires input tags to be applied to the user portion of the prompt.
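
The following is a minimal sketch of how these categories can be enabled together on a guardrail, using the CreateGuardrail operation through the AWS SDK for Python (boto3). The guardrail name, filter strengths, and blocked-message text are illustrative assumptions, not values prescribed by this page; adjust them to your use case.

    # Minimal sketch: create a guardrail whose content policy enables the text
    # filter categories described above. Name, strengths, and messages are
    # illustrative assumptions.
    import boto3

    bedrock = boto3.client("bedrock")

    response = bedrock.create_guardrail(
        name="content-filter-example",  # hypothetical guardrail name
        description="Text content filters for all supported categories",
        contentPolicyConfig={
            "filtersConfig": [
                {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
                {"type": "INSULTS", "inputStrength": "HIGH", "outputStrength": "HIGH"},
                {"type": "SEXUAL", "inputStrength": "HIGH", "outputStrength": "HIGH"},
                {"type": "VIOLENCE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
                {"type": "MISCONDUCT", "inputStrength": "MEDIUM", "outputStrength": "MEDIUM"},
                # Prompt attack detection applies to user inputs only,
                # so its output strength is set to NONE.
                {"type": "PROMPT_ATTACK", "inputStrength": "HIGH", "outputStrength": "NONE"},
            ]
        },
        blockedInputMessaging="Sorry, this request was blocked by the content filters.",
        blockedOutputsMessaging="Sorry, the response was blocked by the content filters.",
    )
    print(response["guardrailId"], response["version"])

Consistent with the input tagging requirement noted above, the prompt attack filter evaluates only the portions of a prompt that are marked as user input (for example, guarded content blocks when invoking a model with the guardrail); developer-authored instructions outside those tags are not treated as prompt attacks.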
