Block denied topics to help remove harmful content
Guardrails can be configured with a set of denied topics that are undesirable in the context of your generative AI application. For example, a bank may want its AI assistant to avoid any conversation related to investment advice or cryptocurrencies.
You can define up to 30 denied topics. Input prompts and model completions will be evaluated against each of these denied topics. If one of the denied topics is detected, the blocked message configured as part of the guardrail will be returned to the user.
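The evaluation flow described above can be sketched as a small helper that checks one piece of text and returns the configured blocked message when the guardrail intervenes. This is a minimal sketch, not a definitive implementation: it assumes the boto3 `bedrock-runtime` client's `apply_guardrail` operation and the `action` and `outputs` response fields, which should be verified against the current API reference. The helper name `blocked_message_for` is hypothetical, and the client is passed in so the function can be exercised without AWS credentials.

```python
from typing import Any, Optional


def blocked_message_for(
    client: Any,
    guardrail_id: str,
    guardrail_version: str,
    text: str,
    source: str = "INPUT",
) -> Optional[str]:
    """Return the guardrail's replacement text if it intervened, else None.

    `client` is assumed to behave like boto3.client("bedrock-runtime").
    `source` is "INPUT" for a user prompt and "OUTPUT" for a model completion.
    """
    if source not in ("INPUT", "OUTPUT"):
        raise ValueError("source must be 'INPUT' or 'OUTPUT'")

    # Assumed request shape for the ApplyGuardrail operation.
    response = client.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=guardrail_version,
        source=source,
        content=[{"text": {"text": text}}],
    )

    # Assumed response shape: `action` reports whether the guardrail stepped in,
    # and `outputs` carries the text to show the user instead.
    if response.get("action") == "GUARDRAIL_INTERVENED":
        outputs = response.get("outputs", [])
        return outputs[0]["text"] if outputs else ""
    return None
```

A caller would typically run this once on the user prompt with `source="INPUT"` and once on the model completion with `source="OUTPUT"`, returning the message to the user whenever the result is not `None`. Note that a guardrail can also intervene for reasons other than denied topics, so this helper does not tell you which policy matched.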
Denied topics can be defined by providing a natural language definition of the topic along with a few optional example phrases of the topic. The definition and example phrases are used to detect if an input prompt or a model completion belongs to the topic.
Denied topics are defined with the following parameters.
- Name – The name of the topic. The name should be a noun or a phrase. Don't describe the topic in the name. For example:
  - Investment Advice
- Definition – Up to 200 characters summarizing the topic content. The definition should describe the content of the topic and its subtopics.
  The following is an example topic definition that you can provide:
  Investment advice is inquiries, guidance, or recommendations about the management or allocation of funds or assets with the goal of generating returns or achieving specific financial objectives.
- Sample phrases – A list of up to five sample phrases that refer to the topic. Each phrase can be up to 100 characters long. A sample is a prompt or continuation that shows what kind of content should be filtered out. For example:
  - Is investing in the stocks better than bonds?
  - Should I invest in gold?
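The parameters and limits above can be checked locally before a guardrail is created. The following is a minimal sketch under stated assumptions: the `DeniedTopic` class and `build_topic_policy` function are hypothetical helpers, and the output keys (`topicsConfig`, `name`, `definition`, `examples`, `type`) are assumed to match the `topicPolicyConfig` parameter of boto3's `create_guardrail`; confirm the exact shape against the current API reference before relying on it.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Limits quoted in the parameter descriptions above.
MAX_TOPICS = 30
MAX_DEFINITION_CHARS = 200
MAX_SAMPLE_PHRASES = 5
MAX_PHRASE_CHARS = 100


@dataclass
class DeniedTopic:
    """Hypothetical container for one denied topic."""

    name: str
    definition: str
    sample_phrases: List[str] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if the topic exceeds a documented limit."""
        if not self.name.strip():
            raise ValueError("name must not be empty")
        if len(self.definition) > MAX_DEFINITION_CHARS:
            raise ValueError(
                f"definition is {len(self.definition)} characters; "
                f"the limit is {MAX_DEFINITION_CHARS}"
            )
        if len(self.sample_phrases) > MAX_SAMPLE_PHRASES:
            raise ValueError(f"at most {MAX_SAMPLE_PHRASES} sample phrases are allowed")
        for phrase in self.sample_phrases:
            if len(phrase) > MAX_PHRASE_CHARS:
                raise ValueError(f"sample phrase exceeds {MAX_PHRASE_CHARS} characters: {phrase!r}")

    def to_config(self) -> Dict[str, object]:
        """Return the topic as a dict in the assumed API shape."""
        self.validate()
        return {
            "name": self.name,
            "definition": self.definition,
            "examples": list(self.sample_phrases),
            "type": "DENY",
        }


def build_topic_policy(topics: List[DeniedTopic]) -> Dict[str, object]:
    """Validate all topics and wrap them in the assumed policy structure."""
    if len(topics) > MAX_TOPICS:
        raise ValueError(f"at most {MAX_TOPICS} denied topics are allowed")
    return {"topicsConfig": [topic.to_config() for topic in topics]}


investment_advice = DeniedTopic(
    name="Investment Advice",
    definition=(
        "Investment advice is inquiries, guidance, or recommendations about the "
        "management or allocation of funds or assets with the goal of generating "
        "returns or achieving specific financial objectives."
    ),
    sample_phrases=[
        "Is investing in the stocks better than bonds?",
        "Should I invest in gold?",
    ],
)
topic_policy = build_topic_policy([investment_advice])
```

Under the assumption above, `topic_policy` could then be passed as the `topicPolicyConfig` argument when creating the guardrail, alongside the blocked messages configured for inputs and outputs.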
Best Practices to define a topic that you want to block
- Define the topic in a crisp and precise manner. A clear and unambiguous topic definition can improve the accuracy of the topic's detection. For example, a topic to detect queries or statements associated with cryptocurrencies can be defined as "Question or information associated with investing, selling, transacting, or procuring cryptocurrencies".
- Do not include examples or instructions in the topic definition. For example, "Block all contents associated to cryptocurrency" is an instruction and not a definition of the topic. Such instructions must not be used as part of a topic's definition.
- Do not define negative topics or exceptions. For example, "All contents except medical information" or "Contents not containing medical information" are negative definitions of a topic and must not be used.
- Do not use denied topics to capture entities or words. For example, do not use definitions such as "Statement or questions containing the name of a person 'X'" or "Statements with a competitor name Y". Topic definitions represent a theme or a subject, and Guardrails evaluates an input contextually. Topic filtering should not be used to capture individual words or entity types. For such use cases, consider using sensitive information filters (see "Remove PII from conversations by using sensitive information filters") or word filters (see "Remove a specific list of words and phrases from conversations with word filters") instead.
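Some of these practices can be approximated with a simple pre-submission check. The sketch below is a rough heuristic only and is not part of Guardrails: the function `lint_topic_definition` and its keyword patterns are assumptions chosen for illustration, and they will miss many phrasings while occasionally flagging valid definitions. It flags definitions that start like an instruction or that define a topic by negation or exception.

```python
import re
from typing import List

MAX_DEFINITION_CHARS = 200  # limit quoted in the parameter descriptions

# Hypothetical patterns: definitions that open with an imperative read as instructions.
INSTRUCTION_START = re.compile(
    r"^\s*(block|deny|filter|reject|refuse|do not|don't|never)\b", re.IGNORECASE
)
# Hypothetical patterns: wording that defines a topic by what it excludes.
NEGATION = re.compile(
    r"\b(except|excluding|other than|not containing|not related to)\b", re.IGNORECASE
)


def lint_topic_definition(definition: str) -> List[str]:
    """Return a list of warning codes for a candidate topic definition."""
    warnings: List[str] = []
    if len(definition) > MAX_DEFINITION_CHARS:
        warnings.append("too-long")
    if INSTRUCTION_START.search(definition):
        warnings.append("instruction")
    if NEGATION.search(definition):
        warnings.append("negative")
    return warnings
```

For instance, the instruction-style and negative example definitions quoted above would each produce a warning, while the cryptocurrency definition recommended in the first practice would pass with an empty list. A check like this is best treated as a reminder for the author, not as a substitute for testing the guardrail against real prompts.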