Include a guardrail with the Converse API
You can use a guardrail to guard conversational apps that you create with the Converse API. For example, if you create a chat app with the Converse API, you can use a guardrail to block inappropriate content entered by the user and inappropriate content generated by the model. For information about the Converse API, see Carry out a conversation with the Converse API operations.
Calling the Converse API with guardrails
To use a guardrail, you include configuration information for the guardrail in calls to the Converse or ConverseStream (for streaming responses) operations. Optionally, you can select specific content in the message that you want the guardrail to assess. For information about the models that you can use with guardrails and the Converse API, see Supported models and model features.
Configuring the guardrail to work with the Converse API
You specify configuration information for the guardrail in the guardrailConfig input parameter. The configuration includes the ID and the version of the guardrail that you want to use. You can also enable tracing for the guardrail, which provides information about the content that the guardrail blocked.
With the Converse operation, guardrailConfig is a GuardrailConfiguration object, as shown in the following example.
{ "guardrailIdentifier": "
Guardrail ID
", "guardrailVersion": "Guardrail version
", "trace": "enabled" }
If you use ConverseStream, you pass a GuardrailStreamConfiguration object. Optionally, you can use the streamProcessingMode field to specify that you want the model to complete the guardrail assessment before returning streaming response chunks. Alternatively, you can have the model respond asynchronously while the guardrail continues its assessment in the background. For more information, see Configure streaming response behavior to filter content.
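As a minimal sketch, the following shows one way to pass these configurations with the AWS SDK for Python (Boto3). The guardrail ID, guardrail version, and model ID are placeholder values.

import boto3

client = boto3.client("bedrock-runtime")

# Placeholder guardrail values -- replace with your own ID and version.
guardrail_config = {
    "guardrailIdentifier": "your-guardrail-id",
    "guardrailVersion": "1",
    "trace": "enabled",
}

messages = [
    {"role": "user", "content": [{"text": "Create a playlist of 2 pop songs."}]}
]

# Converse takes the configuration as-is.
response = client.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # placeholder model ID
    messages=messages,
    guardrailConfig=guardrail_config,
)

# ConverseStream accepts the same fields plus an optional streamProcessingMode:
# "sync" completes the assessment before returning chunks; "async" assesses
# in the background while chunks stream.
stream = client.converse_stream(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=messages,
    guardrailConfig={**guardrail_config, "streamProcessingMode": "sync"},
)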
Guarding a message to assess harmful content
When you pass a message (Message) to a model, the guardrail assesses the content in the message. Optionally, you can guard selected content in the message by specifying the guardContent (GuardrailConverseContentBlock) field. The guardrail evaluates only the content in the guardContent field and not the rest of the message. This is useful for having the guardrail assess only the most recent message in a conversation, as shown in the following example.
[ { "role": "user", "content": [ { "text": "Create a playlist of 2 pop songs." } ] }, { "role": "assistant", "content": [ { "text": " Sure! Here are two pop songs:\n1. \"Bad Habits\" by Ed Sheeran\n2. \"All Of The Lights\" by Kanye West\n\nWould you like to add any more songs to this playlist? " } ] }, { "role": "user", "content": [ { "guardContent": { "text": { "text": "Create a playlist of 2 heavy metal songs." } } } ] } ]
Another use is providing additional context for a message, without having the guardrail assess that additional context.
[ { "role": "user", "content": [ { "text": "Only answer with a list of songs." }, { "guardContent": { "text": { "text": "Create a playlist of heavy metal songs." } } } ] } ]
Note
Using the guardContent field is analogous to using input tags with InvokeModel and InvokeModelWithResponseStream. For more information, see Apply tags to user input to filter content.
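As an illustrative sketch, the preceding message list might be built and sent with Boto3 as follows. The guardrail ID, version, and model ID are placeholders.

import boto3

client = boto3.client("bedrock-runtime")

messages = [
    {
        "role": "user",
        "content": [
            # Context that the guardrail does not assess.
            {"text": "Only answer with a list of songs."},
            # Only this block is assessed by the guardrail.
            {"guardContent": {"text": {"text": "Create a playlist of heavy metal songs."}}},
        ],
    }
]

response = client.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # placeholder model ID
    messages=messages,
    guardrailConfig={
        "guardrailIdentifier": "your-guardrail-id",  # placeholder
        "guardrailVersion": "1",                     # placeholder
        "trace": "enabled",
    },
)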
Guarding a system prompt sent to the Converse API
You can use guardrails with system prompts that you send to the Converse API. To guard a system prompt, specify the guardContent (SystemContentBlock) field in the system prompt that you pass to the API, as shown in the following example.
[ { "guardContent": { "text": { "text": "Only respond with Welsh heavy metal songs." } } } ]
If you don't provide the guardContent field, the guardrail doesn't assess the system prompt message.
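A minimal Boto3 sketch of the equivalent call might look like the following; the guardrail values and model ID are placeholders.

import boto3

client = boto3.client("bedrock-runtime")

# Only the guardContent block in the system prompt is assessed by the guardrail.
system = [
    {"guardContent": {"text": {"text": "Only respond with Welsh heavy metal songs."}}}
]

response = client.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # placeholder model ID
    system=system,
    messages=[{"role": "user", "content": [{"text": "Create a playlist of 2 songs."}]}],
    guardrailConfig={
        "guardrailIdentifier": "your-guardrail-id",  # placeholder
        "guardrailVersion": "1",                     # placeholder
        "trace": "enabled",
    },
)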
Message and system prompt guardrail behavior
The guardrail treats the guardContent field differently in system prompts and in messages, as shown in the following table.
|  | System prompt has Guardrail block | System prompt does not have Guardrail block |
| --- | --- | --- |
| Messages have Guardrail block | System: Guardrail investigates content in Guardrail block. Messages: Guardrail investigates content in Guardrail block. | System: Guardrail investigates nothing. Messages: Guardrail investigates content in Guardrail block. |
| Messages do not have Guardrail block | System: Guardrail investigates content in Guardrail block. Messages: Guardrail investigates everything. | System: Guardrail investigates nothing. Messages: Guardrail investigates everything. |
Processing the response when using the Converse API
When you call the Converse operation, the guardrail assesses the message that you send. If the guardrail detects blocked content, the following happens.
- The stopReason field in the response is set to guardrail_intervened.
- If you enabled tracing, the trace is available in the trace (ConverseTrace) field. With ConverseStream, the trace is in the metadata (ConverseStreamMetadataEvent) that the operation returns.
- The blocked content text that you configured in the guardrail is returned in the output (ConverseOutput) field. With ConverseStream, the blocked content text is in the streamed message.
The following partial response shows the blocked content text and the trace from the guardrail assessment. The guardrail has blocked the Heavy metal topic in the message.
{ "output": { "message": { "role": "assistant", "content": [ { "text": "Sorry, I can't answer questions about heavy metal music." } ] } }, "stopReason": "guardrail_intervened", "usage": { "inputTokens": 0, "outputTokens": 0, "totalTokens": 0 }, "metrics": { "latencyMs": 721 }, "trace": { "guardrail": { "inputAssessment": { "3o06191495ze": { "topicPolicy": { "topics": [ { "name": "Heavy metal", "type": "DENY", "action": "BLOCKED" } ] }, "invocationMetrics": { "guardrailProcessingLatency": 240, "usage": { "topicPolicyUnits": 1, "contentPolicyUnits": 0, "wordPolicyUnits": 0, "sensitiveInformationPolicyUnits": 0, "sensitiveInformationPolicyFreeUnits": 0, "contextualGroundingPolicyUnits": 0 }, "guardrailCoverage": { "textCharacters": { "guarded": 39, "total": 72 } } } } } } } }
Example code for using Converse API with guardrails
This example shows how to guard a conversation with the Converse and ConverseStream operations. The example shows how to prevent a model from creating a playlist that includes songs from the heavy metal genre.
To guard a conversation
1. Create a guardrail by following the instructions at Create a guardrail. In step 6a, enter the following information to create a denied topic:
   - Name – Enter Heavy metal.
   - Definition for topic – Enter Avoid mentioning songs that are from the heavy metal genre of music.
   - Add sample phrases – Enter Create a playlist of heavy metal songs.
   In step 9, enter the following:
   - Messaging shown for blocked prompts – Enter Sorry, I can't answer questions about heavy metal music.
   - Messaging for blocked responses – Enter Sorry, the model generated an answer that mentioned heavy metal music.
   You can configure other guardrail options, but doing so isn't required for this example.
2. Create a version of the guardrail by following the instructions at Create a version of a guardrail.
3. In the following code examples (Converse and ConverseStream), set the following variables:
   - guardrail_id – The ID of the guardrail that you created in step 1.
   - guardrail_version – The version of the guardrail that you created in step 2.
   - text – Use Create a playlist of heavy metal songs.
4. Run the code examples. The output should display the guardrail assessment and the output message Text: Sorry, I can't answer questions about heavy metal music. The guardrail input assessment shows that the guardrail detected the term heavy metal in the input message.
5. (Optional) Test that the guardrail blocks inappropriate text that the model generates by changing the value of text to List all genres of rock music. Run the examples again. You should see an output assessment in the response.
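The following is a minimal Boto3 sketch of the Converse and ConverseStream calls that the procedure describes; the model ID is a placeholder, and any model supported by guardrails and the Converse API works.

import boto3
from botocore.exceptions import ClientError

# Set these to the values from steps 1 and 2.
guardrail_id = "your-guardrail-id"
guardrail_version = "1"
text = "Create a playlist of heavy metal songs."

client = boto3.client("bedrock-runtime")
model_id = "anthropic.claude-3-sonnet-20240229-v1:0"  # placeholder model ID

# Guard only the content in the guardContent block.
messages = [
    {"role": "user", "content": [{"guardContent": {"text": {"text": text}}}]}
]

guardrail_config = {
    "guardrailIdentifier": guardrail_id,
    "guardrailVersion": guardrail_version,
    "trace": "enabled",
}

try:
    # Converse: a single complete response.
    response = client.converse(
        modelId=model_id,
        messages=messages,
        guardrailConfig=guardrail_config,
    )
    print("Stop reason:", response["stopReason"])
    for block in response["output"]["message"]["content"]:
        if "text" in block:
            print("Text:", block["text"])
    if "trace" in response:
        print("Guardrail trace:", response["trace"]["guardrail"])

    # ConverseStream: streamed chunks, followed by a metadata event
    # that carries the guardrail trace.
    stream_response = client.converse_stream(
        modelId=model_id,
        messages=messages,
        guardrailConfig={**guardrail_config, "streamProcessingMode": "sync"},
    )
    for event in stream_response["stream"]:
        if "contentBlockDelta" in event:
            print(event["contentBlockDelta"]["delta"].get("text", ""), end="")
        elif "messageStop" in event:
            print("\nStop reason:", event["messageStop"]["stopReason"])
        elif "metadata" in event and "trace" in event["metadata"]:
            print("Guardrail trace:", event["metadata"]["trace"]["guardrail"])
except ClientError as err:
    print("Request failed:", err.response["Error"]["Message"])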