

# Using the Amazon Nova Sonic Speech-to-Speech model
<a name="speech"></a>

**Note**  
This documentation is for Amazon Nova Version 1. For the Amazon Nova 2 Sonic guide, visit [Speech-to-Speech](https://docs.aws.amazon.com/nova/latest/nova2-userguide/using-conversational-speech.html).

The Amazon Nova Sonic model provides real-time, conversational interactions through bidirectional audio streaming. Amazon Nova Sonic processes and responds to real-time speech as it occurs, enabling natural, human-like conversational experiences.

Amazon Nova Sonic delivers a transformative approach to conversational AI with its unified speech understanding and generation architecture. This state-of-the-art foundation model boasts industry-leading price performance, allowing enterprises to build voice experiences that remain natural and contextually aware. 

**Key capabilities and features**
+ State-of-the-art streaming speech understanding with bidirectional stream API capabilities that enable real-time, low-latency multi-turn conversations.
+ Natural, human-like conversational AI experiences with contextual richness across all supported languages.
+ Adaptive speech response that dynamically adjusts delivery based on the prosody of the input speech.
+ Graceful handling of user interruptions without dropping conversational context.
+ Knowledge grounding with enterprise data using Retrieval Augmented Generation (RAG).
+ Function calling and agentic workflow support for building complex AI applications.
+ Robustness to background noise for real-world deployment scenarios.
+ Multilingual support with expressive voices and speaking styles. Both masculine-sounding and feminine-sounding voices are offered in five languages: English (US, UK), French, Italian, German, and Spanish.
+ Recognition of varied speaking styles across all supported languages.

**Topics**
+ [Amazon Nova Sonic architecture](#speech-architecture)
+ [Using the Bidirectional Streaming API](speech-bidirection.md)
+ [Speech-to-speech Example](s2s-example.md)
+ [Code examples for Amazon Nova Sonic](speech-code-examples.md)
+ [Handling input events with the bidirectional API](input-events.md)
+ [Handling output events with the bidirectional API](output-events.md)
+ [Voices available for Amazon Nova Sonic](available-voices.md)
+ [Handling errors with Amazon Nova Sonic](speech-errors.md)
+ [Tool Use, RAG, and Agentic Flows with Amazon Nova Sonic](speech-tools.md)

## Amazon Nova Sonic architecture
<a name="speech-architecture"></a>

Amazon Nova Sonic implements an event-driven architecture through the bidirectional stream API, enabling real-time conversational experiences. Here are the key architectural components of the API:

1. **Bidirectional event streaming**: Amazon Nova Sonic uses a persistent bidirectional connection that allows simultaneous event streaming in both directions. Unlike traditional request-response patterns, this approach permits the following:
   + Continuous audio streaming from the user to the model
   + Concurrent speech processing and generation
   + Real-time model responses without waiting for complete utterances

1. **Event-driven communication flow**: The entire interaction follows an event-based protocol in which:
   + The client and model exchange structured JSON events
   + The events control session lifecycle, audio streaming, text responses, and tool interactions
   + Each event has specific roles in the conversation flow

The bidirectional stream API consists of these three main components:

1. **Session initialization**: The client establishes a bidirectional stream and sends the configuration events.

1. **Audio streaming**: User audio is continuously captured, encoded, and streamed as events to the model, which continuously processes the speech.

1. **Response streaming**: As audio arrives, the model simultaneously sends event responses:
   + Text transcriptions of user speech (ASR)
   + Tool use events for function calling
   + Text response of the model
   + Audio chunks for spoken output
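
The event-driven flow described above can be sketched with a few lines of standard-library Python. This is an illustrative helper (the function name is hypothetical, not part of any SDK) that classifies a decoded response chunk by the single event-type key inside its `event` object:

```
import json

def classify_event(raw: bytes) -> str:
    """Return the event type carried by one response chunk."""
    event = json.loads(raw.decode("utf-8")).get("event", {})
    # Each chunk carries one event; its key names the type.
    for key in ("contentStart", "textOutput", "audioOutput",
                "toolUse", "contentEnd", "completionEnd"):
        if key in event:
            return key
    return "unknown"

print(classify_event(b'{"event": {"textOutput": {"content": "Hello"}}}'))  # textOutput
```
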

The following diagram provides a high-level overview of the bidirectional stream API.

![Diagram that explains the Amazon Nova Sonic bidirectional streaming system.](https://docs.aws.amazon.com/nova/latest/userguide/images/nova-sonic-sequential.png)


# Using the Bidirectional Streaming API
<a name="speech-bidirection"></a>

**Note**  
This documentation is for Amazon Nova Version 1. For the Amazon Nova 2 Sonic guide, visit [Getting started](https://docs.aws.amazon.com/nova/latest/nova2-userguide/sonic-getting-started.html).

The Amazon Nova Sonic model uses the `InvokeModelWithBidirectionalStream` API, which enables real-time bidirectional streaming conversations. This differs from traditional request-response patterns by maintaining an open channel for continuous audio streaming in both directions.

The following AWS SDKs support the new bidirectional streaming API:
+ [AWS SDK for .NET](https://aws.amazon.com/sdk-for-net/)
+ [AWS SDK for C++](https://aws.amazon.com/sdk-for-cpp/)
+ [AWS SDK for Java](https://aws.amazon.com/sdk-for-java/)
+ [AWS SDK for JavaScript](https://aws.amazon.com/sdk-for-javascript/)
+ [AWS SDK for Kotlin](https://aws.amazon.com/sdk-for-kotlin/)
+ [AWS SDK for Ruby](https://aws.amazon.com/sdk-for-ruby/)
+ [AWS SDK for Rust](https://aws.amazon.com/sdk-for-rust/)
+ [AWS SDK for Swift](https://aws.amazon.com/sdk-for-swift/)

Python developers can use this [new experimental SDK](https://github.com/awslabs/aws-sdk-python) that makes it easier to use the bidirectional streaming capabilities of Amazon Nova Sonic.

The following code examples will help you get started with the bidirectional API. For a complete list of examples, see the Amazon Nova Sonic [GitHub samples](https://github.com/aws-samples/amazon-nova-samples/tree/main/speech-to-speech) page.

## Setting up the client
<a name="set-up-the-client"></a>

The following examples can be used to set up the client and begin using the bidirectional API.

------
#### [ Python ]

```
def _initialize_client(self):
    """Initialize the Bedrock client."""
    config = Config(
        endpoint_uri=f"https://bedrock-runtime.{self.region}.amazonaws.com",
        region=self.region,
        aws_credentials_identity_resolver=EnvironmentCredentialsResolver(),
        http_auth_scheme_resolver=HTTPAuthSchemeResolver(),
        http_auth_schemes={"aws.auth#sigv4": SigV4AuthScheme()}
    )
    self.bedrock_client = BedrockRuntimeClient(config=config)
```

------
#### [ Java ]

```
// The nettyBuilder is optional and shown here for clarity; these APIs support HTTP/2
// and will default to that protocol if the Netty builder is not specified.
NettyNioAsyncHttpClient.Builder nettyBuilder = NettyNioAsyncHttpClient.builder()
        .readTimeout(Duration.of(180, ChronoUnit.SECONDS))
        .maxConcurrency(20)
        .protocol(Protocol.HTTP2)
        .protocolNegotiation(ProtocolNegotiation.ALPN);
        

BedrockRuntimeAsyncClient client = BedrockRuntimeAsyncClient.builder()
        .region(Region.US_EAST_1)
        .credentialsProvider(ProfileCredentialsProvider.create("NOVA-PROFILE"))
        .httpClientBuilder(nettyBuilder)
        .build();
```

------
#### [ Node.js ]

```
const { BedrockRuntimeClient } = require("@aws-sdk/client-bedrock-runtime");
const { NodeHttp2Handler } = require("@smithy/node-http-handler");
const { fromIni } = require("@aws-sdk/credential-provider-ini");

// Configure HTTP/2 client for bidirectional streaming 
// (This is optional; these APIs support HTTP/2 and default to it if no handler is specified)
const nodeHttp2Handler = new NodeHttp2Handler({
    requestTimeout: 300000,
    sessionTimeout: 300000,
    disableConcurrentStreams: false,
    maxConcurrentStreams: 20,
});

// Create a Bedrock client
const client = new BedrockRuntimeClient({
    region: "us-east-1",
    credentials: fromIni({ profile: "NOVA-PROFILE" }), // Or use other credential providers
    requestHandler: nodeHttp2Handler,
});
```

------

## Handling events
<a name="handle-events"></a>

The following examples can be used to handle events with the bidirectional API.

------
#### [ Python ]

```
self.stream_response = await self.bedrock_client.invoke_model_with_bidirectional_stream(
    InvokeModelWithBidirectionalStreamInput(model_id=self.model_id)
)
self.is_active = True
```

```
async def _process_responses(self):
        """Process incoming responses from Bedrock."""
        try:            
            while self.is_active:
                try:
                    output = await self.stream_response.await_output()
                    result = await output[1].receive()
                    if result.value and result.value.bytes_:
                        try:
                            response_data = result.value.bytes_.decode('utf-8')
                            json_data = json.loads(response_data)
                            
                            # Handle different response types
                            if 'event' in json_data:
                                if 'contentStart' in json_data['event']:
                                    content_start = json_data['event']['contentStart']
                                    # set role
                                    self.role = content_start['role']
                                    # Check for speculative content
                                    if 'additionalModelFields' in content_start:
                                        try:
                                            additional_fields = json.loads(content_start['additionalModelFields'])
                                            if additional_fields.get('generationStage') == 'SPECULATIVE':
                                                self.display_assistant_text = True
                                            else:
                                                self.display_assistant_text = False
                                        except json.JSONDecodeError:
                                            print("Error parsing additionalModelFields")
                                elif 'textOutput' in json_data['event']:
                                    text_content = json_data['event']['textOutput']['content']
                                    role = json_data['event']['textOutput']['role']
                                    # Check if there is a barge-in
                                    if '{ "interrupted" : true }' in text_content:
                                        self.barge_in = True

                                    if (self.role == "ASSISTANT" and self.display_assistant_text):
                                        print(f"Assistant: {text_content}")
                                    elif (self.role == "USER"):
                                        print(f"User: {text_content}")

                                elif 'audioOutput' in json_data['event']:
                                    audio_content = json_data['event']['audioOutput']['content']
                                    audio_bytes = base64.b64decode(audio_content)
                                    await self.audio_output_queue.put(audio_bytes)
                                elif 'toolUse' in json_data['event']:
                                    self.toolUseContent = json_data['event']['toolUse']
                                    self.toolName = json_data['event']['toolUse']['toolName']
                                    self.toolUseId = json_data['event']['toolUse']['toolUseId']
                                elif 'contentEnd' in json_data['event'] and json_data['event'].get('contentEnd', {}).get('type') == 'TOOL':
                                    toolResult = await self.processToolUse(self.toolName, self.toolUseContent)
                                    toolContent = str(uuid.uuid4())
                                    await self.send_tool_start_event(toolContent)
                                    await self.send_tool_result_event(toolContent, toolResult)
                                    await self.send_tool_content_end_event(toolContent)
                                elif 'completionEnd' in json_data['event']:
                                    # Handle end of conversation, no more response will be generated
                                    print("End of response sequence")
                                   
                            
                            # Put the response in the output queue for other components
                            await self.output_queue.put(json_data)
                        except json.JSONDecodeError:
                            await self.output_queue.put({"raw_data": response_data})
                except StopAsyncIteration:
                    # Stream has ended
                    break
                except Exception as e:
                    # Handle ValidationException properly
                    if "ValidationException" in str(e):
                        error_message = str(e)
                        print(f"Validation error: {error_message}")
                    else:
                        print(f"Error receiving response: {e}")
                    break
                    
        except Exception as e:
            print(f"Response processing error: {e}")
        finally:
            self.is_active = False
```

------
#### [ Java ]

```
public class ResponseHandler implements InvokeModelWithBidirectionalStreamResponseHandler {
    @Override
    public void responseReceived(InvokeModelWithBidirectionalStreamResponse response) {
        // Handle initial response
        log.info("Bedrock Nova Sonic request id: {}", response.responseMetadata().requestId());
    }

    @Override
    public void onEventStream(SdkPublisher<InvokeModelWithBidirectionalStreamOutput> sdkPublisher) {
        log.info("Bedrock Nova S2S event stream received");
        var completableFuture = sdkPublisher.subscribe((output) -> output.accept(new Visitor() {
            @Override
            public void visitChunk(BidirectionalOutputPayloadPart event) {
                log.info("Bedrock S2S chunk received, converting to payload");
                String payloadString =
                        StandardCharsets.UTF_8.decode((event.bytes().asByteBuffer().rewind().duplicate())).toString();
                log.info("Bedrock S2S payload: {}", payloadString);
                    delegate.onNext(payloadString);
            }
        }));

        // If any chunk fails to parse or be handled, be sure to propagate an error, or it will be lost
        completableFuture.exceptionally(t -> {
            delegate.onError(new Exception(t));
            return null;
        });
    }

    @Override
    public void exceptionOccurred(Throwable throwable) {
        // Handle errors
        System.err.println("Error: " + throwable.getMessage());
        throwable.printStackTrace();
    }

    @Override
    public void complete() {
        // Handle completion
        System.out.println("Stream completed");
    }
}
```

------
#### [ Node.js ]

```
for await (const event of response.body) {
        if (!session.isActive) {
          console.log(`Session ${sessionId} is no longer active, stopping response processing`);
          break;
        }
        if (event.chunk?.bytes) {
          try {
            this.updateSessionActivity(sessionId);
            const textResponse = new TextDecoder().decode(event.chunk.bytes);

            try {
              const jsonResponse = JSON.parse(textResponse);
              if (jsonResponse.event?.contentStart) {
                this.dispatchEvent(sessionId, 'contentStart', jsonResponse.event.contentStart);
              } else if (jsonResponse.event?.textOutput) {
                this.dispatchEvent(sessionId, 'textOutput', jsonResponse.event.textOutput);
              } else if (jsonResponse.event?.audioOutput) {
                this.dispatchEvent(sessionId, 'audioOutput', jsonResponse.event.audioOutput);
              } else if (jsonResponse.event?.toolUse) {
                this.dispatchEvent(sessionId, 'toolUse', jsonResponse.event.toolUse);

                // Store tool use information for later
                session.toolUseContent = jsonResponse.event.toolUse;
                session.toolUseId = jsonResponse.event.toolUse.toolUseId;
                session.toolName = jsonResponse.event.toolUse.toolName;
              } else if (jsonResponse.event?.contentEnd &&
                jsonResponse.event?.contentEnd?.type === 'TOOL') {

                // Process tool use
                console.log(`Processing tool use for session ${sessionId}`);
                this.dispatchEvent(sessionId, 'toolEnd', {
                  toolUseContent: session.toolUseContent,
                  toolUseId: session.toolUseId,
                  toolName: session.toolName
                });

                console.log("calling tooluse");
                console.log("tool use content : ", session.toolUseContent)
                // function calling
                const toolResult = await this.processToolUse(session.toolName, session.toolUseContent);

                // Send tool result
                this.sendToolResult(sessionId, session.toolUseId, toolResult);

                // Also dispatch event about tool result
                this.dispatchEvent(sessionId, 'toolResult', {
                  toolUseId: session.toolUseId,
                  result: toolResult
                });
              } else {
                // Handle other events
                const eventKeys = Object.keys(jsonResponse.event || {});
                console.log(`Event keys for session ${sessionId}: `, eventKeys)
                console.log(`Handling other events`)
                if (eventKeys.length > 0) {
                  this.dispatchEvent(sessionId, eventKeys[0], jsonResponse.event);
                } else if (Object.keys(jsonResponse).length > 0) {
                  this.dispatchEvent(sessionId, 'unknown', jsonResponse);
                }
              }
            } catch (e) {
              console.log(`Raw text response for session ${sessionId}(parse error): `, textResponse);
            }
          } catch (e) {
            console.error(`Error processing response chunk for session ${sessionId}: `, e);
          }
        } else if (event.modelStreamErrorException) {
          console.error(`Model stream error for session ${sessionId}: `, event.modelStreamErrorException);
          this.dispatchEvent(sessionId, 'error', {
            type: 'modelStreamErrorException',
            details: event.modelStreamErrorException
          });
        } else if (event.internalServerException) {
          console.error(`Internal server error for session ${sessionId}: `, event.internalServerException);
          this.dispatchEvent(sessionId, 'error', {
            type: 'internalServerException',
            details: event.internalServerException
          });
        }
      }
```

------

## Creating a request
<a name="create-request"></a>

The following examples can be used to create a request with the bidirectional API.

------
#### [ Python ]

```
self.stream_response = await self.bedrock_client.invoke_model_with_bidirectional_stream(
    InvokeModelWithBidirectionalStreamInput(model_id="amazon.nova-sonic-v1:0")
)
```

------
#### [ Java ]

```
InvokeModelWithBidirectionalStreamRequest request = 
   InvokeModelWithBidirectionalStreamRequest.builder()
   .modelId("amazon.nova-sonic-v1:0")
   .build();
```

------
#### [ Node.js ]

```
const request = new InvokeModelWithBidirectionalStreamCommand({
    modelId: "amazon.nova-sonic-v1:0",
    body: generateOrderedStream(), // initial request
});
```

------

## Initiating a request
<a name="initiate-request"></a>

The following examples can be used to initiate a request with the bidirectional API.

------
#### [ Python ]

```
    START_SESSION_EVENT = '''{
        "event": {
            "sessionStart": {
                "inferenceConfiguration": {
                    "maxTokens": 1024,
                    "topP": 0.9,
                    "temperature": 0.7
                }
            }
        }
    }'''

    event = InvokeModelWithBidirectionalStreamInputChunk(
        value=BidirectionalInputPayloadPart(bytes_=START_SESSION_EVENT.encode('utf-8'))
    )
    try:
        await self.stream_response.input_stream.send(event)
    except Exception as e:
        print(f"Error sending event: {str(e)}")
```

------
#### [ Java ]

```
// Create ReplayProcessor with time-based expiry (cleans up messages after 1 minute)
ReplayProcessor<InvokeModelWithBidirectionalStreamInput> publisher = ReplayProcessor.createWithTime(
                1, TimeUnit.MINUTES, Schedulers.io()
);

// Create response handler
ResponseHandler responseHandler = new ResponseHandler();

// Initiate bidirectional stream
CompletableFuture<Void> completableFuture = client.invokeModelWithBidirectionalStream(
    request, publisher, responseHandler);

// Handle completion and errors properly
completableFuture.exceptionally(throwable -> {
    publisher.onError(throwable);
    return null;
});

completableFuture.thenApply(result -> {
    publisher.onComplete();
    return result;
});

// Send session start event
String sessionStartJson = """
{
  "event": {
    "sessionStart": {
      "inferenceConfiguration": {
        "maxTokens": 1024,
        "topP": 0.9,
        "temperature": 0.7
      }
    }
  }
}""";

publisher.onNext(
    InvokeModelWithBidirectionalStreamInput.chunkBuilder()
        .bytes(SdkBytes.fromUtf8String(sessionStartJson))
        .build()
);
```

------
#### [ Node.js ]

```
const command = new InvokeModelWithBidirectionalStreamCommand({
    modelId: "amazon.nova-sonic-v1:0",
    body: generateChunks(),
});

async function* generateChunks() {
    // Send initialization events
    for (const event of initEvents) {
        const eventJson = JSON.stringify(event);
        console.log(`Sending event: ${eventJson.substring(0, 50)}...`);
        yield {
            chunk: {
                bytes: textEncoder.encode(eventJson),
            },
        };
        await new Promise(resolve => setTimeout(resolve, 30));
    }
}

const initEvents = [
    {
        event: {
            sessionStart: {
                inferenceConfiguration: {
                    maxTokens: 1024,
                    topP: 0.9,
                    temperature: 0.7
                }
            }
        }
    },
    {
    ...
    }
];
```

------
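
Across all three SDKs, every control event has the same shape: a JSON document wrapped in an `event` object and serialized to UTF-8 bytes. The following is a small standard-library sketch of that serialization step (the helper name is hypothetical):

```
import json

def encode_event(event: dict) -> bytes:
    """Wrap a control event and serialize it for the input stream."""
    return json.dumps({"event": event}).encode("utf-8")

session_start = encode_event({
    "sessionStart": {
        "inferenceConfiguration": {"maxTokens": 1024, "topP": 0.9, "temperature": 0.7}
    }
})
```

The resulting bytes are what each SDK's chunk type (for example, `BidirectionalInputPayloadPart` in Python) carries over the stream.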

# Speech-to-speech Example
<a name="s2s-example"></a>

**Note**  
This documentation is for Amazon Nova Version 1. For the Amazon Nova 2 Sonic guide, visit [Getting started](https://docs.aws.amazon.com/nova/latest/nova2-userguide/sonic-getting-started.html).

This example provides a step-by-step explanation of how to implement a simple, real-time audio streaming application using the Amazon Nova Sonic model. This simplified version demonstrates the core functionality needed to create an audio conversation with the Amazon Nova Sonic model.

You can access the following example in our [Amazon Nova samples GitHub repo](https://github.com/aws-samples/amazon-nova-samples/blob/main/speech-to-speech/sample-codes/console-python/nova_sonic_simple.py).

1. 

**State the imports and configuration**

   This section imports necessary libraries and sets audio configuration parameters:
   + `asyncio`: For asynchronous programming
   + `base64`: For encoding and decoding audio data
   + `pyaudio`: For audio capture and playback
   + Amazon Bedrock SDK components for streaming
   + Audio constants define the format of audio capture (16kHz sample rate, mono channel)

   ```
   import os
   import asyncio
   import base64
   import json
   import uuid
   import pyaudio
   from aws_sdk_bedrock_runtime.client import BedrockRuntimeClient, InvokeModelWithBidirectionalStreamOperationInput
   from aws_sdk_bedrock_runtime.models import InvokeModelWithBidirectionalStreamInputChunk, BidirectionalInputPayloadPart
   from aws_sdk_bedrock_runtime.config import Config, HTTPAuthSchemeResolver, SigV4AuthScheme
   from smithy_aws_core.credentials_resolvers.environment import EnvironmentCredentialsResolver
   
   # Audio configuration
   INPUT_SAMPLE_RATE = 16000
   OUTPUT_SAMPLE_RATE = 24000
   CHANNELS = 1
   FORMAT = pyaudio.paInt16
   CHUNK_SIZE = 1024
   ```

1. 

**Define the `SimpleNovaSonic` class**

   The `SimpleNovaSonic` class is the main class that handles the Amazon Nova Sonic interaction:
   + `model_id`: The Amazon Nova Sonic model ID (`amazon.nova-sonic-v1:0`)
   + `region`: The AWS Region; the default is `us-east-1`
   + Unique IDs for prompt and content tracking
   + An asynchronous queue for audio playback

   ```
   class SimpleNovaSonic:
       def __init__(self, model_id='amazon.nova-sonic-v1:0', region='us-east-1'):
           self.model_id = model_id
           self.region = region
           self.client = None
           self.stream = None
           self.response = None
           self.is_active = False
           self.prompt_name = str(uuid.uuid4())
           self.content_name = str(uuid.uuid4())
           self.audio_content_name = str(uuid.uuid4())
           self.audio_queue = asyncio.Queue()
           self.display_assistant_text = False
   ```

1. 

**Initialize the client**

   This method configures the Amazon Bedrock client with the following:
   + The appropriate endpoint for the specified region
   + Authentication information using environment variables for AWS credentials
   + The SigV4 authentication scheme for the AWS API calls

   ```
       def _initialize_client(self):
           """Initialize the Bedrock client."""
           config = Config(
            endpoint_uri=f"https://bedrock-runtime.{self.region}.amazonaws.com",
               region=self.region,
               aws_credentials_identity_resolver=EnvironmentCredentialsResolver(),
               http_auth_scheme_resolver=HTTPAuthSchemeResolver(),
               http_auth_schemes={"aws.auth#sigv4": SigV4AuthScheme()}
           )
           self.client = BedrockRuntimeClient(config=config)
   ```

1. 

**Handle events**

   This helper method sends JSON events to the bidirectional stream, which is used for all communication with the Amazon Nova Sonic model:

   ```
       async def send_event(self, event_json):
           """Send an event to the stream."""
           event = InvokeModelWithBidirectionalStreamInputChunk(
               value=BidirectionalInputPayloadPart(bytes_=event_json.encode('utf-8'))
           )
           await self.stream.input_stream.send(event)
   ```

1. 

**Start the session**

   This method initiates the session and sends the remaining setup events required to start audio streaming. These events must be sent in the order shown.

   ```
       async def start_session(self):
           """Start a new session with Nova Sonic."""
           if not self.client:
               self._initialize_client()
               
           # Initialize the stream
           self.stream = await self.client.invoke_model_with_bidirectional_stream(
               InvokeModelWithBidirectionalStreamOperationInput(model_id=self.model_id)
           )
           self.is_active = True
           
           # Send session start event
           session_start = '''
           {
             "event": {
               "sessionStart": {
                 "inferenceConfiguration": {
                   "maxTokens": 1024,
                   "topP": 0.9,
                   "temperature": 0.7
                 }
               }
             }
           }
           '''
           await self.send_event(session_start)
           
           # Send prompt start event
           prompt_start = f'''
           {{
             "event": {{
               "promptStart": {{
                 "promptName": "{self.prompt_name}",
                 "textOutputConfiguration": {{
                   "mediaType": "text/plain"
                 }},
                 "audioOutputConfiguration": {{
                   "mediaType": "audio/lpcm",
                   "sampleRateHertz": 24000,
                   "sampleSizeBits": 16,
                   "channelCount": 1,
                   "voiceId": "matthew",
                   "encoding": "base64",
                   "audioType": "SPEECH"
                 }}
               }}
             }}
           }}
           '''
           await self.send_event(prompt_start)
           
           # Send system prompt
           text_content_start = f'''
           {{
               "event": {{
                   "contentStart": {{
                       "promptName": "{self.prompt_name}",
                       "contentName": "{self.content_name}",
                       "type": "TEXT",
                       "interactive": true,
                       "role": "SYSTEM",
                       "textInputConfiguration": {{
                           "mediaType": "text/plain"
                       }}
                   }}
               }}
           }}
           '''
           await self.send_event(text_content_start)
           
           system_prompt = "You are a friendly assistant. The user and you will engage in a spoken dialog " \
               "exchanging the transcripts of a natural real-time conversation. Keep your responses short, " \
               "generally two or three sentences for chatty scenarios."
           
   
   
           text_input = f'''
           {{
               "event": {{
                   "textInput": {{
                       "promptName": "{self.prompt_name}",
                       "contentName": "{self.content_name}",
                       "content": "{system_prompt}"
                   }}
               }}
           }}
           '''
           await self.send_event(text_input)
           
           text_content_end = f'''
           {{
               "event": {{
                   "contentEnd": {{
                       "promptName": "{self.prompt_name}",
                       "contentName": "{self.content_name}"
                   }}
               }}
           }}
           '''
           await self.send_event(text_content_end)
           
           # Start processing responses
           self.response = asyncio.create_task(self._process_responses())
   ```

1. 

**Handle audio input**

   These methods handle the audio input lifecycle:
   + `start_audio_input`: Configures and starts the audio input stream
   + `send_audio_chunk`: Encodes and sends audio chunks to the model
   + `end_audio_input`: Properly closes the audio input stream

   ```
    async def start_audio_input(self):
           """Start audio input stream."""
           audio_content_start = f'''
           {{
               "event": {{
                   "contentStart": {{
                       "promptName": "{self.prompt_name}",
                       "contentName": "{self.audio_content_name}",
                       "type": "AUDIO",
                       "interactive": true,
                       "role": "USER",
                       "audioInputConfiguration": {{
                           "mediaType": "audio/lpcm",
                           "sampleRateHertz": 16000,
                           "sampleSizeBits": 16,
                           "channelCount": 1,
                           "audioType": "SPEECH",
                           "encoding": "base64"
                       }}
                   }}
               }}
           }}
           '''
           await self.send_event(audio_content_start)
       
       async def send_audio_chunk(self, audio_bytes):
           """Send an audio chunk to the stream."""
           if not self.is_active:
               return
               
           blob = base64.b64encode(audio_bytes)
           audio_event = f'''
           {{
               "event": {{
                   "audioInput": {{
                       "promptName": "{self.prompt_name}",
                       "contentName": "{self.audio_content_name}",
                       "content": "{blob.decode('utf-8')}"
                   }}
               }}
           }}
           '''
           await self.send_event(audio_event)
       
       async def end_audio_input(self):
           """End audio input stream."""
           audio_content_end = f'''
           {{
               "event": {{
                   "contentEnd": {{
                       "promptName": "{self.prompt_name}",
                       "contentName": "{self.audio_content_name}"
                   }}
               }}
           }}
           '''
           await self.send_event(audio_content_end)
   ```

1. 

**End the session**

   This method properly closes the session by:
   + Sending a `promptEnd` event
   + Sending a `sessionEnd` event
   + Closing the input stream

   ```
       async def end_session(self):
           """End the session."""
           if not self.is_active:
               return
               
           prompt_end = f'''
           {{
               "event": {{
                   "promptEnd": {{
                       "promptName": "{self.prompt_name}"
                   }}
               }}
           }}
           '''
           await self.send_event(prompt_end)
           
           session_end = '''
           {
               "event": {
                   "sessionEnd": {}
               }
           }
           '''
           await self.send_event(session_end)
           # close the stream
           await self.stream.input_stream.close()
   ```

1. 

**Handle responses**

   This method continuously processes responses from the model and does the following:
   + Waits for output from the stream.
   + Parses the JSON response.
   + Handles text output by printing the automatic speech recognition (ASR) transcription and assistant text to the console.
   + Handles audio output by decoding and queuing for playback.

   ```
       async def _process_responses(self):
           """Process responses from the stream."""
           try:
               while self.is_active:
                   output = await self.stream.await_output()
                   result = await output[1].receive()
                   
                   if result.value and result.value.bytes_:
                       response_data = result.value.bytes_.decode('utf-8')
                       json_data = json.loads(response_data)
                       
                       if 'event' in json_data:
                           # Handle content start event
                           if 'contentStart' in json_data['event']:
                               content_start = json_data['event']['contentStart'] 
                               # set role
                               self.role = content_start['role']
                               # Check for speculative content
                               if 'additionalModelFields' in content_start:
                                   additional_fields = json.loads(content_start['additionalModelFields'])
                                   if additional_fields.get('generationStage') == 'SPECULATIVE':
                                       self.display_assistant_text = True
                                   else:
                                       self.display_assistant_text = False
                                   
                           # Handle text output event
                           elif 'textOutput' in json_data['event']:
                               text = json_data['event']['textOutput']['content']    
                              
                               if (self.role == "ASSISTANT" and self.display_assistant_text):
                                   print(f"Assistant: {text}")
                               elif self.role == "USER":
                                   print(f"User: {text}")
                           
                           # Handle audio output
                           elif 'audioOutput' in json_data['event']:
                               audio_content = json_data['event']['audioOutput']['content']
                               audio_bytes = base64.b64decode(audio_content)
                               await self.audio_queue.put(audio_bytes)
           except Exception as e:
               print(f"Error processing responses: {e}")
   ```

1. 

**Playback audio**

   This method performs the following tasks:
   + Initializes a `PyAudio` output stream
   + Continuously retrieves audio data from the queue
   + Plays the audio through the speakers
   + Properly cleans up resources when done

   ```
      async def play_audio(self):
           """Play audio responses."""
           p = pyaudio.PyAudio()
           stream = p.open(
               format=FORMAT,
               channels=CHANNELS,
               rate=OUTPUT_SAMPLE_RATE,
               output=True
           )
           
           try:
               while self.is_active:
                   audio_data = await self.audio_queue.get()
                   stream.write(audio_data)
           except Exception as e:
               print(f"Error playing audio: {e}")
           finally:
               stream.stop_stream()
               stream.close()
               p.terminate()
   ```

1. 

**Capture audio**

   This method performs the following tasks:
   + Initializes a `PyAudio` input stream
   + Starts the audio input session
   + Continuously captures audio chunks from the microphone
   + Sends each chunk to the Amazon Nova Sonic model
   + Properly cleans up resources when done

   ```
       async def capture_audio(self):
           """Capture audio from microphone and send to Nova Sonic."""
           p = pyaudio.PyAudio()
           stream = p.open(
               format=FORMAT,
               channels=CHANNELS,
               rate=INPUT_SAMPLE_RATE,
               input=True,
               frames_per_buffer=CHUNK_SIZE
           )
           
           print("Starting audio capture. Speak into your microphone...")
           print("Press Enter to stop...")
           
           await self.start_audio_input()
           
           try:
               while self.is_active:
                   audio_data = stream.read(CHUNK_SIZE, exception_on_overflow=False)
                   await self.send_audio_chunk(audio_data)
                   await asyncio.sleep(0.01)
           except Exception as e:
               print(f"Error capturing audio: {e}")
           finally:
               stream.stop_stream()
               stream.close()
               p.terminate()
               print("Audio capture stopped.")
               await self.end_audio_input()
   ```

1. 

**Run the main function**

   The main function orchestrates the entire process by performing the following:
   + Creates an Amazon Nova Sonic client
   + Starts the session
   + Creates concurrent tasks for audio playback and capture
   + Waits for the user to press **Enter** to stop
   + Properly ends the session and cleans up tasks

   ```
   async def main():
       # Create Nova Sonic client
       nova_client = SimpleNovaSonic()
       
       # Start session
       await nova_client.start_session()
       
       # Start audio playback task
       playback_task = asyncio.create_task(nova_client.play_audio())
       
       # Start audio capture task
       capture_task = asyncio.create_task(nova_client.capture_audio())
       
       # Wait for user to press Enter to stop
       await asyncio.get_event_loop().run_in_executor(None, input)
       
       # End session
       nova_client.is_active = False
       
       # First cancel the tasks
       tasks = []
       if not playback_task.done():
           tasks.append(playback_task)
       if not capture_task.done():
           tasks.append(capture_task)
       for task in tasks:
           task.cancel()
       if tasks:
           await asyncio.gather(*tasks, return_exceptions=True)
       
       # cancel the response task
       if nova_client.response and not nova_client.response.done():
           nova_client.response.cancel()
       
       await nova_client.end_session()
       print("Session ended")
   
   if __name__ == "__main__":
       # Set AWS credentials if not using environment variables
       # os.environ['AWS_ACCESS_KEY_ID'] = "your-access-key"
       # os.environ['AWS_SECRET_ACCESS_KEY'] = "your-secret-key"
       # os.environ['AWS_DEFAULT_REGION'] = "us-east-1"
   
       asyncio.run(main())
   ```

# Code examples for Amazon Nova Sonic
<a name="speech-code-examples"></a>

**Note**  
This documentation is for Amazon Nova Version 1. For the Amazon Nova 2 Sonic guide, visit [Code examples](https://docs.aws.amazon.com/nova/latest/nova2-userguide/sonic-code-examples.html).

These code examples will help you quickly get started with Amazon Nova Sonic. You can access the complete list of examples on the [Amazon Nova Sonic GitHub samples](https://github.com/aws-samples/amazon-nova-samples/tree/main/speech-to-speech) page.

## Getting started examples
<a name="speech-code-examples-basic"></a>

For simple examples designed to get you started using Amazon Nova Sonic, refer to the following implementations:
+ [Basic Nova Sonic implementation (Python)](https://github.com/aws-samples/amazon-nova-samples/blob/main/speech-to-speech/sample-codes/console-python/nova_sonic_simple.py): A basic implementation that demonstrates how events are structured in the bidirectional streaming API. This version does not support barge-in functionality (interrupting the assistant while it's speaking) and does not implement true bidirectional communication.
+ [Full featured Nova Sonic implementation (Python)](https://github.com/aws-samples/amazon-nova-samples/blob/main/speech-to-speech/sample-codes/console-python/nova_sonic.py): The full-featured implementation with real bidirectional communication and barge-in support. This allows for more natural conversations where users can interrupt the assistant while it's speaking, similar to human conversations.
+ [Nova Sonic with tool use (Python)](https://github.com/aws-samples/amazon-nova-samples/blob/main/speech-to-speech/sample-codes/console-python/nova_sonic_tool_use.py): An advanced implementation that extends the bidirectional communication capabilities with tool use examples. This version demonstrates how Amazon Nova Sonic can interact with external tools and APIs to provide enhanced functionality.
+ [Java WebSocket implementation (Java)](https://github.com/aws-samples/amazon-nova-samples/tree/main/speech-to-speech/sample-codes/websocket-java): This example implements a bidirectional WebSocket-based audio streaming application in Java that integrates with the Amazon Nova Sonic model for real-time speech-to-speech conversation. The application enables natural conversational interactions through a web interface while using the model to process speech and generate responses.
+ [NodeJS WebSocket implementation (NodeJS)](https://github.com/aws-samples/amazon-nova-samples/tree/main/speech-to-speech/sample-codes/websocket-nodejs): This example implements a bidirectional WebSocket-based audio streaming application in NodeJS that integrates with the Amazon Nova Sonic model for real-time speech-to-speech conversation. The application enables natural conversational interactions through a web interface while using the model to process speech and generate responses.

## Advanced use cases
<a name="speech-code-examples-advanced"></a>

For advanced examples demonstrating more complex use cases, refer to the following implementations:
+ [Amazon Bedrock Knowledge Base implementation (NodeJS)](https://github.com/aws-samples/amazon-nova-samples/tree/main/speech-to-speech/repeatable-patterns/bedrock-knowledge-base): This example demonstrates how to build an intelligent conversational application by integrating the Amazon Nova Sonic model with an Amazon Bedrock Knowledge Base, using NodeJS.
+ [Chat History Management (Python)](https://github.com/aws-samples/amazon-nova-samples/tree/main/speech-to-speech/repeatable-patterns/chat-history-logger): This example includes a chat history logging system, written in Python, that captures and preserves all interactions between the user and Nova Sonic.
+ [Hotel Reservation Cancellation (NodeJS)](https://github.com/aws-samples/amazon-nova-samples/tree/main/speech-to-speech/repeatable-patterns/customer-service/hotel-cancellation-websocket): This example demonstrates a practical customer service use case for the Amazon Nova Sonic model, implementing a hotel reservation cancellation system in NodeJS.
+ [LangChain Knowledge Base integration (Python)](https://github.com/aws-samples/amazon-nova-samples/tree/main/speech-to-speech/repeatable-patterns/langchain-knowledge-base): This Python implementation demonstrates how to integrate Amazon Nova Sonic's speech-to-speech capabilities with a LangChain-powered knowledge base for enhanced conversational experiences.
+ [Conversation Resumption (NodeJS)](https://github.com/aws-samples/amazon-nova-samples/tree/main/speech-to-speech/repeatable-patterns/resume-conversation): This NodeJS example demonstrates how to implement conversation resumption with the Amazon Nova Sonic model. Using a hotel reservation cancellation scenario as the context, the application shows how to maintain conversation state across sessions, allowing users to seamlessly continue previously interrupted interactions.

## Hands-on workshop
<a name="speech-code-examples-workshop"></a>

We also offer a hands-on workshop that guides you through building a voice chat application using Nova Sonic with a bidirectional streaming interface. You can [access the workshop here](https://catalog.us-east-1.prod.workshops.aws/workshops/5238419f-1337-4e0f-8cd7-02239486c40d/en-US) and find the [complete code examples of the workshop here](https://github.com/aws-samples/amazon-nova-samples/tree/main/speech-to-speech/workshops).

# Handling input events with the bidirectional API
<a name="input-events"></a>

**Note**  
This documentation is for Amazon Nova Version 1. For the Amazon Nova 2 Sonic guide, visit [Handling input events with the bidirectional API](https://docs.aws.amazon.com/nova/latest/nova2-userguide/sonic-input-events.html).

The bidirectional streaming API uses an event-driven architecture with structured input and output events. Understanding the correct event ordering is crucial for implementing successful conversational applications and maintaining proper conversation state throughout interactions.

The Nova Sonic conversation follows a structured event sequence. You begin by sending a `sessionStart` event that contains the inference configuration parameters, such as temperature and token limits. Next, you send `promptStart` to define the audio output format and tool configurations, assigning a unique `promptName` identifier that must be included in all subsequent events.
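The opening handshake can be sketched as two event payloads. This is a minimal illustration, not a complete client: the inference values and voice are placeholder assumptions, and tool configuration is omitted.

```python
import json
import uuid

def opening_events(max_tokens=1024, top_p=0.9, temperature=0.7, voice_id="matthew"):
    """Build the sessionStart and promptStart events that open a session.

    Returns (prompt_name, [event_json, ...]); the prompt_name must be reused
    in every subsequent event for this conversation.
    """
    prompt_name = str(uuid.uuid4())  # unique identifier shared by all later events
    session_start = {"event": {"sessionStart": {"inferenceConfiguration": {
        "maxTokens": max_tokens, "topP": top_p, "temperature": temperature}}}}
    prompt_start = {"event": {"promptStart": {
        "promptName": prompt_name,
        "textOutputConfiguration": {"mediaType": "text/plain"},
        "audioOutputConfiguration": {
            "mediaType": "audio/lpcm", "sampleRateHertz": 24000,
            "sampleSizeBits": 16, "channelCount": 1,
            "voiceId": voice_id, "encoding": "base64", "audioType": "SPEECH"}}}}
    return prompt_name, [json.dumps(session_start), json.dumps(prompt_start)]
```

In a real client, each returned JSON string would be sent through the bidirectional stream in order before any content events.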

For each interaction type (system prompt, audio, and so on), you follow a three-part pattern: use `contentStart` to define the content type and the role of the content (`SYSTEM`, `USER`, `ASSISTANT`, `TOOL`), then provide the actual content event, and finish with `contentEnd` to close that segment. The `contentStart` event specifies whether you're sending tool results, streaming audio, or a system prompt. The `contentStart` event includes a unique `contentName` identifier.

A conversation history can be included only once, after the system prompt and before audio streaming begins. It follows the same `contentStart`/`textInput`/`contentEnd` pattern. The `USER` and `ASSISTANT` roles must be defined in the `contentStart` event for each historical message. This provides essential context for the current conversation but must be completed before any new user input begins.
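The three-part pattern for history messages can be sketched as follows. This is an illustrative helper, not part of the API; the function name and message format are assumptions, and each returned event would be sent through the stream in order.

```python
import json
import uuid

def history_events(prompt_name, messages):
    """Build contentStart/textInput/contentEnd triples for conversation
    history; each message is a (role, text) pair with role USER or ASSISTANT."""
    events = []
    for role, text in messages:
        content_name = str(uuid.uuid4())  # unique identifier per content block
        events.append({"event": {"contentStart": {
            "promptName": prompt_name,
            "contentName": content_name,
            "type": "TEXT",
            "interactive": False,
            "role": role,
            "textInputConfiguration": {"mediaType": "text/plain"}}}})
        events.append({"event": {"textInput": {
            "promptName": prompt_name,
            "contentName": content_name,
            "content": text}}})
        events.append({"event": {"contentEnd": {
            "promptName": prompt_name,
            "contentName": content_name}}})
    return [json.dumps(e) for e in events]
```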

Audio streaming operates with continuous microphone sampling. After sending an initial `contentStart`, audio frames (approximately 32 ms each) are captured directly from the microphone and immediately sent as `audioInput` events using the same `contentName`. These audio samples should be streamed in real time as they're captured, maintaining the natural microphone sampling cadence throughout the conversation. All audio frames share a single content container until the conversation ends and the container is explicitly closed.

After the conversation ends or needs to be terminated, it's essential to properly close all open streams and end the session in the correct sequence. To properly end a session and avoid resource leaks, you must follow a specific closing sequence:

1. Close any open audio streams with the `contentEnd` event.

1. Send a `promptEnd` event that references the original `promptName`.

1. Send the `sessionEnd` event.

Skipping any of these closing events can result in incomplete conversations or orphaned resources.
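The closing sequence can be sketched as three ordered event payloads. This is a minimal illustration; `prompt_name` and `audio_content_name` are assumed to be the identifiers used earlier in the session.

```python
import json

def closing_sequence(prompt_name, audio_content_name):
    """Return the closing events in the required order: contentEnd for the
    open audio stream, then promptEnd, then sessionEnd."""
    return [json.dumps(e) for e in [
        {"event": {"contentEnd": {"promptName": prompt_name,
                                  "contentName": audio_content_name}}},
        {"event": {"promptEnd": {"promptName": prompt_name}}},
        {"event": {"sessionEnd": {}}},
    ]]
```

Sending these in any other order, or omitting one, risks the incomplete conversations and orphaned resources described above.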

These identifiers create a hierarchical structure: the `promptName` ties all conversation events together, while each `contentName` marks the boundaries of specific content blocks. This hierarchy ensures that the model maintains proper context throughout the interaction.

![\[Diagram that explains the Amazon Nova Sonic input event flow.\]](http://docs.aws.amazon.com/nova/latest/userguide/images/input-events.png)


## Input event flow
<a name="input-event-flow"></a>

The structure of the input event flow is provided in this section.

1. `RequestStartEvent`

   ```
   {
       "event": {
           "sessionStart": {
               "inferenceConfiguration": {
                   "maxTokens": "int",
                   "topP": "float",
                   "temperature": "float"
               }
           }
       }
   }
   ```

1. `PromptStartEvent`

   ```
   {
       "event": {
           "promptStart": {
               "promptName": "string", // unique identifier same across all events i.e. UUID
               "textOutputConfiguration": {
                   "mediaType": "text/plain"
               },
               "audioOutputConfiguration": {
                   "mediaType": "audio/lpcm",
                   "sampleRateHertz": 8000 | 16000 | 24000,
                   "sampleSizeBits": 16,
                   "channelCount": 1,
                   "voiceId": "matthew" | "tiffany" | "amy" |
                           "lupe" | "carlos" | "ambre" | "florian" |
                           "greta" | "lennart" | "beatrice" | "lorenzo",
                   "encoding": "base64",
                   "audioType": "SPEECH",
               },
               "toolUseOutputConfiguration": {
                   "mediaType": "application/json"
               },
               "toolConfiguration": {
                   "tools": [{
                       "toolSpec": {
                           "name": "string",
                           "description": "string",
                           "inputSchema": {
                               "json": "{}"
                           }
                       }
                   }]
               }
           }
       }
   }
   ```

1. `InputContentStartEvent`
   + `Text`

     ```
     {
         "event": {
             "contentStart": {
                 "promptName": "string", // same unique identifier from promptStart event
                 "contentName": "string", // unique identifier for the content block
                 "type": "TEXT",
                 "interactive": false,
                 "role": "SYSTEM" | "USER" | "ASSISTANT",
                 "textInputConfiguration": {
                     "mediaType": "text/plain"
                 }
             }
         }
     }
     ```
   + `Audio`

     ```
     {
         "event": {
             "contentStart": {
                 "promptName": "string", // same unique identifier from promptStart event
                 "contentName": "string", // unique identifier for the content block
                 "type": "AUDIO",
                 "interactive": true,
                 "role": "USER",
                 "audioInputConfiguration": {
                     "mediaType": "audio/lpcm",
                     "sampleRateHertz": 8000 | 16000 | 24000,
                     "sampleSizeBits": 16,
                     "channelCount": 1,
                     "audioType": "SPEECH",
                     "encoding": "base64"
                 }
             }
         }
     }
     ```
   + `Tool`

     ```
     {
         "event": {
             "contentStart": {
                 "promptName": "string", // same unique identifier from promptStart event
                 "contentName": "string", // unique identifier for the content block
                 "interactive": false,
                 "type": "TOOL",
                 "role": "TOOL",
                 "toolResultInputConfiguration": {
                     "toolUseId": "string", // existing tool use id
                     "type": "TEXT",
                     "textInputConfiguration": {
                         "mediaType": "text/plain"
                     }
                 }
             }
         }
     }
     ```

1. `TextInputContent`

   ```
   {
       "event": {
           "textInput": {
               "promptName": "string", // same unique identifier from promptStart event
               "contentName": "string", // unique identifier for the content block
               "content": "string"
           }
       }
   }
   ```

1. `AudioInputContent`

   ```
   {
       "event": {
           "audioInput": {
               "promptName": "string", // same unique identifier from promptStart event
               "contentName": "string", // same unique identifier from its contentStart
               "content": "base64EncodedAudioData"
           }
       }
   }
   ```

1. `ToolResultContentEvent`

   ```
   "event": {
       "toolResult": {
           "promptName": "string", // same unique identifier from promptStart event
           "contentName": "string", // same unique identifier from its contentStart
           "content": "{\"key\": \"value\"}" // stringified JSON object as a tool result 
       }
   }
   ```

1. `InputContentEndEvent`

   ```
   {
       "event": {
           "contentEnd": {
               "promptName": "string", // same unique identifier from promptStart event
               "contentName": "string" // same unique identifier from its contentStart
           }
       }
   }
   ```

1. `PromptEndEvent`

   ```
   {
       "event": {
           "promptEnd": {
               "promptName": "string" // same unique identifier from promptStart event
           }
       }
   }
   ```

1. `RequestEndEvent`

   ```
   {
       "event": {
           "sessionEnd": {}
       }
   }
   ```

# Handling output events with the bidirectional API
<a name="output-events"></a>

**Note**  
This documentation is for Amazon Nova Version 1. For the Amazon Nova 2 Sonic guide, visit [Handling output events with the bidirectional API](https://docs.aws.amazon.com/nova/latest/nova2-userguide/sonic-output-events.html).

When the Amazon Nova Sonic model responds, it follows a structured event sequence. The flow begins with a `completionStart` event that contains unique identifiers like `sessionId`, `promptName`, and `completionId`. These identifiers are consistent throughout the response cycle and unite all subsequent response events.

Each response type follows a consistent three-part pattern: `contentStart` defines the content type and format, the content event carries the actual payload, and `contentEnd` closes that segment. The response typically includes multiple content blocks in sequence: automatic speech recognition (ASR) transcription (what the user said), optional tool use (when external information is needed), text response (what the model plans to say), and audio response (the spoken output).

The ASR transcription appears first, delivering the model's understanding of the user's speech with `role: "USER"` and `"additionalModelFields": "{\"generationStage\":\"FINAL\"}"` in the `contentStart`. When the model needs external data, it sends tool-related events with specific tool names and parameters. The text response provides a preview of the planned speech with `role: "ASSISTANT"` and `"additionalModelFields": "{\"generationStage\":\"SPECULATIVE\"}"`. The audio response then delivers base64-encoded speech chunks sharing the same `contentId` throughout the stream.

During audio generation, Amazon Nova Sonic supports natural conversation flow through its barge-in capability. When a user interrupts Amazon Nova Sonic while it's speaking, Nova Sonic immediately stops generating speech, switches to listening mode, and sends a content notification indicating the interruption has occurred. Because Nova Sonic operates faster than real-time, some audio may have already been delivered but not yet played. The interruption notification enables the client application to clear its audio queue and stop playback immediately, creating a responsive conversational experience.
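One way to honor the interruption notification, sketched under the assumption that audio chunks are buffered in an `asyncio.Queue` as in the earlier playback example, is to drain the queue when a `contentEnd` arrives with a `stopReason` of `INTERRUPTED`:

```python
import asyncio
import json

def handle_content_end(event_json, audio_queue):
    """Clear any queued-but-unplayed audio when the model reports a barge-in.

    Returns True if an interruption was handled, False otherwise.
    """
    content_end = json.loads(event_json).get("event", {}).get("contentEnd", {})
    if content_end.get("stopReason") == "INTERRUPTED":
        # Drop everything still waiting for playback so output stops promptly.
        while not audio_queue.empty():
            try:
                audio_queue.get_nowait()
            except asyncio.QueueEmpty:
                break
        return True
    return False
```

A complete client would also stop or flush the active `PyAudio` stream, since audio already handed to the device cannot be recalled by clearing the queue.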

After audio generation completes (or is interrupted via barge-in), Amazon Nova Sonic provides an additional text response that contains a sentence-level transcription of what was actually spoken. This text response includes a `contentStart` event with `role: "ASSISTANT"` and `"additionalModelFields": "{\"generationStage\":\"FINAL\"}"`.

Throughout the response handling, `usageEvent` events are sent to track token consumption. These events contain detailed metrics on input tokens and output tokens (both speech and text), and their cumulative totals. Each `usageEvent` maintains the same `sessionId`, `promptName`, and `completionId` as other events in the conversation flow. The details section provides both incremental changes (delta) and running totals of token usage, enabling precise monitoring of the usage during the conversation.
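For monitoring, a `usageEvent` can be reduced to a few counters. This sketch assumes the cumulative fields shown in the `UsageEvent` structure; a production client might also track the per-event `delta` values.

```python
import json

def summarize_usage(event_json):
    """Extract cumulative token counts from a usageEvent payload."""
    usage = json.loads(event_json)["event"]["usageEvent"]
    return {
        "input": usage["totalInputTokens"],
        "output": usage["totalOutputTokens"],
        "total": usage["totalTokens"],
    }
```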

The model sends a `completionEnd` event with the original identifiers and a `stopReason` that indicates how the conversation ended. This event hierarchy ensures your application can track which parts of the response belong together and process them accordingly, maintaining conversation context throughout multiple turns.

The output event flow begins when the model enters the response generation phase. It starts with the automatic speech recognition transcription, optionally selects a tool to use, streams the text response, generates audio, finalizes the transcription, and ends the session.

![\[Diagram that explains the Amazon Nova Sonic output event flow.\]](http://docs.aws.amazon.com/nova/latest/userguide/images/output-events.png)


## Output event flow
<a name="output-event-flow"></a>

The structure of the output event flow is described in this section.

1. `UsageEvent`

   ```
   "event": {
       "usageEvent": {
           "completionId": "string", // unique identifier for completion
           "details": {
               "delta": { // incremental changes since last event
                   "input": {
                       "speechTokens": number, // input speech tokens
                       "textTokens": number // input text tokens
                   },
                   "output": {
                       "speechTokens": number, // speech tokens generated
                       "textTokens": number // text tokens generated
                   }
               },
               "total": { // cumulative counts
                   "input": {
                       "speechTokens": number, // total speech tokens processed
                       "textTokens": number // total text tokens processed
                   },
                   "output": {
                       "speechTokens": number, // total speech tokens generated
                       "textTokens": number // total text tokens generated
                   }
               }
           },
           "promptName": "string", // same unique identifier from promptStart event
           "sessionId": "string", // unique identifier
           "totalInputTokens": number, // cumulative input tokens
           "totalOutputTokens": number, // cumulative output tokens
           "totalTokens": number // total tokens in the session
       }
   }
   ```

1. `CompletionStartEvent`

   ```
   "event": {
           "completionStart": {
               "sessionId": "string", // unique identifier
               "promptName": "string", // same unique identifier from promptStart event
               "completionId": "string", // unique identifier
           }
       }
   ```

1. `TextOutputContent`
   + `ContentStart`

     ```
     "event": {
             "contentStart": {
                 "additionalModelFields": "{\"generationStage\":\"FINAL\"}" | "{\"generationStage\":\"SPECULATIVE\"}",
                 "sessionId": "string", // unique identifier
                 "promptName": "string", // same unique identifier from promptStart event
                 "completionId": "string", // unique identifier
                 "contentId": "string", // unique identifier for the content block
                 "type": "TEXT",
                 "role": "USER" | "ASSISTANT",
                 "textOutputConfiguration": {
                     "mediaType": "text/plain"
                 }
             }
         }
     ```
   + `TextOutput`

     ```
     "event": {
             "textOutput": {
                 "sessionId": "string", // unique identifier
                 "promptName": "string", // same unique identifier from promptStart event
                 "completionId": "string", // unique identifier
                 "contentId": "string", // same unique identifier from its contentStart
                 "content": "string" // User transcribe or Text Response
             }
         }
     ```
   + `ContentEnd`

     ```
     "event": {
         "contentEnd": {
                 "sessionId": "string", // unique identifier
                 "promptName": "string", // same unique identifier from promptStart event
                 "completionId": "string", // unique identifier
                 "contentId": "string", // same unique identifier from its contentStart
                 "stopReason": "PARTIAL_TURN" | "END_TURN" | "INTERRUPTED",
                 "type": "TEXT"
         }
       }
     ```

1. `ToolUse`

   1. `ContentStart`

      ```
      "event": {
          "contentStart": {
            "sessionId": "string", // unique identifier
            "promptName": "string", // same unique identifier from promptStart event
            "completionId": "string", // unique identifier
            "contentId": "string", // unique identifier for the content block
            "type": "TOOL",
            "role": "TOOL",
            "toolUseOutputConfiguration": {
              "mediaType": "application/json"
            }
          }
        }
      ```

   1. `ToolUse`

      ```
      "event": {
          "toolUse": {
            "sessionId": "string", // unique identifier
            "promptName": "string", // same unique identifier from promptStart event
            "completionId": "string", // unique identifier
            "contentId": "string", // same unique identifier from its contentStart
            "content": "json",
            "toolName": "string",
            "toolUseId": "string"
          }
        }
      ```

   1. `ContentEnd`

      ```
      "event": {
          "contentEnd": {
            "sessionId": "string", // unique identifier
            "promptName": "string", // same unique identifier from promptStart event
            "completionId": "string", // unique identifier
            "contentId": "string", // same unique identifier from its contentStart
            "stopReason": "TOOL_USE",
            "type": "TOOL"
          }
        }
      ```

1. `AudioOutputContent`

   1. `ContentStart`

      ```
      "event": {
          "contentStart": {
            "sessionId": "string", // unique identifier
            "promptName": "string", // same unique identifier from promptStart event
            "completionId": "string", // unique identifier
            "contentId": "string", // unique identifier for the content block
            "type": "AUDIO",
            "role": "ASSISTANT",
            "audioOutputConfiguration": {
                  "mediaType": "audio/lpcm",
                  "sampleRateHertz": 8000 | 16000 | 24000,
                  "sampleSizeBits": 16,
                  "encoding": "base64",
                  "channelCount": 1
                  }
            }
        }
      ```

   1. `AudioOutput`

      ```
      "event": {
              "audioOutput": {
                  "sessionId": "string", // unique identifier
                  "promptName": "string", // same unique identifier from promptStart event
                  "completionId": "string", // unique identifier
                  "contentId": "string", // same unique identifier from its contentStart
                  "content": "base64EncodedAudioData", // Audio
              }
          }
      ```

   1. `ContentEnd`

      ```
      "event": {
          "contentEnd": {
            "sessionId": "string", // unique identifier
            "promptName": "string", // same unique identifier from promptStart event
            "completionId": "string", // unique identifier
            "contentId": "string", // same unique identifier from its contentStart
            "stopReason": "PARTIAL_TURN" | "END_TURN",
            "type": "AUDIO"
          }
        }
      ```

1. `CompletionEndEvent`

   ```
   "event": {
       "completionEnd": {
         "sessionId": "string", // unique identifier
         "promptName": "string", // same unique identifier from promptStart event
         "completionId": "string", // unique identifier
         "stopReason": "END_TURN" 
       }
     }
   ```
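As a minimal sketch of consuming these output events, the following dispatcher routes each parsed event object to application callbacks. The `handlers` callback names (`onText`, `onToolUse`, `onAudio`, `onDone`) are illustrative conventions, not part of the API:

```javascript
// Minimal dispatcher for Amazon Nova Sonic output events.
// Assumes each event payload has already been parsed into a plain object.
function dispatchOutputEvent(event, handlers) {
  if (event.textOutput) {
    // User transcript or assistant text response
    handlers.onText?.(event.textOutput.content);
  } else if (event.toolUse) {
    // The model is requesting a tool invocation
    handlers.onToolUse?.(
      event.toolUse.toolName,
      JSON.parse(event.toolUse.content),
      event.toolUse.toolUseId
    );
  } else if (event.audioOutput) {
    // Base64-encoded audio chunk to decode and play back
    handlers.onAudio?.(Buffer.from(event.audioOutput.content, "base64"));
  } else if (event.completionEnd) {
    handlers.onDone?.(event.completionEnd.stopReason);
  }
}
```

In practice you would also track `contentStart` and `contentEnd` to group chunks by `contentId`; the branch structure stays the same.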

# Voices available for Amazon Nova Sonic
<a name="available-voices"></a>

**Note**  
This documentation is for Amazon Nova Version 1. For the Amazon Nova 2 Sonic guide, visit [Language support and multilingual capabilities](https://docs.aws.amazon.com/nova/latest/nova2-userguide/sonic-language-support.html).

The available voices and locales are as follows:


| Language | Feminine-sounding voice ID | Masculine-sounding voice ID | 
| --- |--- |--- |
| English (US) | tiffany | matthew | 
| English (GB) | amy |  | 
| French | ambre | florian | 
| Italian | beatrice | lorenzo | 
| German | greta | lennart | 
| Spanish | lupe | carlos | 
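A small helper for selecting a `voiceId` from this table might look like the following sketch; the locale keys and the `style` parameter are illustrative conventions, not part of the API:

```javascript
// Map each locale to the voice IDs from the table above.
// Locale keys here are an illustrative convention, not an API value.
const voicesByLocale = {
  "en-US": { feminine: "tiffany", masculine: "matthew" },
  "en-GB": { feminine: "amy" },
  "fr-FR": { feminine: "ambre", masculine: "florian" },
  "it-IT": { feminine: "beatrice", masculine: "lorenzo" },
  "de-DE": { feminine: "greta", masculine: "lennart" },
  "es-ES": { feminine: "lupe", masculine: "carlos" }
};

function pickVoice(locale, style = "feminine") {
  const voices = voicesByLocale[locale];
  // Fall back to the feminine-sounding voice when the requested style
  // is unavailable (for example, English (GB) has no masculine voice).
  return voices?.[style] ?? voices?.feminine;
}
```

The selected ID is what you pass as `voiceId` in the `audioOutputConfiguration` of your `promptStart` event.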

# Handling errors with Amazon Nova Sonic
<a name="speech-errors"></a>

When errors occur, we recommend trying the following steps:

1. Send the `promptEnd` event.

1. Send the `sessionEnd` event.

1. If the audio streaming has started, also send the `contentEnd` event.

Completing these steps also frees GPU resources and memory.
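The teardown sequence can be sketched as follows, assuming the returned events are then serialized and written to the open stream in order. The field names follow the input-event schemas covered earlier in this guide, and `promptName` and `contentName` echo the identifiers used when the session was started:

```javascript
// Build the teardown events to send after an error, in order.
function buildTeardownEvents(promptName, contentName, audioStarted) {
  const events = [];
  if (audioStarted) {
    // Close the open audio content block first.
    events.push({ event: { contentEnd: { promptName, contentName } } });
  }
  events.push({ event: { promptEnd: { promptName } } });
  events.push({ event: { sessionEnd: {} } });
  return events;
}
```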

When handling long conversations or recovering from errors, you can implement conversation resumption using the following approach:

1. Set up chat history storage to preserve conversation context from previous interactions. You can find a chat history example in our [Amazon Nova samples GitHub repo](https://github.com/aws-samples/amazon-nova-samples/tree/main/speech-to-speech/repeatable-patterns/chat-history-logger).

1. Handle conversation timeouts proactively:
   + When approaching the maximum connection duration, end the current request and start a new one.
   + Include the saved chat history in the new request to maintain conversation continuity.

1. Format resumed conversations properly:
   + Place the chat history after the system prompt but before any new user input.
   + Include previous messages with the proper user and assistant roles.
   + Ensure that the first message in the chat history is from the user.

   You can find a chat resumption example in our [Amazon Nova samples GitHub repo](https://github.com/aws-samples/amazon-nova-samples/tree/main/speech-to-speech/repeatable-patterns/resume-conversation).
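The ordering rules above can be sketched as a small helper, assuming each saved history item is a `{ role, text }` pair (an illustrative shape, not an API type):

```javascript
// Order content for a resumed conversation: system prompt first,
// then saved history (which must start with a USER turn),
// then the new live user input.
function orderResumedContent(systemPrompt, history, newUserText) {
  const trimmed = [...history];
  // Ensure that the first message in the chat history is from the user.
  while (trimmed.length && trimmed[0].role !== "USER") {
    trimmed.shift();
  }
  return [
    { role: "SYSTEM", text: systemPrompt },
    ...trimmed,
    { role: "USER", text: newUserText }
  ];
}
```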

**When to use conversation resumption**  
The conversation resumption approach is particularly helpful for error recovery in the following scenarios:
+ After you receive a `ModelTimeoutException` with the message "Model has timed out in processing the request".
+ When you need to restore context after an unexpected disconnection.

# Tool Use, RAG, and Agentic Flows with Amazon Nova Sonic
<a name="speech-tools"></a>

**Note**  
This documentation is for Amazon Nova Version 1. For the Amazon Nova 2 Sonic guide, visit [Tool configuration](https://docs.aws.amazon.com/nova/latest/nova2-userguide/sonic-tool-configuration.html).

The Amazon Nova Sonic model extends its capabilities beyond pre-trained knowledge by supporting tool use. Tool use, sometimes called function calling, enables integration with external functions, APIs, and data sources. This section explains how to implement tool use, Retrieval-Augmented Generation (RAG), and agentic workflows with Amazon Nova Sonic.

![\[Diagram that explains how Amazon Nova Sonic calls a tool and uses it to generate results.\]](http://docs.aws.amazon.com/nova/latest/userguide/images/novaSonicDiagram.png)


You can control what tool the model uses by specifying the `toolChoice` parameter. For more information, see [Choosing a tool](https://docs.aws.amazon.com/nova/latest/userguide/tool-choice.html).

**Topics**
+ [

# Using tools
](speech-tools-use.md)
+ [

# Controlling how tools are chosen
](speech-tools-choice.md)
+ [

# Tool choice best practices
](speech-tools-bp.md)
+ [

# Implementing RAG
](speech-rag.md)
+ [

# Building agentic flows
](speech-agentic.md)

# Using tools
<a name="speech-tools-use"></a>

**Note**  
This documentation is for Amazon Nova Version 1. For the Amazon Nova 2 Sonic guide, visit [Tool configuration](https://docs.aws.amazon.com/nova/latest/nova2-userguide/sonic-tool-configuration.html).

To use a tool, you must define it as part of the `promptStart` event in your session configuration, as demonstrated in the following code:

```
{
  "event": {
    "promptStart": {
      "promptName": "string",
      "textOutputConfiguration": {
        "mediaType": "text/plain"
      },
      "audioOutputConfiguration": {
        "mediaType": "audio/lpcm",
        "sampleRateHertz": 8000 | 16000 | 24000,
        "sampleSizeBits": 16,
        "channelCount": 1,
        "voiceId": "matthew" | "tiffany" | "amy",
        "encoding": "base64",
        "audioType": "SPEECH"
      },
      "toolUseOutputConfiguration": {
        "mediaType": "application/json"
      },
      "toolConfiguration": {
        "tools": [
          {
            "toolSpec": {
              "name": "string",
              "description": "string",
              "inputSchema": {
                "json": "{}"
              }
            }
          }
        ]
      }
    }
  }
}
```

## Tool definition components
<a name="speech-tools-definition"></a>

Each tool specification requires the following elements:
+ **Name** - A unique identifier for the tool.
+ **Description** - An explanation of what the tool does and when it should be used.
+ **Input schema** - The JSON schema that defines the required parameters.

## Basic tool example
<a name="speech-tools-example"></a>

Here's an example of a simple tool that retrieves information about the current date. For more information on how to define a tool, see [Defining a tool](https://docs.aws.amazon.com/nova/latest/userguide/tool-use-definition.html).

```
// A simple tool with no required parameters
const dateTool = {
  toolSpec: {
    name: "getDateTool",
    description: "Get information about the current date",
    inputSchema: {
      json: JSON.stringify({
        type: "object",
        properties: {},
        required: []
      })
    }
  }
};
```

And here is what the `promptStart` event would look like:

```
{
  event: {
    promptStart: {
      promptName: "string",
      textOutputConfiguration: {
        mediaType: "text/plain"
      },
      audioOutputConfiguration: {
        mediaType: "audio/lpcm",
        sampleRateHertz: 24000,
        sampleSizeBits: 16,
        channelCount: 1,
        voiceId: "tiffany",
        encoding: "base64",
        audioType: "SPEECH"
      },
      toolUseOutputConfiguration: {
        mediaType: "application/json"
      },
      toolConfiguration: {
        tools: [
          {
            toolSpec: {
              name: "getDateTool",
              description: "get information about the current date",
              inputSchema: {
                json: JSON.stringify({
                  type: "object",
                  properties: {},
                  required: []
                })
              }
            }
          }
        ]
      }
    }
  }
}
```
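When the model later emits a `toolUse` output event for `getDateTool`, your application computes the result locally before returning it to the model as a tool result. A hypothetical handler might look like this:

```javascript
// Handler for the getDateTool defined above: computes the payload that
// the application will send back to the model as a tool result.
function handleGetDateTool() {
  const now = new Date();
  return {
    date: now.toISOString().slice(0, 10), // YYYY-MM-DD
    dayOfWeek: now.toLocaleDateString("en-US", { weekday: "long" })
  };
}
```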

# Controlling how tools are chosen
<a name="speech-tools-choice"></a>

**Note**  
This documentation is for Amazon Nova Version 1. For the Amazon Nova 2 Sonic guide, visit [Tool configuration](https://docs.aws.amazon.com/nova/latest/nova2-userguide/sonic-tool-configuration.html).

Amazon Nova Sonic supports three options for the `toolChoice` parameter, which controls which tool the model uses and when.
+ **Tool** - The `tool` option ensures that the specific named tool is called exactly once at the beginning of the response generation. For example, if you specify a knowledge base tool, the model will query this knowledge base before responding, regardless of whether it thinks the tool is needed.
+ **Any** - The `any` option ensures at least one of the available tools is called at the beginning of the response generation, while allowing the model to select the most appropriate one. This is useful when you have multiple knowledge bases or tools and want to ensure the model leverages at least one of them without specifying which one.
+ **Auto** - With `auto`, the model has complete flexibility to determine whether any tools are needed at the beginning of the response generation and can call multiple tools if required. This is also the default behavior.
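If the `toolChoice` parameter follows the same shape that Amazon Nova uses for text models (an assumption to confirm against the linked tool-use documentation), the three options would be expressed as:

```javascript
// Illustrative toolChoice values; the exact field shapes are assumed
// from the Amazon Nova tool-use documentation.
const forceNamedTool = { toolChoice: { tool: { name: "knowledgeBase" } } }; // always call this tool first
const forceAnyTool = { toolChoice: { any: {} } }; // call at least one tool
const letModelDecide = { toolChoice: { auto: {} } }; // default behavior
```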

For more information, see [Tool use with Amazon Nova](https://docs.aws.amazon.com/nova/latest/userguide/tool-choice.html).

**Multi-tool sequence behavior**  
Amazon Nova Sonic handles tool execution intelligently within each response cycle. When you use the `tool` option, the model will first execute the specified tool, then evaluate whether additional tools are needed before generating its final response. Similarly, with the `any` option, the model first selects and calls one tool from the available options, then decides if additional tool calls would be needed before proceeding to generate its answer.

In all cases, the model manages the entire tool execution sequence within a single response generation cycle, determining when sufficient information has been gathered to generate an appropriate response.

Consider the following example scenarios:

------
#### [ Knowledge base example ]
+ With `toolChoice: "knowledge_tool"`, the model will always query the specified knowledge base first, then possibly use other tools before responding if needed.
+ With `toolChoice: "any"` and multiple knowledge bases available, the model will select the most relevant knowledge base, query it, and then possibly consult additional sources if needed.
+ With `toolChoice: "auto"`, the model may skip knowledge lookups entirely for questions it can answer directly, or query multiple knowledge bases for complex questions.

------
#### [ Multi-functional assistant example ]
+ A virtual assistant with weather, calendar, and knowledge tools could use `toolChoice: "auto"` to flexibly respond to diverse queries, calling only the necessary tools.
+ Using `toolChoice: "any"` would ensure at least one tool is always used, even for queries the model could potentially answer directly.

------


# Tool choice best practices
<a name="speech-tools-bp"></a>

**Note**  
This documentation is for Amazon Nova Version 1. For the Amazon Nova 2 Sonic guide, visit [Tool configuration](https://docs.aws.amazon.com/nova/latest/nova2-userguide/sonic-tool-configuration.html).

When implementing tools with Amazon Nova Sonic, we recommend following these best practices to ensure optimal performance:
+ **Keep schema structure simple**: Limit top-level keys to 3 or fewer when possible.
+ **Create distinct parameter names**: Use clear, semantically different names between similar parameters to avoid confusion (that is, don't use both "product_id" and "cart_item_id" if they serve different purposes).
+ **Provide detailed tool descriptions**: Clearly describe each tool's purpose and when it should be used to help the model select the appropriate tool.
+ **Define input schemas precisely**: Specify parameter types and include descriptions for each parameter. Clearly indicate which parameters are required versus optional.
+ **Monitor context length**: Tool performance may degrade as the context grows long (approximately 50K tokens). Consider breaking complex tasks into smaller steps when working with long contexts.
+ **Implement error handling**: Prepare for cases when tool execution fails by including appropriate fallback behaviors.
+ **Test thoroughly**: Verify your tools work across a variety of inputs and edge cases before deployment.
+ **Use greedy decoding parameters**: Set the temperature to 0 for tool use.

We recommend that you avoid the following common issues:
+ When you encounter JSON schema adherence failures, you might need to simplify your schema structure or provide clearer instructions.
+ Be mindful that the model might omit optional parameters that would improve results (such as 'limit' parameters in queries).

By following these guidelines, you can leverage the full capabilities of the Amazon Nova Sonic model's tool use features to create powerful conversational AI applications that can access external data sources and perform complex actions.

# Implementing RAG
<a name="speech-rag"></a>

**Note**  
This documentation is for Amazon Nova Version 1. For the Amazon Nova 2 Sonic guide, visit [Tool configuration](https://docs.aws.amazon.com/nova/latest/nova2-userguide/sonic-tool-configuration.html).

Retrieval-Augmented Generation (RAG) enhances responses by retrieving and incorporating information from your knowledge bases. With Amazon Nova Sonic, RAG is implemented through tool use. 

## Knowledge base implementation outline
<a name="speech-rag-implement"></a>

Implementing RAG requires the following elements:
+ **Configure the tool** - Define a knowledge base search tool in your `promptStart` event.
+ **Receive the tool use request** - When the user asks a question, the model will call the knowledge base tool.
+ **Query the vector database** - Execute the search query against your vector database.
+ **Return results** - Send the search results back to the model.
+ **Generate a response** - The model incorporates the retrieved information in its spoken response.
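The middle steps of that outline can be sketched as follows, where `searchVectorDb` is a placeholder for your own retrieval backend, not an AWS API:

```javascript
// Sketch of the RAG tool-handling step: parse the model's query,
// run retrieval, and return the results as a JSON string for the
// tool result content block.
async function handleKnowledgeBaseToolUse(toolUse, searchVectorDb) {
  const { query } = JSON.parse(toolUse.content);
  const passages = await searchVectorDb(query);
  return JSON.stringify({ results: passages });
}
```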

## Knowledge base configuration
<a name="speech-rag-tool"></a>

Here is an example configuration of a basic knowledge base tool:

```
{
     toolSpec: {
         name: "knowledgeBase",
         description: "Search the company knowledge base for information",
         inputSchema: {
             json: JSON.stringify({
                 type: "object",
                 properties: {
                     query: {
                         type: "string",
                         description: "The search query to find relevant information"
                     }
                 },
                 required: ["query"]
             })
         }
     }
 };
```

# Building agentic flows
<a name="speech-agentic"></a>

**Note**  
This documentation is for Amazon Nova Version 1. For the Amazon Nova 2 Sonic guide, visit [Tool configuration](https://docs.aws.amazon.com/nova/latest/nova2-userguide/sonic-tool-configuration.html).

For more complex use cases, you can implement agentic flows by configuring multiple tools that work together to accomplish tasks. Amazon Nova Sonic can orchestrate these tools based on user requests.

## Agentic flow examples
<a name="speech-agentic-example"></a>

**Hotel Reservation Cancellation Agent Example**  
Here is an example configuration of a hotel reservation cancellation system:

```
toolConfiguration: {
    tools: [
      {
        toolSpec: {
          name: "getReservation",
          description: "Retrieves hotel reservation information based on the guest's name and check-in date",
          inputSchema: {
            json: JSON.stringify({
              type: "object",
              properties: {
                name: {
                  type: "string",
                  description: "Full name of the guest who made the reservation"
                },
                checkInDate: {
                  type: "string",
                  description: "The check-in date for the reservation in YYYY-MM-DD format"
                }
              },
              required: ["name", "checkInDate"]
            })
          }
        }
      },
      {
        toolSpec: {
          name: "cancelReservation",
          description: "Cancels a hotel reservation after confirming the cancellation policy with the guest",
          inputSchema: {
            json: JSON.stringify({
              type: "object",
              properties: {
                reservationId: {
                  type: "string",
                  description: "The unique identifier for the reservation to be cancelled"
                },
                confirmCancellation: {
                  type: "boolean",
                  description: "Confirmation from the guest that they understand the cancellation policy and want to proceed",
                  default: false
                }
              },
              required: ["reservationId", "confirmCancellation"]
            })
          }
        }
      }
    ]
  }
```

**Hotel Search Agent Example**  
And here is an example configuration of a hotel search agent:

```
toolSpec: {
    name: "searchHotels",
    description: "Search for hotels by location, star rating, amenities and price range.",
    inputSchema: {
        json: JSON.stringify({
            type: "object",
            properties: {
                location: {
                    type: "string",
                    description: "City or area to search for hotels"
                },
                rating: {
                    type: "number",
                    minimum: 1,
                    maximum: 5,
                    description: "Minimum star rating (1-5)"
                },
                amenities: {
                    type: "array",
                    items: {
                        type: "string"
                    },
                    description: "List of desired amenities"
                },
                price_range: {
                    type: "object",
                    properties: {
                        min: {
                            type: "number",
                            minimum: 0
                        },
                        max: {
                            type: "number",
                            minimum: 0
                        }
                    },
                    description: "Price range per night"
                }
            },
            required: []
        })
    }
}
```
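To wire either agent's tools to local implementations, a dispatcher keyed by tool name is a common pattern. This sketch assumes `handlers` maps each `toolSpec` name to a hypothetical application function:

```javascript
// Route a toolUse request from the model to a local implementation.
// The handler map keys must match the toolSpec names exactly.
async function routeToolUse(toolUse, handlers) {
  const handler = handlers[toolUse.toolName];
  if (!handler) {
    // Fall back gracefully when the model names an unknown tool.
    return JSON.stringify({ error: `Unknown tool: ${toolUse.toolName}` });
  }
  const input = JSON.parse(toolUse.content);
  return JSON.stringify(await handler(input));
}
```

The stringified return value is what you send back to the model as the tool result content, after which Amazon Nova Sonic incorporates it into its spoken response.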