
Neural voices

Amazon Polly has a Neural text-to-speech (NTTS) engine that can produce higher-quality voices than its standard voices. Standard TTS voices use concatenative synthesis: the standard engine concatenates phonemes of recorded speech, producing very natural-sounding synthesized speech. However, the inevitable variations in speech and the techniques used to segment the waveforms limit the quality of the result. The Amazon Polly NTTS engine doesn't use standard concatenative synthesis to produce speech. It has two parts:

A neural network that converts a sequence of phonemes (the most basic units of language) into a sequence of spectrograms. (Spectrograms are snapshots of the energy levels in different frequency bands.)
A vocoder that converts spectrograms into a nearly continuous audio signal.

The first component of the neural TTS system is a sequence-to-sequence model. This model doesn't derive each output solely from the corresponding input; it also considers how the elements of the input sequence work together. The model chooses the spectrograms that it outputs so that their frequency bands emphasize acoustic features that the human brain uses when processing speech.

The output of this model then passes to a neural vocoder, which converts the spectrograms into speech waveforms. When trained on the large datasets used to build general-purpose concatenative-synthesis systems, this sequence-to-sequence approach yields higher-quality, more natural-sounding voices.
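To make the term "spectrogram" concrete, here is a toy sketch that computes the energy in each frequency band for successive frames of a signal, using a naive discrete Fourier transform. It is not how the Amazon Polly NTTS engine works internally; the function names, sample rate, and frame size are illustrative assumptions chosen only so that the example is easy to follow.

```python
import cmath
import math


def frame_energies(frame):
    """Return the energy in each frequency bin of one frame (naive DFT)."""
    n = len(frame)
    energies = []
    for k in range(n // 2 + 1):  # bins from 0 Hz up to the Nyquist frequency
        coeff = sum(
            sample * cmath.exp(-2j * math.pi * k * t / n)
            for t, sample in enumerate(frame)
        )
        energies.append(abs(coeff) ** 2)
    return energies


def spectrogram(samples, frame_size):
    """Split samples into non-overlapping frames and analyse each one."""
    return [
        frame_energies(samples[start:start + frame_size])
        for start in range(0, len(samples) - frame_size + 1, frame_size)
    ]


SAMPLE_RATE = 8000  # Hz, illustrative
FRAME_SIZE = 64     # samples per frame, illustrative
TONE_HZ = 1000      # pure test tone

samples = [math.sin(2 * math.pi * TONE_HZ * t / SAMPLE_RATE) for t in range(256)]
spec = spectrogram(samples, FRAME_SIZE)

# Each bin spans SAMPLE_RATE / FRAME_SIZE = 125 Hz, so a 1000 Hz tone peaks in bin 8.
peak_bins = [max(range(len(frame)), key=frame.__getitem__) for frame in spec]
print(peak_bins)  # [8, 8, 8, 8]
```

Each inner list is one "snapshot" of energy per frequency band. A neural TTS model predicts a sequence of such snapshots from phonemes, and the vocoder performs the reverse step of turning them back into a waveform.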

Available neural voices

Neural voices are available in 36 languages and language variants. The following table lists the voices.

	Language and language variants	Language code	Name/ID	Gender
1	Arabic (Gulf)	ar-AE	Hala, Zayd	Female, Male
2	Belgian Dutch (Flemish)	nl-BE	Lisa	Female
3	Catalan	ca-ES	Arlet	Female
4	Czech	cs-CZ	Jitka	Female
5	Chinese (Cantonese)	yue-CN	Hiujin	Female
6	Chinese (Mandarin)	cmn-CN	Zhiyu	Female
7	Danish	da-DK	Sofie	Female
8	Dutch	nl-NL	Laura	Female
9	English (Australian)	en-AU	Olivia	Female
10	English (British)	en-GB	Amy*, Emma, Brian, Arthur	Female, Female, Male, Male
11	English (Indian)	en-IN	Kajal	Female
12	English (Irish)	en-IE	Niamh	Female
13	English (New Zealand)	en-NZ	Aria	Female
14	English (Singaporean)	en-SG	Jasmine	Female
15	English (South African)	en-ZA	Ayanda	Female
16	English (US)	en-US	Danielle, Gregory, Ivy, Joanna*, Kendra, Kimberly, Salli, Joey, Justin, Kevin, Matthew*, Ruth, Stephen	Female, Male, Female (child), Female, Female, Female, Female, Male, Male (child), Male (child), Male, Female, Male
17	Finnish	fi-FI	Suvi	Female
18	French (Belgian)	fr-BE	Isabelle	Female
19	French (Canadian)	fr-CA	Gabrielle, Liam	Female, Male
20	French	fr-FR	Léa, Rémi	Female, Male
21	German	de-DE	Vicki, Daniel	Female, Male
22	German (Austrian)	de-AT	Hannah	Female
23	German (Swiss)	de-CH	Sabrina	Female
24	Hindi	hi-IN	Kajal	Female
25	Italian	it-IT	Bianca, Adriano	Female, Male
26	Japanese	ja-JP	Takumi, Kazuha, Tomoko	Male, Female, Female
27	Korean	ko-KR	Seoyeon, Jihye	Female, Female
28	Norwegian	nb-NO	Ida	Female
29	Polish	pl-PL	Ola	Female
30	Portuguese (Brazilian)	pt-BR	Camila, Vitória/Vitoria, Thiago	Female, Female, Male
31	Portuguese (European)	pt-PT	Inês/Ines	Female
32	Spanish (Spain)	es-ES	Lucia, Sergio	Female, Male
33	Spanish (Mexican)	es-MX	Mia, Andrés	Female, Male
34	Spanish (US)	es-US	Lupe*, Pedro	Female, Male
35	Swedish	sv-SE	Elin	Female
36	Turkish	tr-TR	Burcu	Female

*The Amy, Joanna, Lupe, and Matthew voices can be used with the Newscaster speaking style. For more information, see Applying the newscaster voice.
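As a rough illustration of the Newscaster style, the sketch below builds the arguments for a SynthesizeSpeech request that wraps text in the `amazon:domain` SSML tag with `name="news"`. The helper names `newscaster_ssml` and `newscaster_request` are hypothetical, and the voice set is copied from the footnote above, so treat it as a snapshot rather than an authoritative list.

```python
from xml.sax.saxutils import escape

# Voices marked with * in the table above (snapshot, may go out of date).
NEWSCASTER_VOICES = {"Amy", "Joanna", "Lupe", "Matthew"}


def newscaster_ssml(text):
    """Wrap plain text in the SSML domain tag for the news style."""
    return (
        '<speak><amazon:domain name="news">'
        + escape(text)  # escape &, <, > so the SSML stays well-formed
        + "</amazon:domain></speak>"
    )


def newscaster_request(voice_id, text):
    """Build keyword arguments for a SynthesizeSpeech call (hypothetical helper)."""
    if voice_id not in NEWSCASTER_VOICES:
        raise ValueError(f"{voice_id} does not support the Newscaster style")
    return {
        "Engine": "neural",
        "VoiceId": voice_id,
        "TextType": "ssml",
        "OutputFormat": "mp3",
        "Text": newscaster_ssml(text),
    }


request = newscaster_request("Matthew", "Markets closed higher & volumes rose.")
print(request["Text"])
```

With boto3 installed and AWS credentials configured, the dictionary could be passed as `boto3.client("polly").synthesize_speech(**request)`. That call is left out of the sketch so that it runs offline.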

Note

For certain voices, Amazon Polly might automatically switch to the Standard engine when accuracy is critical. For example, this can happen during letter-by-letter spelling. You might notice an audible shift in voice quality compared to Neural synthesis.
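Because the set of neural voices changes over time, it can be safer to query it than to hard-code the table. The sketch below assumes a boto3-style Polly client whose `describe_voices` method accepts `Engine`, `LanguageCode`, and `NextToken`; the function name `list_neural_voices` and the `FakePolly` stand-in, including its canned data, are illustrative only and exist so that the example runs without credentials.

```python
def list_neural_voices(polly_client, language_code=None):
    """Return (Id, Gender, LanguageCode) for each voice that supports the neural engine."""
    params = {"Engine": "neural"}
    if language_code:
        params["LanguageCode"] = language_code
    voices = []
    while True:
        page = polly_client.describe_voices(**params)
        for voice in page.get("Voices", []):
            voices.append((voice["Id"], voice["Gender"], voice["LanguageCode"]))
        token = page.get("NextToken")
        if not token:  # no more pages
            return voices
        params["NextToken"] = token


class FakePolly:
    """Offline stand-in for a Polly client; returns canned, illustrative pages."""

    def __init__(self, pages):
        self._pages = pages
        self.calls = []

    def describe_voices(self, **kwargs):
        self.calls.append(dict(kwargs))
        return self._pages[int(kwargs.get("NextToken", 0))]


fake = FakePolly([
    {"Voices": [{"Id": "Gabrielle", "Gender": "Female", "LanguageCode": "fr-CA"}],
     "NextToken": "1"},
    {"Voices": [{"Id": "Liam", "Gender": "Male", "LanguageCode": "fr-CA"}]},
])
voices = list_neural_voices(fake, "fr-CA")
print(voices)
```

In real use, `fake` would be replaced by `boto3.client("polly")`, and the result would reflect whatever the service currently offers in the chosen Region.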

Feature and region compatibility

Neural voices aren't available in all AWS Regions, nor do they support all Amazon Polly features.

Neural voices are supported in the following Regions:

US East (N. Virginia): us-east-1
US West (Oregon): us-west-2
Africa (Cape Town): af-south-1
Asia Pacific (Tokyo): ap-northeast-1
Asia Pacific (Seoul): ap-northeast-2
Asia Pacific (Osaka): ap-northeast-3
Asia Pacific (Mumbai): ap-south-1
Asia Pacific (Singapore): ap-southeast-1
Asia Pacific (Sydney): ap-southeast-2
Asia Pacific (Malaysia): ap-southeast-5
Asia Pacific (Thailand): ap-southeast-7
Canada (Central): ca-central-1
Europe (Frankfurt): eu-central-1
Europe (Ireland): eu-west-1
Europe (London): eu-west-2
Europe (Paris): eu-west-3
Europe (Spain): eu-south-2
Europe (Zurich): eu-central-2
AWS GovCloud (US-West): us-gov-west-1

Endpoints and protocols for these Regions are identical to those used for standard voices. For more information, see Amazon Polly endpoints and quotas.
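One way to fail early when an application is configured for a Region without neural voices is a simple membership check before the client is created. The sketch below copies the Region codes from the list above, so it is a snapshot that needs to be kept in sync with the documentation; the helper name `require_neural_region` is hypothetical.

```python
# Region codes copied from the list above (snapshot, may go out of date).
NEURAL_REGIONS = frozenset({
    "us-east-1", "us-west-2", "af-south-1",
    "ap-northeast-1", "ap-northeast-2", "ap-northeast-3",
    "ap-south-1", "ap-southeast-1", "ap-southeast-2",
    "ap-southeast-5", "ap-southeast-7", "ca-central-1",
    "eu-central-1", "eu-west-1", "eu-west-2", "eu-west-3",
    "eu-south-2", "eu-central-2", "us-gov-west-1",
})


def require_neural_region(region_name):
    """Return region_name unchanged, or raise if neural voices are not listed for it."""
    if region_name not in NEURAL_REGIONS:
        raise ValueError(f"Neural voices are not listed for Region {region_name!r}")
    return region_name


region = require_neural_region("eu-west-2")
print(region)

# With boto3 installed and credentials configured, a client could then be created:
#   import boto3
#   polly = boto3.client("polly", region_name=region)
```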

The following features are supported for neural voices:

Real-time and asynchronous speech synthesis operations.
Newscaster speaking style. For more information about the speaking styles, see Applying the newscaster voice.
All speech marks.
Many (but not all) of the SSML tags that are supported by Amazon Polly. For more information about NTTS-supported SSML tags, see Supported Tags.
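As an example of the speech marks feature with a neural voice, the sketch below shows request arguments that ask for sentence and word marks, plus a small parser for the response, which arrives as one JSON object per line. The helper name `parse_speech_marks` is hypothetical, and the sample payload, including its timings, is made up for illustration rather than captured from the service.

```python
import json

# Arguments for a SynthesizeSpeech call that returns speech marks instead of audio.
SPEECH_MARK_REQUEST = {
    "Engine": "neural",
    "VoiceId": "Joanna",
    "OutputFormat": "json",  # speech marks are returned as JSON lines
    "SpeechMarkTypes": ["sentence", "word"],
    "Text": "Hello world",
}


def parse_speech_marks(payload):
    """Parse a newline-delimited JSON speech mark stream into a list of dicts."""
    text = payload.decode("utf-8") if isinstance(payload, bytes) else payload
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Illustrative payload only; real timings depend on the voice and the text.
SAMPLE = (
    b'{"time":0,"type":"sentence","start":0,"end":11,"value":"Hello world"}\n'
    b'{"time":6,"type":"word","start":0,"end":5,"value":"Hello"}\n'
    b'{"time":412,"type":"word","start":6,"end":11,"value":"world"}\n'
)

marks = parse_speech_marks(SAMPLE)
words = [(m["time"], m["value"]) for m in marks if m["type"] == "word"]
print(words)  # [(6, 'Hello'), (412, 'world')]
```

With a real boto3 client, the payload would come from reading the `AudioStream` field of the `synthesize_speech` response.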

As with standard voices, you can choose from various sampling rates to optimize the bandwidth and audio quality for your application. Valid sampling rates for standard and neural voices are 8 kHz, 16 kHz, 22 kHz, or 24 kHz. The default for standard voices is 22 kHz. The default for neural voices is 24 kHz. Amazon Polly supports MP3, OGG (Vorbis), and raw PCM audio stream formats.
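When calling the API, the sampling rate is passed as a string in hertz rather than kilohertz. The sketch below maps the rates named above to the `SampleRate` values as I understand them, with "22 kHz" corresponding to `"22050"`; check these against the SynthesizeSpeech API reference before relying on them. The helper name `sample_rate_param` is hypothetical, and the sketch does not model any format-specific limits, such as those that apply to PCM output.

```python
# kHz label from the text above -> SampleRate string (assumed API values).
SAMPLE_RATES_HZ = {8: "8000", 16: "16000", 22: "22050", 24: "24000"}

# Default rate per engine, as stated in the text above.
DEFAULT_KHZ = {"standard": 22, "neural": 24}


def sample_rate_param(engine, khz=None):
    """Return the SampleRate string for an engine, using its default if khz is None."""
    if engine not in DEFAULT_KHZ:
        raise ValueError(f"Unknown engine: {engine!r}")
    if khz is None:
        khz = DEFAULT_KHZ[engine]
    if khz not in SAMPLE_RATES_HZ:
        raise ValueError(f"Unsupported sampling rate: {khz} kHz")
    return SAMPLE_RATES_HZ[khz]


# A lower rate trades audio quality for bandwidth, for example for telephony.
request = {
    "Engine": "neural",
    "VoiceId": "Joanna",
    "OutputFormat": "mp3",
    "SampleRate": sample_rate_param("neural", 8),
    "Text": "Hello from a neural voice.",
}
print(request["SampleRate"])  # 8000
```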
