Amazon Polly

A text-to-speech (TTS) service that converts text into lifelike speech.

Exam Tip: Polly = text to speech. If the question involves converting text to audio/speech, the answer is Polly. SSML is for fine-grained control over speech output.

Key Capabilities

Voices: 60+ voices across 30+ languages
Neural Voices: Advanced neural TTS for more natural-sounding speech
Output Formats: MP3, OGG Vorbis, PCM (plus JSON for speech marks)
Real-time Streaming: Stream speech audio in real-time
Newscaster Style: A speaking style tuned for news narration, available on select neural voices
Custom Pronunciation: Use lexicons to control pronunciation of specific words
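
As a rough illustration of the capabilities above, the following minimal sketch calls Polly's `synthesize_speech` operation and writes the returned audio to a file. It assumes the boto3 SDK; the helper name `synthesize_to_file`, the `demo` function, the voice "Joanna", and the output path are illustrative choices, not part of the notes above. The client is passed in as a parameter so the helper can be exercised without AWS credentials.

```python
def synthesize_to_file(client, text, path, voice_id="Joanna",
                       engine="neural", output_format="mp3"):
    """Synthesize `text` with Polly and write the audio bytes to `path`.

    `client` is expected to behave like boto3.client("polly").
    Returns the number of audio bytes written.
    """
    response = client.synthesize_speech(
        Text=text,
        VoiceId=voice_id,            # assumed voice; any supported voice works
        Engine=engine,               # "standard" or "neural"
        OutputFormat=output_format,  # "mp3", "ogg_vorbis", or "pcm"
    )
    # AudioStream is a file-like streaming body; read it fully here.
    audio = response["AudioStream"].read()
    with open(path, "wb") as f:
        f.write(audio)
    return len(audio)


def demo():
    """Hypothetical usage; needs boto3 installed and AWS credentials set up."""
    import boto3  # third-party AWS SDK, imported lazily

    polly = boto3.client("polly")
    synthesize_to_file(polly, "Hello from Amazon Polly.", "hello.mp3")
```

Reading the whole stream at once is fine for short text. For real-time playback, the same `AudioStream` could instead be read in chunks and forwarded to the audio player as it arrives.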

SSML (Speech Synthesis Markup Language)

SSML provides fine-grained control over how Polly generates speech:

Pauses: Insert pauses with <break time="1s"/>
Emphasis: Stress specific words with <emphasis level="strong">...</emphasis>
Prosody: Control rate, pitch, and volume of speech with <prosody>
Phonetic Pronunciation: Specify exact pronunciation with <phoneme>, using IPA or X-SAMPA
Say-As: Control how text is interpreted (number, date, telephone, spell-out) with <say-as interpret-as="...">
Whispering: Produce whispered speech with <amazon:effect name="whispered">...</amazon:effect>
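
To make the tags above concrete, here is a small sketch that assembles an SSML document from Python strings using only the standard library. The helper names (`plain`, `pause`, `emphasize`, `say_as`, `speak`) are hypothetical and exist only for this example. Text is XML-escaped so that characters such as `&` do not break the markup.

```python
from xml.sax.saxutils import escape, quoteattr


def plain(text):
    """Escape plain text so it is safe inside SSML."""
    return escape(text)


def pause(seconds):
    """Return a <break> tag lasting the given number of seconds."""
    return f'<break time="{seconds}s"/>'


def emphasize(text, level="strong"):
    """Wrap text in an <emphasis> tag."""
    return f"<emphasis level={quoteattr(level)}>{escape(text)}</emphasis>"


def say_as(text, interpret_as):
    """Wrap text in a <say-as> tag, e.g. interpret_as="digits" or "date"."""
    return f"<say-as interpret-as={quoteattr(interpret_as)}>{escape(text)}</say-as>"


def speak(*parts):
    """Join already-built SSML fragments inside the root <speak> element."""
    return "<speak>" + "".join(parts) + "</speak>"


ssml = speak(
    plain("Your code is "),
    say_as("1234", "digits"),
    pause(1),
    emphasize("Do not share it."),
)
print(ssml)
```

The resulting string would be sent to Polly by passing it as `Text` together with `TextType="ssml"` in a `synthesize_speech` call. Tag support varies by engine: as far as I know, `<emphasis>` and the whispered effect are limited to standard voices, so it is worth checking the supported-tags table for the chosen voice.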

Common Use Cases

Accessibility features for visually impaired users
Interactive voice response (IVR) systems
E-learning and content narration
News reading and podcast generation
Voice-enabled applications and devices