Amazon Polly
A text-to-speech (TTS) service that converts text into lifelike speech.
Exam Tip: Polly = text to speech. If the question involves converting text to audio/speech, the answer is Polly. SSML is for fine-grained control over speech output.
Key Capabilities
- Voices: 60+ voices across 30+ languages
- Neural Voices: Advanced neural TTS for more natural-sounding speech
- Output Formats: MP3, OGG Vorbis, PCM
- Real-time Streaming: Synthesized audio is returned as a stream, so playback can begin immediately
- Newscaster Style: Voices optimized for news reading
- Custom Pronunciation: Use lexicons to control pronunciation of specific words
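A minimal Python sketch of how these capabilities map onto a single SynthesizeSpeech request. It assumes the AWS SDK for Python (boto3) and configured AWS credentials. Engine, OutputFormat, VoiceId, TextType, and LexiconNames are real request parameters of boto3's synthesize_speech. The helper names build_synthesis_request and synthesize_to_file and the default voice "Joanna" are illustrative choices, not part of Polly itself.

```python
from typing import Any, Dict, List, Optional

# Audio output formats accepted by SynthesizeSpeech ("json" speech marks are omitted here).
AUDIO_FORMATS = {"mp3", "ogg_vorbis", "pcm"}


def build_synthesis_request(
    text: str,
    voice_id: str = "Joanna",          # illustrative default; any voice supporting the engine works
    output_format: str = "mp3",
    neural: bool = True,
    lexicon_names: Optional[List[str]] = None,
    is_ssml: bool = False,
) -> Dict[str, Any]:
    """Build keyword arguments for a boto3 Polly client's synthesize_speech call (hypothetical helper)."""
    if output_format not in AUDIO_FORMATS:
        raise ValueError(f"unsupported audio format: {output_format}")
    request: Dict[str, Any] = {
        "Text": text,
        "VoiceId": voice_id,
        "OutputFormat": output_format,
        "Engine": "neural" if neural else "standard",
        "TextType": "ssml" if is_ssml else "text",
    }
    if lexicon_names:
        # Lexicons must already be stored in the account and region (e.g. uploaded with put_lexicon).
        request["LexiconNames"] = list(lexicon_names)
    return request


def synthesize_to_file(request: Dict[str, Any], path: str) -> None:
    """Call Polly and write the returned audio stream to disk (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so build_synthesis_request works without boto3 installed

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**request)
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())
```

For example, synthesize_to_file(build_synthesis_request("Hello from Polly", output_format="ogg_vorbis"), "hello.ogg") would write an OGG Vorbis file. The chosen voice must support the requested engine, otherwise Polly rejects the request.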
SSML (Speech Synthesis Markup Language)
SSML provides fine-grained control over how Polly generates speech:
- Pauses: Insert pauses with <break time="1s"/>
- Emphasis: Stress specific words with <emphasis level="strong">
- Prosody: Control the rate, pitch, and volume of speech with <prosody>
- Phonetic Pronunciation: Specify exact pronunciation using IPA with <phoneme alphabet="ipa">
- Say-As: Control how text is interpreted (number, date, telephone, spell-out) with <say-as interpret-as="...">
- Whispering: Produce whispered speech with <amazon:effect name="whispered">
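The tags above can be combined in one SSML document wrapped in <speak>. Below is a minimal Python sketch, assuming SSML is built as plain strings; the helper names (speak, plain, pause, emphasis, prosody, phoneme, say_as, whisper) are hypothetical, while the tag and attribute names are the SSML ones listed above. Text is XML-escaped so characters such as & do not break the markup.

```python
from typing import Dict, Optional
from xml.sax.saxutils import escape, quoteattr


def element(name: str, text: str, attrs: Optional[Dict[str, str]] = None) -> str:
    """Wrap XML-escaped text in an SSML element with quoted attributes."""
    attr_str = "".join(f" {key}={quoteattr(value)}" for key, value in (attrs or {}).items())
    return f"<{name}{attr_str}>{escape(text)}</{name}>"


def plain(text: str) -> str:
    # Ordinary text between tags still has to be XML-escaped.
    return escape(text)


def pause(duration: str = "1s") -> str:
    return f"<break time={quoteattr(duration)}/>"


def emphasis(text: str, level: str = "strong") -> str:
    return element("emphasis", text, {"level": level})


def prosody(text: str, **settings: str) -> str:
    # settings may include rate, pitch, and volume, e.g. rate="slow"
    return element("prosody", text, settings)


def phoneme(text: str, ipa: str) -> str:
    return element("phoneme", text, {"alphabet": "ipa", "ph": ipa})


def say_as(text: str, interpret_as: str) -> str:
    return element("say-as", text, {"interpret-as": interpret_as})


def whisper(text: str) -> str:
    return element("amazon:effect", text, {"name": "whispered"})


def speak(*fragments: str) -> str:
    """Join fragments into a complete SSML document."""
    return "<speak>" + " ".join(fragments) + "</speak>"


if __name__ == "__main__":
    print(speak(
        plain("Welcome."),
        pause("1s"),
        emphasis("Do not"),
        plain("skip this step."),
        prosody("This part is read slowly.", rate="slow"),
        plain("I like"),
        phoneme("pecan", "pɪˈkɑːn"),
        plain("Call"),
        say_as("2025551212", "telephone"),
        whisper("This part is a secret."),
    ))
```

The resulting string is sent to Polly with TextType set to "ssml". Support for individual tags varies by voice engine, so check Polly's list of supported SSML tags for the engine you use.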
Common Use Cases
- Accessibility features for visually impaired users
- Interactive voice response (IVR) systems
- E-learning and content narration
- News reading and podcast generation
- Voice-enabled applications and devices