
Amazon Polly

Amazon Polly is a text-to-speech (TTS) service that converts text into lifelike speech, with neural voices and SSML control over the output.

Exam Tip: Polly = text to speech. If the question involves converting text to audio/speech, the answer is Polly. SSML is for fine-grained control over speech output.


Key Capabilities

  • Voices: 60+ voices across 30+ languages
  • Neural Voices: Advanced neural TTS for more natural-sounding speech
  • Output Formats: MP3, OGG Vorbis, PCM
  • Real-time Streaming: Stream synthesized audio in real time as it is generated
  • Newscaster Style: A speaking style, available on select neural voices, optimized for news reading
  • Custom Pronunciation: Use lexicons to control pronunciation of specific words
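
As a rough illustration of how these capabilities map onto an API call, here is a minimal Python sketch built around boto3's `synthesize_speech` operation. It assumes boto3 is installed and AWS credentials are configured. The helper names (`build_polly_request`, `synthesize`, `demo`) are hypothetical, and the voice "Joanna" is only an example choice.

```python
from typing import Any, Dict

# The three audio formats listed above, spelled the way the API expects them.
SUPPORTED_FORMATS = {"mp3", "ogg_vorbis", "pcm"}


def build_polly_request(text: str, voice_id: str = "Joanna",
                        output_format: str = "mp3",
                        engine: str = "neural") -> Dict[str, Any]:
    """Assemble keyword arguments for Polly's SynthesizeSpeech operation."""
    if output_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported output format: {output_format}")
    return {
        "Text": text,
        "VoiceId": voice_id,            # example voice, an assumption
        "OutputFormat": output_format,
        "Engine": engine,               # "neural" selects neural TTS
    }


def synthesize(client: Any, request: Dict[str, Any]) -> bytes:
    """Send the request and return the raw audio bytes from the response stream."""
    response = client.synthesize_speech(**request)
    return response["AudioStream"].read()


def demo(path: str = "hello.mp3") -> None:
    """Hypothetical end-to-end run; needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the helpers above work without it

    polly = boto3.client("polly")
    audio = synthesize(polly, build_polly_request("Hello from Amazon Polly."))
    with open(path, "wb") as f:
        f.write(audio)
```

Calling `demo()` would write an MP3 file to the working directory. Keeping the request builder separate from the network call makes the parameters easy to inspect without contacting AWS.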

SSML (Speech Synthesis Markup Language)

SSML provides fine-grained control over how Polly generates speech:

  • Pauses: <break time="1s"/> inserts a pause
  • Emphasis: <emphasis level="strong"> stresses specific words
  • Prosody: <prosody> controls the rate, pitch, and volume of speech
  • Phonetic Pronunciation: <phoneme> specifies exact pronunciation using IPA
  • Say-As: <say-as> controls how text is interpreted (number, date, telephone, spell-out)
  • Whispering: <amazon:effect name="whispered"> produces whispered speech
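
The tags above can be assembled programmatically. The following Python sketch uses only the standard library to build an SSML string and check that it is well-formed XML. All helper names (`plain`, `pause`, `emphasize`, `prosody`, `say_as`, `phoneme`, `speak`) are hypothetical. The whisper effect is left out because its `amazon:` prefix is not declared in the string, so a generic XML parser would reject it even though Polly accepts it.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def plain(text: str) -> str:
    """Escape plain text so characters like & and < are safe inside SSML."""
    return escape(text)


def pause(seconds: float) -> str:
    return f'<break time="{seconds:g}s"/>'


def emphasize(text: str, level: str = "strong") -> str:
    return f"<emphasis level={quoteattr(level)}>{escape(text)}</emphasis>"


def prosody(text: str, rate: str = "medium", volume: str = "medium") -> str:
    return (f"<prosody rate={quoteattr(rate)} volume={quoteattr(volume)}>"
            f"{escape(text)}</prosody>")


def say_as(text: str, interpret_as: str) -> str:
    return f"<say-as interpret-as={quoteattr(interpret_as)}>{escape(text)}</say-as>"


def phoneme(text: str, ipa: str) -> str:
    return f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(text)}</phoneme>'


def speak(*fragments: str) -> str:
    """Wrap SSML fragments in <speak> and verify the result parses as XML."""
    ssml = "<speak>" + " ".join(fragments) + "</speak>"
    ET.fromstring(ssml)  # raises xml.etree.ElementTree.ParseError if malformed
    return ssml


ssml = speak(
    plain("Your confirmation code is"),
    say_as("AB12", "spell-out"),
    pause(1),
    prosody("Thank you for calling.", rate="slow"),
)
```

With boto3, a string like `ssml` would be passed as `Text` to `synthesize_speech` together with `TextType="ssml"`. Note that a successful XML parse only shows the markup is well-formed; it does not confirm that a given voice or engine supports every tag.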

Common Use Cases

  • Accessibility features for visually impaired users
  • Interactive voice response (IVR) systems
  • E-learning and content narration
  • News reading and podcast generation
  • Voice-enabled applications and devices