Lesson 2
Creating Voice Content with AI

Imagine crafting a podcast episode without a microphone, or turning your blog post into an audiobook in minutes. AI voices are changing how content gets made: they are often faster, cheaper, and more flexible than traditional recording. In this lesson, you’ll learn to write scripts an AI can perform naturally, choose the right voice for your project, and add AI-generated audio to videos or slides. Let’s turn your words into compelling voice content!

How Do AI Voices Work?

AI voice generators are trained on large amounts of recorded human speech, learning patterns in pronunciation, rhythm, and emotion. When you input text, the model predicts how a human would say it, but the result is only as good as your script. Garbage in, garbage out! That’s why your scriptwriting skills are the secret sauce.

Try It Now with Amazon Polly (Free Tier):

  1. Create a free AWS account.
  2. Go to the Amazon Polly Console.
  3. Select Standard under Engine.
  4. Turn on the SSML toggle so Polly interprets your markup.
  5. Copy-paste this SSML snippet into the text box:
    ```xml
    <speak>
      The <prosody rate="fast">quick</prosody> brown fox
      <break time="1s"/> jumps over the <emphasis level="strong">lazy</emphasis> dog.
      Pronounce "AI" as <phoneme alphabet="ipa" ph="eɪ.aɪ">AI</phoneme>.
    </speak>
    ```
  6. Select a voice (e.g., "Joanna") and click Listen.
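If you would rather script this than click through the console, the same request can be sent with the AWS SDK for Python (boto3). This is a minimal sketch, assuming boto3 is installed and AWS credentials and a default region are already configured. `build_polly_request` and `synthesize_to_file` are hypothetical helper names; `synthesize_speech` and its `Engine`, `OutputFormat`, `Text`, `TextType`, and `VoiceId` parameters are part of Polly's API.

```python
# Minimal sketch, assuming boto3 is installed and AWS credentials and a
# default region are already configured. The helper names are hypothetical.

SSML = """<speak>
  The <prosody rate="fast">quick</prosody> brown fox
  <break time="1s"/> jumps over the <emphasis level="strong">lazy</emphasis> dog.
</speak>"""


def build_polly_request(ssml: str, voice_id: str = "Joanna") -> dict:
    """Collect the keyword arguments for Polly's synthesize_speech call."""
    return {
        "Engine": "standard",   # matches the Standard engine chosen in step 3
        "OutputFormat": "mp3",
        "Text": ssml,
        "TextType": "ssml",     # the API equivalent of the console's SSML toggle
        "VoiceId": voice_id,
    }


def synthesize_to_file(ssml: str, path: str, voice_id: str = "Joanna") -> None:
    """Send the SSML to Polly and save the returned audio stream."""
    import boto3  # imported lazily so build_polly_request works without boto3

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**build_polly_request(ssml, voice_id))
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())
```

For example, `synthesize_to_file(SSML, "fox.mp3")` should save the sentence as an MP3, provided your account can call Polly in the configured region. Keeping the request in its own function lets you inspect or test it without making a network call.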
Introducing SSML: The Language of AI Voices

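SSML (Speech Synthesis Markup Language) is a W3C standard for controlling synthesized speech, supported by major engines such as Amazon Polly and Google Cloud Text-to-Speech (see the Amazon Polly SSML documentation for Polly’s tags). Think of it as HTML for speech: it lets you add pauses, adjust speed, and emphasize words. Each engine supports a slightly different subset of tags, so check your provider’s documentation. Here’s a cheat sheet of key tags: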

| SSML Tag | What It Does | Example |
| --- | --- | --- |
| `<break time="1s"/>` | Adds a pause | `Hello <break time="1s"/> world` |
| `<prosody rate="slow">` | Slows down speech | `<prosody rate="slow">Important</prosody>` |
| `<phoneme alphabet="ipa" ph="tʃɪˈpoʊtleɪ">` | Forces a pronunciation using IPA symbols | `<phoneme alphabet="ipa" ph="tʃɪˈpoʊtleɪ">Chipotle</phoneme>` |
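Because SSML is XML, typing tags by hand makes it easy to forget a closing tag or leave a raw `&` or `<` in your text. Here is a minimal sketch that generates the three tags from the cheat sheet with Python's standard library; `pause`, `slow`, `pronounce`, and `speak` are hypothetical helper names invented for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def pause(seconds: float) -> str:
    """<break>: insert a pause of the given length."""
    return f'<break time="{seconds:g}s"/>'


def slow(text: str, rate: str = "slow") -> str:
    """<prosody rate>: change the speaking rate for a span of text."""
    return f"<prosody rate={quoteattr(rate)}>{escape(text)}</prosody>"


def pronounce(word: str, ipa: str) -> str:
    """<phoneme>: force a pronunciation written in IPA."""
    return f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(word)}</phoneme>'


def speak(*parts: str) -> str:
    """Wrap the pieces in a <speak> root and confirm the result is well-formed XML."""
    ssml = "<speak>" + " ".join(parts) + "</speak>"
    ET.fromstring(ssml)  # raises ParseError if a tag is unbalanced
    return ssml
```

For instance, `speak("Hello", pause(1), "world.")` builds `<speak>Hello <break time="1s"/> world.</speak>`. Escaping the text (so "Q&A" becomes "Q&amp;A") keeps the engine from rejecting the script.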
Writing Scripts for AI Voice Generation

AI voices sound most natural when the script is clear and well structured, and SSML gives you precise control over the delivery.

Best Practices:

  1. Use Short, Clear Sentences
    • Avoid: "We’re going to the park later which is near the river unless it rains."
    • Better: "We’re going to the park later. <break time="0.5s"/> It’s near the river—unless it rains."
  2. Add Phonetic Guides for Tricky Words
    • Use the <phoneme> tag with IPA symbols.
    • Example:
    ```xml
    <speak>
      Order a <phoneme alphabet="ipa" ph="tʃɪˈpoʊtleɪ">Chipotle</phoneme>
      burrito with <phoneme alphabet="ipa" ph="ˌɡwɑkəˈmoʊli">guacamole</phoneme>.
    </speak>
    ```
  3. Embed SSML Directives in Your Script
    Control pacing, pauses, and emphasis as you write:
    ```xml
    <speak>
      Welcome to today’s <break time="1s"/>
      <emphasis level="strong">must-listen</emphasis> episode.
      <prosody rate="slow">This changes everything.</prosody>
    </speak>
    ```
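The three practices above can be combined into a small script builder. This is a minimal sketch under stated assumptions: the 15-word limit for a "short" sentence is an arbitrary threshold, `LEXICON` holds example IPA entries, and every function name is hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, List
from xml.sax.saxutils import escape, quoteattr

MAX_WORDS = 15  # assumed cut-off for a "short" sentence; adjust to taste

# Hypothetical pronunciation guide: word -> IPA (example entries only)
LEXICON: Dict[str, str] = {
    "Chipotle": "tʃɪˈpoʊtleɪ",
    "guacamole": "ˌɡwɑkəˈmoʊli",
}


def long_sentences(sentences: List[str], max_words: int = MAX_WORDS) -> List[str]:
    """Practice 1: flag sentences that are probably too long to sound natural."""
    return [s for s in sentences if len(s.split()) > max_words]


def add_phonemes(sentence: str, lexicon: Dict[str, str] = LEXICON) -> str:
    """Practice 2: escape the text and wrap lexicon words in IPA <phoneme> tags."""
    if not lexicon:
        return escape(sentence)
    words = sorted(lexicon, key=len, reverse=True)  # try longer entries first
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, words)) + r")\b")
    out, last = [], 0
    for match in pattern.finditer(sentence):
        word = match.group(1)
        out.append(escape(sentence[last:match.start()]))
        out.append(f'<phoneme alphabet="ipa" ph={quoteattr(lexicon[word])}>{escape(word)}</phoneme>')
        last = match.end()
    out.append(escape(sentence[last:]))
    return "".join(out)


def build_script(sentences: List[str], pause: str = "0.5s") -> str:
    """Practice 3: join the sentences with <break> directives inside <speak>."""
    body = f' <break time="{pause}"/> '.join(add_phonemes(s.strip()) for s in sentences)
    ssml = f"<speak>{body}</speak>"
    ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed
    return ssml
```

Running `long_sentences` first tells you which lines to split by hand. The word-boundary regex means "Chipotle" gets tagged but "Chipotles" does not, so plural forms need their own lexicon entries.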
Selecting Voices and Adjusting Speech Parameters

After writing your SSML script, choose a voice and fine-tune its delivery.

| Parameter | What It Controls | How to Adjust (Amazon Polly / Google TTS) | Example |
| --- | --- | --- | --- |
| Tone | Voice personality | Select a pre-built voice (e.g., "Joanna" for neutral, "Matthew" for warm) | Voice ID: `Joanna` |
| Pacing | Overall speech speed | Combine SSML `<prosody rate="90%">` with the voice's settings | Slower: `<prosody rate="slow">`; faster: `<prosody rate="fast">` |
| Pitch | How high or low the voice sounds | Use SSML `<prosody pitch="high">` or pick a naturally higher-pitched voice (e.g., a child voice such as Polly's "Ivy") | `<prosody pitch="+10%">Exciting!</prosody>` |
| Pauses | Breaks between phrases | Add `<break time="1s"/>` to your script | `<break time="0.75s"/>` |
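Before sending a script, it can help to catch rate and pitch values an engine is likely to reject. This sketch checks them against the keyword values defined for the W3C SSML `<prosody>` element plus a simple percentage pattern. The percentage patterns are a simplifying assumption: each engine documents its own numeric limits, which are not encoded here. `check_prosody` and `prosody` are hypothetical names.

```python
import re
from typing import List, Optional
from xml.sax.saxutils import escape

# Keyword values from the W3C SSML prosody element; the percentage patterns
# below are a simplifying assumption (engines set their own numeric limits).
RATE_WORDS = {"x-slow", "slow", "medium", "fast", "x-fast", "default"}
PITCH_WORDS = {"x-low", "low", "medium", "high", "x-high", "default"}
RATE_PERCENT = re.compile(r"^\d+(\.\d+)?%$")        # e.g. "90%"
PITCH_PERCENT = re.compile(r"^[+-]\d+(\.\d+)?%$")   # e.g. "+10%"


def check_prosody(rate: Optional[str] = None, pitch: Optional[str] = None) -> List[str]:
    """Return a list of problems; an empty list means the values look usable."""
    problems = []
    if rate is not None and rate not in RATE_WORDS and not RATE_PERCENT.match(rate):
        problems.append(f"unrecognized rate: {rate!r}")
    if pitch is not None and pitch not in PITCH_WORDS and not PITCH_PERCENT.match(pitch):
        problems.append(f"unrecognized pitch: {pitch!r}")
    return problems


def prosody(text: str, rate: Optional[str] = None, pitch: Optional[str] = None) -> str:
    """Build a <prosody> element, refusing values the checks above reject."""
    problems = check_prosody(rate, pitch)
    if problems:
        raise ValueError("; ".join(problems))
    attrs = "".join(
        f' {name}="{value}"'
        for name, value in (("rate", rate), ("pitch", pitch))
        if value is not None
    )
    return f"<prosody{attrs}>{escape(text)}</prosody>"
```

Failing early with a readable message (for example, `unrecognized rate: 'very fast'`) is usually easier to debug than an error returned by the synthesis service.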

Example Workflow:

  1. Write Script with SSML:
    ```xml
    <speak>
      <prosody rate="fast" pitch="high">New product alert!</prosody>
      <break time="1s"/>
      Get <emphasis level="strong">50% off</emphasis> until <say-as interpret-as="date" format="yyyymmdd">20251231</say-as>.
    </speak>
    ```
  2. Select a Voice in Amazon Polly:
    • Choose "Kendra" for a cheerful tone.
  3. Synthesize & Refine:
    • Does the voice sound too fast? Wrap everything inside <speak> in a single <prosody rate="80%"> element to slow down the whole script.
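The refinement in step 3 can be automated. This minimal sketch uses Python's built-in ElementTree to move everything inside `<speak>` into one `<prosody>` element. It assumes a plain `<speak>` root without an XML namespace, like the examples in this lesson, and `slow_down` is a hypothetical name. How an engine combines this outer rate with an inner `rate="fast"` may vary, so listen to the result.

```python
import xml.etree.ElementTree as ET


def slow_down(ssml: str, rate: str = "80%") -> str:
    """Move everything inside <speak> into a single <prosody rate=...> element."""
    root = ET.fromstring(ssml)
    wrapper = ET.Element("prosody", {"rate": rate})
    wrapper.text = root.text          # text before the first child tag
    for child in list(root):          # copy the list because we mutate root
        root.remove(child)
        wrapper.append(child)         # each child's tail text moves with it
    root.text = None
    root.append(wrapper)
    return ET.tostring(root, encoding="unicode")
```

ElementTree writes empty elements as `<break time="1s" />` (with a space before the slash). That is equivalent XML, so SSML engines should read it the same way.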

ElevenLabs Note:

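  • ElevenLabs does not support the full SSML tag set, and the accepted syntax varies by model. Some models accept <break time="1s" /> for short pauses, while others rely on punctuation or bracketed tags instead.
  • Speaking speed is usually set in the voice settings rather than with <prosody rate>.
  • Check the ElevenLabs Formatting Guide for your model before reusing a Polly script.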
Crafting AI Prompts for Voice Content

Your AI voiceover is only as good as the script it’s given. Let’s combine SSML with prompt writing so an AI assistant can draft voice-ready scripts for you.

Practical Example:

```text
Act as a tech podcast host. Write a 30-second intro about AI voice generation formatted in SSML.
Include:
- A 1-second pause after the hook
- Emphasize 'revolutionary'
- Phonetic spelling: generative AI = dʒɛnərətɪv eɪ aɪ
Tone: Exciting and curious
```

Output with SSML:

```xml
<speak>
  Hello tech enthusiasts! Welcome to The Voice Frontier, your go-to podcast for breakthroughs in AI.
  Here’s the big question: are you ready to experience a
  <emphasis level="strong">revolutionary</emphasis> leap in digital communication?
  <break time="1s"/>
  In today’s episode, we’ll explore the captivating world of AI voice generation powered by
  <phoneme alphabet="ipa" ph="dʒɛnərətɪv">generative</phoneme> <phoneme alphabet="ipa" ph="eɪ.aɪ">AI</phoneme>.
  We’ll uncover how machines are crafting lifelike voices, personalizing user experiences, and reshaping industries.
  So buckle up and tune in, because the future of voice isn’t just futuristic. It’s happening right now!
</speak>
```
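Language models do not always follow formatting instructions, so it’s worth checking the generated SSML before you synthesize it. This minimal sketch parses the output with Python’s standard library and reports which of the prompt’s requirements appear to be missing. `check_intro` is a hypothetical name, and the checks assume a plain `<speak>` root without an XML namespace.

```python
import xml.etree.ElementTree as ET
from typing import List


def check_intro(ssml: str) -> List[str]:
    """Return the prompt requirements the generated SSML appears to miss."""
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]
    missing = []
    if root.tag != "speak":
        missing.append("root element is not <speak>")
    # Requirement: a 1-second pause somewhere in the script
    if not any(el.get("time") == "1s" for el in root.iter("break")):
        missing.append('no <break time="1s"/>')
    # Requirement: the word 'revolutionary' is emphasized
    emphasized = {"".join(el.itertext()).strip().lower() for el in root.iter("emphasis")}
    if "revolutionary" not in emphasized:
        missing.append("'revolutionary' is not emphasized")
    # Requirement: a phonetic spelling via an IPA <phoneme> tag
    if not any(el.get("alphabet") == "ipa" for el in root.iter("phoneme")):
        missing.append("no IPA <phoneme> tag")
    return missing
```

An empty list means every check passed. If anything is reported, you can paste the messages back to the model and ask it to fix just those points.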

Prompt Writing Tips for Voice:

  • Tone Anchoring:
    "Make the voice sound like a suspenseful movie trailer narrator"
  • Pronunciation Guardrails:
    "Always spell out acronyms phonetically: NASA = <phoneme alphabet="ipa" ph="ˈnæsə">NASA</phoneme>"
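Both tips can be built into a reusable prompt template so every episode gets the same guardrails. This is a minimal sketch: `build_voice_prompt` and its parameters are hypothetical, and the wording it generates simply mirrors the practical example above.

```python
from typing import Dict, List


def build_voice_prompt(
    role: str,
    topic: str,
    seconds: int,
    requirements: List[str],
    tone: str,
    pronunciations: Dict[str, str],
) -> str:
    """Assemble a voice-script prompt with tone anchoring and pronunciation guardrails."""
    lines = [
        f"Act as {role}. Write a {seconds}-second intro about {topic} formatted in SSML.",
        "Include:",
    ]
    lines += [f"- {item}" for item in requirements]
    # Pronunciation guardrail: state exactly how each term should be tagged
    for word, ipa in pronunciations.items():
        lines.append(f'- Always wrap "{word}" as <phoneme alphabet="ipa" ph="{ipa}">{word}</phoneme>')
    # Tone anchor: a concrete reference voice works better than a lone adjective
    lines.append(f"Tone: {tone}")
    return "\n".join(lines)
```

Keeping the pronunciations in a dictionary means one list of tricky terms can feed every prompt you write, instead of being retyped each time.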

You’ve now got the tools to turn text into speech that captivates. Whether you’re prototyping a podcast, dubbing videos, or making slides accessible, AI voices let you experiment faster. Ready to bring your scripts to life? Let’s practice!
