Generating Audio with gTTS

Introduction: Why Add Audio to Our Cooking API?

Welcome to your first step in building an AI Cooking Helper with Flask! In this lesson, you will learn how to add audio features to your API using Text-to-Speech (TTS) technology. Imagine you are cooking and your hands are messy — being able to listen to recipe steps instead of reading them can make the process much easier and more enjoyable. By the end of this lesson, you will know how to generate audio from text and serve it through your API, making your cooking assistant more helpful and accessible.

Recall: How We Add Endpoints in Flask

Before we dive into audio, let’s quickly review how endpoints are added in a Flask application. The basic pattern, whether or not you have seen it before, is to use the @app.route decorator to define a new route. For example:
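
A minimal sketch of this pattern, assuming a standalone app object and a placeholder greeting (the exact message is only an illustration):

```python
from flask import Flask

app = Flask(__name__)


@app.route("/hello")
def hello():
    # Placeholder message; any string works as a simple response body.
    return "Hello, world!"
```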

Here, when a user visits /hello, the function hello() runs and returns a simple message. In our project, we use Blueprints and organize routes in a separate file, but the idea is the same: each route connects a URL to a function that returns a response.

Getting to Know gTTS (Google Text-to-Speech)

Now, let’s talk about gTTS. gTTS stands for Google Text-to-Speech. It is a Python library that turns text into spoken audio using Google’s text-to-speech service. This makes your app more accessible, especially for users who prefer listening over reading.

Important: gTTS sends your text to Google’s servers to generate the audio, so it requires an active internet connection. If your server or development environment is offline, gTTS cannot generate audio.

For example, if you want to turn the text “Welcome to your smart cooking assistant!” into audio, gTTS can do that in just a few lines of code.

You do not need to install gTTS on CodeSignal, as it is already available. To use it on your own computer, install it with pip:

pip install gTTS

Writing a Function to Generate Audio from Text

Let’s build the function that will turn text into audio. We will use the gTTS class from the gtts library and the BytesIO class from Python’s io module. Here’s how we do it, step by step:

Step 1: Import the Required Libraries

First, we need to import the classes we will use:

gTTS is the main class for converting text to speech.
BytesIO lets us work with audio data in memory, without saving it to a file.

Step 2: Create the Function

Now, let’s write the function that takes some text and returns the audio data:

Let’s break this down:

tts = gTTS(text): This creates a gTTS object with the text you want to convert.
audio_io = BytesIO(): This creates an in-memory file to hold the audio data.
tts.write_to_fp(audio_io): This writes the audio data into the audio_io object.
audio_io.seek(0): This moves the pointer to the start of the audio data, so it can be read from the beginning.
return audio_io: This returns the in-memory audio file, ready to be sent as a response.

This function does not save any files to disk. Everything stays in memory, so there are no temporary files to create or clean up, which suits a web API that generates audio on each request.
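
To see why the seek(0) step matters, here is a small stand-alone sketch that uses placeholder bytes instead of real audio. After writing, the buffer's position sits at the end, so a read returns nothing until the position is moved back to the start.

```python
from io import BytesIO

buffer = BytesIO()
buffer.write(b"pretend this is MP3 data")

# The position is now at the end of the data, so nothing is left to read.
read_without_rewind = buffer.read()

# Rewind to the beginning, then read again to get the full contents.
buffer.seek(0)
read_after_rewind = buffer.read()
```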

Building the /api/tts Endpoint

Now that we have a function to generate audio, let’s create an API endpoint that uses it. We want users to be able to send some text to our API and get back an audio file.

Step 1: Define the Route

In your Flask routes file, add the following code:

Let’s explain what happens here:

request.args.get('text', default='', type=str): This gets the text parameter from the URL query string. For example, /api/tts?text=Hello%20world.
If no text is provided, the function returns an error message and a 400 status code.
If text is provided, it calls generate_tts_audio(text) to get the audio.
send_file(audio, mimetype='audio/mpeg') sends the audio back to the user as an MP3 file.

Example Request and Output

If you visit:

/api/tts?text=Welcome%20to%20your%20smart%20cooking%20assistant

you will receive an audio file that says:
“Welcome to your smart cooking assistant.”
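
Spaces in the text must be percent-encoded (as %20) before they go into the query string. The sketch below builds such a URL with Python's standard library; the helper name build_tts_url and the base address http://localhost:5000 (Flask's default development address) are assumptions for illustration.

```python
from urllib.parse import quote, urlencode


def build_tts_url(base_url, text):
    """Build a /api/tts URL with the text safely percent-encoded."""
    # quote_via=quote encodes spaces as %20 rather than "+".
    query = urlencode({"text": text}, quote_via=quote)
    return f"{base_url.rstrip('/')}/api/tts?{query}"


url = build_tts_url("http://localhost:5000", "Hello world")
# url is "http://localhost:5000/api/tts?text=Hello%20world"
```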

Summary and Practice Preview

In this lesson, you learned how to add Text-to-Speech (TTS) support to your Flask API using the gTTS library. You wrote a function to convert text into audio and created a new /api/tts endpoint that returns audio files to users. This makes your cooking assistant more helpful, especially for users who want to listen to recipes while cooking.

Next, you will get a chance to practice generating audio and working with the new endpoint. You will try out different texts and see how the API responds with audio. This hands-on practice will help you become comfortable with adding audio features to your web applications.

