Welcome to your first step in building an AI Cooking Helper with Flask! In this lesson, you will learn how to add audio features to your API using Text-to-Speech (TTS) technology. Imagine you are cooking and your hands are messy: being able to listen to recipe steps instead of reading them can make the process much easier and more enjoyable. By the end of this lesson, you will know how to generate audio from text and serve it through your API, making your cooking assistant more helpful and accessible.
Before we dive into audio, let’s quickly remind ourselves how we add new endpoints in a Flask application. In previous lessons (or if you are new, just know this is the basic pattern), we use the @app.route decorator to define a new route. For example:
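A minimal sketch of this pattern, assuming an application object named app and a placeholder greeting (the lesson does not specify either):

```python
from flask import Flask

# Assumed application object; your project may create it elsewhere.
app = Flask(__name__)


# Connect the URL /hello to the hello() function.
@app.route("/hello")
def hello():
    # The returned string becomes the body of the HTTP response.
    # The exact message is a placeholder chosen for this sketch.
    return "Hello from the cooking helper!"
```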
Here, when a user visits /hello, the function hello() runs and returns a simple message. In our project, we use Blueprints and organize routes in a separate file, but the idea is the same: each route connects a URL to a function that returns a response.
Now, let’s talk about gTTS. gTTS stands for Google Text-to-Speech. It is a Python library that turns text into spoken audio by calling Google Translate’s text-to-speech service. This is useful for making your app more accessible, especially for users who prefer listening over reading.
Important: because gTTS sends your text to Google’s servers to generate the audio, it requires an active internet connection. If your server or development environment is offline, gTTS will not work.
For example, if you want to turn the text “Welcome to your smart cooking assistant!” into audio, gTTS can do that in just a few lines of code.
You do not need to install gTTS on CodeSignal, as it is already available. However, if you want to use it on your own computer, you can install it with pip:
    pip install gTTS
Let’s build the function that will turn text into audio. We will use the gTTS class from the gtts library and the BytesIO class from Python’s io module. Here’s how we do it, step by step:
First, we need to import the classes we will use, gTTS from the gtts package and BytesIO from the io module:
- gTTS is the main class for converting text to speech.
- BytesIO lets us work with audio data in memory, without saving it to a file.
Now, let’s write the function that takes some text and returns the audio data:
Let’s break this down:
- tts = gTTS(text): This creates a gTTS object with the text you want to convert.
- audio_io = BytesIO(): This creates an in-memory file to hold the audio data.
- tts.write_to_fp(audio_io): This writes the audio data into the audio_io object.
- audio_io.seek(0): This moves the pointer to the start of the audio data, so it can be read from the beginning.
- return audio_io: This returns the in-memory audio file, ready to be sent as a response.
This function does not save any files to disk. It keeps everything in memory, which is fast and efficient for web APIs.
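To see why the seek(0) step matters, here is a small standard-library sketch. It uses placeholder bytes instead of real audio, so it runs without gTTS or a network connection:

```python
from io import BytesIO

buffer = BytesIO()
buffer.write(b"fake audio bytes")  # stand-in for the MP3 data gTTS would write

# After writing, the position sits at the end, so a read returns nothing.
at_end = buffer.read()

# Rewinding to position 0 makes the full content readable again.
buffer.seek(0)
from_start = buffer.read()

print(at_end)      # b''
print(from_start)  # b'fake audio bytes'
```

Without the rewind, a response built from the buffer would be empty, because reading starts at the current position.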
Now that we have a function to generate audio, let’s create an API endpoint that uses it. We want users to be able to send some text to our API and get back an audio file.
In your Flask routes file, add the following code:
Let’s explain what happens here:
- request.args.get('text', default='', type=str): This gets the text parameter from the URL query string. For example, /api/tts?text=Hello%20world.
- If no text is provided, the function returns an error message and a 400 status code.
- If text is provided, it calls generate_tts_audio(text) to get the audio.
- send_file(audio, mimetype='audio/mpeg') sends the audio back to the user as an MP3 file.
If you visit /api/tts and pass the sentence below as the URL-encoded text parameter, you will receive an MP3 audio file that says:
“Welcome to your smart cooking assistant.”
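If you are unsure how to URL-encode the text, the standard library can do it for you. This sketch builds only the path and query string; the host and port depend on where your app is running and are left out:

```python
from urllib.parse import quote

text = "Welcome to your smart cooking assistant."

# quote() percent-encodes characters that are unsafe in a URL.
# safe="" also encodes "/" so the value stays a single query parameter.
path = "/api/tts?text=" + quote(text, safe="")

print(path)
# /api/tts?text=Welcome%20to%20your%20smart%20cooking%20assistant.
```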
In this lesson, you learned how to add Text-to-Speech (TTS) support to your Flask API using the gTTS library. You wrote a function to convert text into audio and created a new /api/tts endpoint that returns audio files to users. This makes your cooking assistant more helpful, especially for users who want to listen to recipes while cooking.
Next, you will get a chance to practice generating audio and working with the new endpoint. You will try out different texts and see how the API responds with audio. This hands-on practice will help you become comfortable with adding audio features to your web applications.
