Imagine you are cooking and your hands are covered in flour or oil. Instead of stopping to read instructions, wouldn’t it be easier to listen to the next step? Adding audio features to your cooking assistant makes it more accessible and convenient. With Text-to-Speech (TTS) technology, your application can read out recipe steps, making the cooking process smoother and more enjoyable for everyone.
Before we add audio features, let’s review how to define new endpoints in our application. In this project, we use a router to organize our routes. Each route connects a URL path to a function that handles the request and returns a response.
For example, to create a simple endpoint, you might write:
When a user visits /hello, the hello() function runs and returns a message. In our project, routes are organized in a separate file, but the idea is the same: each route defines how your application responds to a specific request.
Text-to-Speech (TTS) technology converts written text into spoken audio. gTTS (Google Text-to-Speech) is a Python library that sends your text to Google Translate’s text-to-speech service and returns the generated audio as MP3 data. Because the audio is produced on Google’s servers, it requires an active internet connection. This feature is especially useful for users who prefer listening to instructions rather than reading them.
For example, if you want your application to say, “Welcome to your smart cooking assistant!”, gTTS can generate the audio for you.
Let’s create a function that turns text into audio. We will use the gTTS tool to convert text to speech and an in-memory file to hold the audio data. Here’s how you can do it step by step:
First, import the necessary classes:
- gTTS is used to convert text into speech.
- BytesIO allows us to work with audio data in memory, without saving it to a file.
Now, write a function that takes some text and returns the audio data:
Here’s what each part does:
- tts = gTTS(text): Creates a TTS object with the text you want to convert.
- audio_io = BytesIO(): Creates an in-memory file to hold the audio data.
- tts.write_to_fp(audio_io): Writes the audio data into the in-memory file.
- audio_io.seek(0): Moves the pointer to the start of the audio data.
- return audio_io: Returns the in-memory audio file, ready to be sent as a response.
This function keeps the audio in memory, so there are no temporary files to write or clean up between requests. The request to Google’s servers is still the slow part of the call.
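To see why the seek(0) step matters, here is a small sketch that uses only the standard library. The bytes written are placeholder data, not real audio.

```python
from io import BytesIO

buffer = BytesIO()
buffer.write(b"fake-audio-bytes")  # placeholder standing in for MP3 data

# After writing, the position sits at the end, so a read finds nothing.
read_without_rewind = buffer.read()

# Rewinding moves the position back to the first byte.
buffer.seek(0)
read_after_rewind = buffer.read()

print(read_without_rewind)  # b''
print(read_after_rewind)    # b'fake-audio-bytes'
```

Without the rewind, whatever reads the buffer next would start at the end and find no data.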
Now that we have a function to generate audio, let’s create an endpoint that uses it. This endpoint will allow users to send text and receive an audio file in response.
In your routes file, add the following code:
Let’s break down what happens here:
- text: str = Query(""): Gets the text parameter from the URL query string, such as /api/tts?text=Hello%20world.
- If no text is provided, the function returns an error with a 400 status code.
- If text is provided, it calls generate_tts_audio(text) to get the audio.
- When the audio is generated successfully, it is returned as a StreamingResponse. If not, an HTTPException is raised.
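If you visit a URL such as /api/tts?text=Welcome%20to%20your%20smart%20cooking%20assistant on your running server, you will receive an audio file that says: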
“Welcome to your smart cooking assistant.”
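Text in a query string has to be URL-encoded, which is why spaces appear as %20. The sketch below builds such a URL with the standard library. The base address http://localhost:8000 and the helper name build_tts_url are assumptions for illustration; use whatever host and port your server runs on.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8000"  # assumption: local development server


def build_tts_url(text: str, base_url: str = BASE_URL) -> str:
    """Return the /api/tts URL for the given text, with spaces encoded as %20."""
    query = urlencode({"text": text}, quote_via=quote)
    return f"{base_url}/api/tts?{query}"


print(build_tts_url("Hello world"))
# http://localhost:8000/api/tts?text=Hello%20world
```

Opening the resulting URL in a browser, or fetching it with any HTTP client, should return the spoken version of the text.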
In this lesson, you learned how to add Text-to-Speech (TTS) support to your API using the gTTS tool. You wrote a function to convert text into audio and created a new /api/tts endpoint that returns audio files to users. This makes your cooking assistant more helpful, especially for users who want to listen to recipes while cooking.
Next, you will get a chance to practice generating audio and working with the new endpoint. You will try out different texts and see how the API responds with audio. This hands-on practice will help you become comfortable with adding audio features to your web applications.
