Introduction: Why Add Audio to Our Cooking API?

Imagine you are cooking and your hands are covered in flour or oil. Instead of stopping to read instructions, wouldn’t it be easier to listen to the next step? Adding audio features to your cooking assistant makes it more accessible and convenient. With Text-to-Speech (TTS) technology, your application can read out recipe steps, making the cooking process smoother and more enjoyable for everyone.

Recall: How We Add Endpoints

Before we add audio features, let’s review how to define new endpoints in our application. In this project, we use a router to organize our routes. Each route connects a URL path to a function that handles the request and returns a response.

For example, to create a simple endpoint, you might write:

When a user visits /hello, the hello() function runs and returns a message. In our project, routes are organized in a separate file, but the idea is the same: each route defines how your application responds to a specific request.

Getting to Know gTTS (Google Text-to-Speech)

Text-to-Speech (TTS) technology converts written text into spoken audio. gTTS (Google Text-to-Speech) is a Python library that sends your text to Google’s TTS service and returns the spoken result as MP3 audio. Because the conversion happens on Google’s servers, gTTS requires an active internet connection. This feature is especially useful for users who prefer listening to instructions rather than reading them.

For example, if you want your application to say, “Welcome to your smart cooking assistant!”, gTTS can generate the audio for you.

Let’s create a function that turns text into audio. We will use the gTTS tool to convert text to speech and an in-memory file to hold the audio data. Here’s how you can do it step by step:

Step 1: Import the Required Tools

First, import the necessary classes:

  • gTTS is used to convert text into speech.
  • BytesIO allows us to work with audio data in memory, without saving it to a file.
Step 2: Create the Function

Now, write a function that takes some text and returns the audio data:

Here’s what each part does:

  • tts = gTTS(text): Creates a TTS object with the text you want to convert.
  • audio_io = BytesIO(): Creates an in-memory file to hold the audio data.
  • tts.write_to_fp(audio_io): Writes the audio data into the in-memory file.
  • audio_io.seek(0): Moves the pointer to the start of the audio data.
  • return audio_io: Returns the in-memory audio file, ready to be sent as a response.

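This function keeps the audio in memory instead of writing a temporary file to disk, so there is no file to clean up afterward and the buffer can be handed straight to the web response.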
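To see why the `audio_io.seek(0)` step matters, the standard-library-only snippet below writes placeholder bytes (standing in for MP3 data) into a `BytesIO` and reads them back with and without rewinding. The helper `read_back` is purely illustrative.

```python
from io import BytesIO


def read_back(rewind: bool) -> bytes:
    """Write placeholder bytes to an in-memory buffer, then read them back."""
    buffer = BytesIO()
    buffer.write(b"pretend this is MP3 data")  # stands in for tts.write_to_fp(buffer)
    if rewind:
        buffer.seek(0)  # move the position back to the start
    return buffer.read()  # reads from the current position to the end


print(read_back(rewind=False))  # b'' because the position is already at the end
print(read_back(rewind=True))   # b'pretend this is MP3 data'
```

Without the rewind, whatever reads the buffer next (such as the response) would see no audio at all.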

Building the /api/tts Endpoint

Now that we have a function to generate audio, let’s create an endpoint that uses it. This endpoint will allow users to send text and receive an audio file in response.

In your routes file, add the following code:

Let’s break down what happens here:

  • text: str = Query(""): Gets the text parameter from the URL query string, such as /api/tts?text=Hello%20world.
  • If no text is provided, the function returns an error with a 400 status code.
  • If text is provided, it calls generate_tts_audio(text) to get the audio.
  • If the audio is generated successfully, it is returned as a StreamingResponse; if generation fails, an HTTPException is raised.
Example Request and Output

If you request /api/tts and pass the following sentence as the text query parameter, the response is an MP3 audio file that says:
“Welcome to your smart cooking assistant.”
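To call the endpoint from a script instead of a browser, the text must be URL-encoded (spaces become %20, as in the /api/tts?text=Hello%20world example above). The sketch below builds such a URL with Python’s standard library and can download the MP3. The base URL `http://localhost:8000` and the helper names `build_tts_url` and `download_tts` are assumptions, so adjust them to wherever your server runs.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumption: local development server address


def build_tts_url(text: str, base_url: str = BASE_URL) -> str:
    """Return the /api/tts URL with `text` percent-encoded as a query parameter."""
    query = urlencode({"text": text}, quote_via=quote)  # spaces -> %20 rather than +
    return f"{base_url}/api/tts?{query}"


def download_tts(text: str, path: str, base_url: str = BASE_URL) -> None:
    """Fetch the spoken audio from a running server and save it to `path`."""
    with urlopen(build_tts_url(text, base_url)) as response, open(path, "wb") as f:
        f.write(response.read())


print(build_tts_url("Welcome to your smart cooking assistant."))
# http://localhost:8000/api/tts?text=Welcome%20to%20your%20smart%20cooking%20assistant.
```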

Summary and Practice Preview

In this lesson, you learned how to add Text-to-Speech (TTS) support to your API using the gTTS tool. You wrote a function to convert text into audio and created a new /api/tts endpoint that returns audio files to users. This makes your cooking assistant more helpful, especially for users who want to listen to recipes while cooking.

Next, you will get a chance to practice generating audio and working with the new endpoint. You will try out different texts and see how the API responds with audio. This hands-on practice will help you become comfortable with adding audio features to your web applications.
