Adding Audio to APIs

Introduction: Why Add Audio to Our Cooking API?

Welcome to your first step in building an AI Cooking Helper with TypeScript and Express! In this lesson, you will learn how to add audio features to your API using Text-to-Speech (TTS) technology. Imagine you are cooking and your hands are messy — being able to listen to recipe steps instead of reading them can make the process much easier and more enjoyable. By the end of this lesson, you will know how to generate audio from text and serve it through your API, making your cooking assistant more helpful and accessible.

Getting to Know gTTS (Google Text-to-Speech)

Now, let’s talk about gTTS. gTTS stands for Google Text-to-Speech. It is a library that takes text and turns it into spoken audio using Google’s TTS engine. In Node.js, you can use the gtts package to access this functionality. Important: gTTS sends your text to Google’s servers to generate the audio, so it requires an active internet connection. If your server or development environment is offline, gTTS will not work.

This is useful for making your app more accessible, especially for users who prefer listening over reading.

To use gtts in your project, install it from npm by running npm install gtts.

In the CodeSignal environment, this library is already installed, so there is no need to run this command.

Writing a Function to Generate Audio from Text

Let’s build the function that will turn text into audio. We will use the gTTS class from the gtts package and Node.js streams. Here’s how we do it, step by step:

Step 1: Import the Required Libraries

First, import the two classes we will use, gTTS from the gtts package and Readable from Node's built-in stream module:

gTTS is the main class for converting text to speech.
Readable is used to work with audio data as a stream, which is efficient for web APIs.

Step 2: Create the Function

Now, let’s write the function, ttsStream, that takes some text and returns a readable audio stream. Here is how it works:

const tts = new gTTS(text); creates a gTTS object with the text you want to convert.
We try to get a readable stream from the gTTS object. If available, we return it directly.
The function returns a readable stream, which can be sent directly to the client.

This function does not save any files to disk. The audio is passed along as a stream, chunk by chunk, which is fast and efficient for web APIs.
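
As a rough illustration of the steps above, here is a minimal sketch of ttsStream. It assumes the gTTS object exposes a stream() method that returns a Readable. To let the example run offline, the gTTS constructor is passed in as a factory; makeTtsStream, SpeechSource, and fakeSpeech are hypothetical names used only for this sketch. The steps above do not say what happens when no stream is available, so throwing an error is an assumption as well.

```typescript
import { Readable } from "node:stream";

// Assumed minimal shape of a gTTS instance: a stream() method that
// returns the synthesized audio as a readable stream.
interface SpeechSource {
  stream?: () => Readable;
}

// Builds ttsStream around a factory so the sketch runs without the gtts
// package or a network connection.
export function makeTtsStream(createSpeech: (text: string) => SpeechSource) {
  return function ttsStream(text: string): Readable {
    const tts = createSpeech(text);
    // Try to get a readable stream from the object and return it directly.
    const stream = tts.stream?.();
    if (!stream) {
      // Hypothetical handling: the lesson does not show this case.
      throw new Error("The TTS object did not provide a stream");
    }
    return stream; // nothing is written to disk
  };
}

// Offline stand-in that emits the text bytes instead of real MP3 data.
const fakeSpeech = (text: string): SpeechSource => ({
  stream: () => Readable.from([Buffer.from(text)]),
});

const ttsStream = makeTtsStream(fakeSpeech);
```

In the real project, you would pass (text) => new gTTS(text) as the factory instead of fakeSpeech.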

Building the /api/tts Endpoint

Now that we have a function to generate audio, let’s create an API endpoint that uses it. We want users to be able to send some text to our API and get back an audio file.

Step 1: Define the Route

In your Express routes file, add a GET route for /api/tts. Here is what the handler does:

const text = String(req.query.text ?? ""); gets the text parameter from the URL query string. For example, /api/tts?text=Hello%20world.
If no text is provided, the function returns an error message and a 400 status code.
If text is provided, it calls ttsStream(text) to get the audio stream.
stream.pipe(res); streams the audio back to the user as an MP3 file.
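
Putting those steps together, the handler might look roughly like the sketch below. It is written against small stand-in interfaces (QueryRequest and AudioResponse, both hypothetical) instead of Express's own types so that it runs without Express installed. The error body and the audio/mpeg Content-Type header are assumptions; the lesson only specifies a 400 status with an error message and an MP3 response.

```typescript
import { Readable, Writable } from "node:stream";

// Stand-ins for the parts of Express's Request and Response used here.
interface QueryRequest {
  query: Record<string, unknown>;
}
interface AudioResponse extends Writable {
  status(code: number): AudioResponse;
  json(body: unknown): AudioResponse;
  setHeader(name: string, value: string): unknown;
}

// Builds the handler around any function that turns text into an audio
// stream (ttsStream in this lesson). makeTtsHandler is a hypothetical name.
export function makeTtsHandler(ttsStream: (text: string) => Readable) {
  return (req: QueryRequest, res: AudioResponse): void => {
    // A missing parameter becomes an empty string.
    const text = String(req.query.text ?? "");
    if (!text) {
      // Assumed error body; only the 400 status comes from the lesson.
      res.status(400).json({ error: "Missing 'text' query parameter" });
      return;
    }
    const stream = ttsStream(text);
    res.setHeader("Content-Type", "audio/mpeg"); // media type for MP3
    stream.pipe(res); // send the audio to the client as it arrives
  };
}

// With Express, this would be registered roughly as:
//   router.get("/api/tts", makeTtsHandler(ttsStream));
```

Passing ttsStream in as an argument is a design choice for this sketch: it lets you exercise the validation logic with a fake stream, without calling Google's servers.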

Example Request and Output

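If you request /api/tts with the text query parameter set to the sentence below (URL-encoded, so each space becomes %20), you will receive an audio file that says:
“Welcome to your smart cooking assistant.”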
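
If you build the request URL in code rather than typing it by hand, the text has to be URL-encoded first. Here is a small sketch using the standard encodeURIComponent function; ttsUrl is a hypothetical helper, and the host and port of your server are left out because they depend on your setup.

```typescript
// Hypothetical helper: builds the request path for the /api/tts endpoint.
export function ttsUrl(text: string): string {
  // encodeURIComponent turns spaces into %20 and escapes characters
  // such as & that would otherwise break the query string.
  return `/api/tts?text=${encodeURIComponent(text)}`;
}

const url = ttsUrl("Welcome to your smart cooking assistant.");
// url is "/api/tts?text=Welcome%20to%20your%20smart%20cooking%20assistant."
```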

Summary and Practice Preview

In this lesson, you learned how to add Text-to-Speech (TTS) support to your Express API using TypeScript and the gtts library. You wrote a function to convert text into audio streams and created a new /api/tts endpoint that returns audio files to users. This makes your cooking assistant more helpful, especially for users who want to listen to recipes while cooking.

Next, you will get a chance to practice generating audio and working with the new endpoint. You will try out different texts and see how the API responds with audio. This hands-on practice will help you become comfortable with adding audio features to your web applications.
