Making Your First Whisper API Request

Welcome back! In the previous lesson, we set up a TypeScript development environment and installed the necessary dependencies to interact with the OpenAI API. Today, we're diving into making your first API request using Whisper, which is crucial for creating a transcription system. This builds on your understanding of environment setup and TypeScript, and now we'll focus on interacting with APIs.

You'll learn how to transform audio data into text using the Whisper API.

Understanding the Whisper API Request Flow

The Whisper API from OpenAI is designed to handle audio transcription. The core idea is to send audio data to the API, which then returns transcribed text. This process begins with a valid API key that authenticates your requests. The API interprets byte-stream data from audio files, transcribing what's spoken into text with varying levels of detail depending on its configuration.

While Whisper handles diverse audio inputs, it primarily focuses on capturing spoken content and might skip non-verbal sounds while ensuring the output is human-readable. The result is a JSON object containing the transcribed text and, sometimes, details like the duration of the audio.
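To make the response shape concrete, here is a sketch of the fields you can expect, based on OpenAI's documented response formats; the field lists below are illustrative rather than exhaustive, and the sample text is invented:

```typescript
// Sketch of Whisper response shapes (illustrative, not exhaustive).
interface TranscriptionJson {
  text: string; // the transcribed speech (default "json" response format)
}

interface TranscriptionVerboseJson {
  text: string;
  language: string; // detected language, e.g. "english"
  duration: number; // audio length in seconds
}

// Example of what a default-format response might look like:
const sample: TranscriptionJson = { text: "Hello and welcome to the show." };
```

The richer `verbose_json` format is where extra details, such as the audio duration, come from.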

Using `transcriber.ts` for Audio Transcription

Now it's time to see how we actually transcribe audio using OpenAI's Whisper API.

Inside your project, there's a file named `transcriber.ts`. This is where the transcription logic lives. Here's what it does:

  • Initializes the OpenAI client.
  • Accepts a file path to an audio file.
  • Reads the file as a stream using Node’s fs module.
  • Sends the audio file to OpenAI’s Whisper model.
  • Returns the transcribed text.

Making Your First API Request

Let's explore a simple example demonstrating how to make your first transcription request using the Whisper API with TypeScript:

This code demonstrates the transcription process:

  • Client Initialization: We instantiate an OpenAI client from the imported package. This client manages your requests to the OpenAI API and automatically authenticates using the OPENAI_API_KEY environment variable.

  • File Handling: We use Node.js's fs module to open the audio file as a readable stream with fs.createReadStream. Streaming lets the SDK upload the file without loading it entirely into memory, and it pairs cleanly with TypeScript's async/await pattern for readable code.

  • API Call: The client.audio.transcriptions.create method submits the audio data for transcription. We pass an object with configuration options, including the model (which specifies which version of Whisper to use, in this case, "whisper-1") and a timeout that caps how long the request may run before it is aborted.

Moving On To Practice

Now that we know how to make an API request to OpenAI using TypeScript, let's try some practice! Onward and upward!
