Welcome back! In the previous lesson, we set up a development environment with Node.js and TypeScript and installed the necessary dependencies to interact with the OpenAI API. Today, we're diving into making your first API request with Whisper, a crucial step toward building a transcription system. Building on that environment setup and TypeScript scripting, we'll now focus on interacting with the API itself. By the end of this lesson, you'll be able to transform audio data into text using the Whisper API.
The Whisper API from OpenAI is designed for audio transcription. The core idea is simple: you send audio data to the API, and it returns the transcribed text. The process begins with a valid API key that authenticates your requests. The API accepts a variety of audio formats (such as mp3, wav, and m4a) and returns the transcription as text. While Whisper handles diverse audio inputs, it focuses on capturing spoken content; non-verbal sounds may be skipped so that the output stays human-readable.
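For example, authenticating takes only a couple of lines. The snippet below is a minimal sketch that assumes your key is stored in the `OPENAI_API_KEY` environment variable (if you pass no options at all, the SDK reads that variable automatically):

```typescript
import OpenAI from 'openai';

// Passing the key explicitly for illustration; with no arguments,
// the SDK reads OPENAI_API_KEY from the environment on its own.
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});
```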
Let's explore a simple example demonstrating how to make your first transcription request to the Whisper API with TypeScript:
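A minimal version of that request might look like the following sketch; the file name `audio.mp3` is a placeholder for your own recording:

```typescript
import fs from 'fs';
import OpenAI from 'openai';

// The client implicitly picks up OPENAI_API_KEY from the environment.
const openai = new OpenAI();

async function transcribeAudio(): Promise<void> {
  // Stream the file instead of loading it fully into memory.
  const transcription = await openai.audio.transcriptions.create({
    file: fs.createReadStream('audio.mp3'), // placeholder path
    model: 'whisper-1',
  });

  // The response object exposes the transcribed text directly.
  console.log(transcription.text);
}

transcribeAudio();
```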
This code demonstrates the transcription process:

- **OpenAI Client Setup**: We initialize the OpenAI client with an implicit API key taken from environment variables.
- **File Handling**: We use `fs.createReadStream` to create a readable stream of the audio file, which is the recommended way to send files to the API.
- **API Call**: Using the OpenAI SDK, we call `audio.transcriptions.create()` with our audio file and specify the Whisper model (`whisper-1`). The SDK handles all the necessary HTTP headers and request formatting for us.
- **Response Handling**: The API returns an object containing the transcribed text, which we can access directly through the `text` property.
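If you need more control, the transcription endpoint also accepts optional parameters such as a language hint and a response format, and wrapping the call in a `try`/`catch` keeps failures (a bad file path, a network error) from crashing your script. Here's a sketch building on the example above; the parameter values shown are illustrative, not required:

```typescript
import fs from 'fs';
import OpenAI from 'openai';

const openai = new OpenAI();

async function transcribeWithOptions(path: string): Promise<string | null> {
  try {
    const transcription = await openai.audio.transcriptions.create({
      file: fs.createReadStream(path),
      model: 'whisper-1',
      language: 'en',          // optional ISO-639-1 hint; can improve accuracy
      response_format: 'json', // 'text', 'srt', 'verbose_json', and 'vtt' also exist
    });
    return transcription.text;
  } catch (error) {
    console.error('Transcription failed:', error);
    return null;
  }
}
```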
Now that we know how to make an API request to OpenAI, let's get some practice! Onward and upward!
