Welcome back! In the previous lesson, we set up a development environment using a virtual environment and installed the necessary dependencies to interact with the OpenAI API. Today, we're diving into making your first API request using Whisper, which is crucial for creating a transcription system. This builds on your understanding of environment setup and Python scripting, and now we’ll focus on interacting with APIs.
You'll learn to transform audio data into text using the Whisper API.
The Whisper API from OpenAI is designed to handle audio transcription. The core idea is to send audio data to the API, which then returns a transcribed text. This process begins with a valid API key that authenticates your requests. The API interprets byte-stream data from audio files, transcribing what’s spoken into text with varying levels of detail depending on its configuration.
While Whisper handles diverse audio inputs, it primarily focuses on capturing spoken content and might skip non-verbal sounds while ensuring the output is human-readable. The result is a JSON object containing the transcribed text and, sometimes, details like the duration of the audio.
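For a concrete sense of that shape, here is a hand-written sample payload and a small helper that reads it. The field names follow the format described above, but the values and the `summarize` helper are illustrative, not output from a real API call:

```python
import json

# Hand-written sample of a transcription payload with extra detail.
# The values are made up for illustration; a real payload comes from the API.
sample_payload = json.loads("""
{
  "text": "Hello, world.",
  "duration": 2.5,
  "language": "english"
}
""")

def summarize(payload):
    """Build a one-line summary from a transcription payload."""
    return f"{payload['text']} ({payload['duration']:.1f}s, {payload['language']})"

print(summarize(sample_payload))
```

Working with a plain dictionary like this is also a handy way to prototype downstream processing before wiring up real API calls.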
Let’s explore a simple example demonstrating how to make your first transcription request using the Whisper API:
```python
from openai import OpenAI

client = OpenAI()


def transcribe_audio(file_path):
    """
    Transcribe an audio file using OpenAI's Whisper API.
    """
    try:
        with open(file_path, 'rb') as audio_file:
            transcript = client.audio.transcriptions.create(
                model="whisper-1",
                file=audio_file,
                timeout=60
            )
        return transcript.text
    except Exception as e:
        raise Exception(f"Transcription failed: {str(e)}")


if __name__ == "__main__":
    result = transcribe_audio("resources/sample_audio.mp3")
    print("Transcription:", result)
```
This code demonstrates the transcription process:
- Client Initialization: Instantiate an `openai.OpenAI` client. This client manages your requests to the OpenAI API and automatically reads the `OPENAI_API_KEY` environment variable for authentication.
- File Handling: Open the audio file in binary read mode (`"rb"`). Reading as bytes ensures the data is in a format suitable for API processing.
- API Call: The `client.audio.transcriptions.create` method submits the audio data for transcription. The `model` parameter specifies which version of Whisper to use, in this case `"whisper-1"`. The `timeout` parameter defines how long the request can take before it times out.
- Handling the Response: The API returns a JSON response, which the client parses into an object. Access its `text` attribute to retrieve the transcribed content, ready for further processing or storage.
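Because transcription requests travel over the network, they can fail transiently even when your code is correct. One common pattern is to wrap the call in a small retry loop. Here is a minimal sketch; the `flaky_transcribe` function below is a stand-in for the real API call, so the retry logic can be demonstrated on its own:

```python
import time

def with_retries(func, attempts=3, delay=0.0):
    """Call func(), retrying on failure up to `attempts` times."""
    last_error = None
    for _ in range(attempts):
        try:
            return func()
        except Exception as e:
            last_error = e
            time.sleep(delay)  # brief pause before the next attempt
    raise RuntimeError(f"All {attempts} attempts failed: {last_error}")

# Stand-in for the real API call: fails twice, then succeeds.
calls = {"count": 0}

def flaky_transcribe():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary network error")
    return "hello from the third attempt"

print(with_retries(flaky_transcribe))
```

In real code you would pass a closure that calls `transcribe_audio`, and you might add a longer delay between attempts so a brief outage has time to clear.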
Now that we know how to make an API request to OpenAI, let's get some practice! Onward and upward!