Welcome back! With your C# environment set up and a successful request to the GPT-4o model under your belt, it's time to delve into OpenAI’s transcription capabilities. OpenAI offers several models for transcribing audio files into text, each tailored for specific use cases such as voice note transcription, meeting summaries, and accessibility solutions.
In this lesson, you'll learn how to send an audio transcription request using OpenAI's transcription models in C#. By the end, you'll be equipped to send an audio file to the API and print the resulting text, bringing you closer to building real-world speech processing applications.
OpenAI provides multiple models for audio transcription, each with distinct features and advantages. Choosing the right model depends on your needs for accuracy, speed, cost, and metadata. For most general-purpose transcription tasks, `gpt-4o-transcribe` offers a strong balance of speed, cost, and accuracy. If you need detailed metadata such as word-level timestamps, `whisper-1` is the better choice. For scenarios where speed and cost are the highest priority, `gpt-4o-mini-transcribe` is ideal.
With your environment and credentials ready, let's focus on making a transcription request. You'll use one of OpenAI's transcription models by creating an `AudioClient`, specifying your audio file, sending the file to the API, and displaying the transcription.
Let's break down the process into three key parts:
First, create an `AudioClient` for the transcription model, providing your credentials and any client options you set up previously. Replace `"transcription-model"` with your chosen model, such as `"gpt-4o-transcribe"` or `"whisper-1"`:
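A minimal sketch of this step using the official OpenAI .NET library might look like the following; it assumes your API key is stored in the `OPENAI_API_KEY` environment variable, as set up in the previous lesson:

```csharp
using System;
using OpenAI.Audio;

// Read the API key from an environment variable rather than hard-coding it.
string apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY");

// Replace "transcription-model" with your chosen model,
// e.g. "gpt-4o-transcribe" or "whisper-1".
AudioClient client = new AudioClient("transcription-model", apiKey);
```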
This sets up the client that will handle communication with the transcription API.
Next, specify the path to the audio file you want to transcribe. This example assumes your audio file is located in an `Assets` folder:
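As a sketch, the path could be built with `Path.Combine` so the correct directory separator is used on every platform; the folder and file name here (`Assets`, `audio.mp3`) are placeholders to substitute with your own:

```csharp
using System.IO;

// Build the path in an OS-independent way.
// "Assets" and "audio.mp3" are placeholder names -- use your own.
string audioFilePath = Path.Combine("Assets", "audio.mp3");
```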
Note: You can also provide a video file (such as `.mp4`, `.mov`, or `.avi`) instead of an audio file. The API will automatically extract the audio track from the video and transcribe it, ignoring the video content. Just specify the path to your video file in place of the audio file.
This ensures the file path is constructed correctly, regardless of your operating system.
Finally, send the audio file to the API for transcription and print the resulting text:
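A sketch of this final step, assuming the `client` and `audioFilePath` variables from the earlier steps:

```csharp
using System;
using OpenAI.Audio;

// Send the audio file to the model and get back an AudioTranscription object.
AudioTranscription transcription = client.TranscribeAudio(audioFilePath);

// Print the transcribed text via the Text property.
Console.WriteLine($"Transcription: {transcription.Text}");
```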
The `TranscribeAudio()` method sends your audio file to the model and returns an `AudioTranscription` object. The transcribed text is available via the `Text` property.
If everything is set up correctly and the audio file exists at the specified path, the transcribed text will be printed to the console.
If you encounter issues, check your environment variables, verify the audio file path, and ensure the file exists. Errors often provide clues about missing files or incorrect credentials.
In this lesson, you wrote code to send an audio file to OpenAI's transcription API and printed the transcribed text in your C# application. This is a foundational skill for any application involving speech-to-text capabilities.
You're now ready to practice making transcription requests and explore more features of OpenAI's transcription models. In the upcoming practice section, you'll reinforce these skills hands-on and gain confidence working with audio in C#. Excellent work getting here—transcribing audio with OpenAI's models opens up powerful new possibilities for your projects!
