Introduction and Lesson Overview

Welcome back! With your C# environment set up and a successful request to the GPT-4o model under your belt, it's time to delve into OpenAI’s transcription capabilities. OpenAI offers several models for transcribing audio files into text, each tailored for specific use cases such as voice note transcription, meeting summaries, and accessibility solutions.

In this lesson, you'll learn how to send an audio transcription request using OpenAI's transcription models in C#. By the end, you'll be equipped to send an audio file to the API and print the resulting text, bringing you closer to building real-world speech processing applications.

Understanding OpenAI's Transcription Models

OpenAI provides multiple models for audio transcription, each with distinct features and advantages. Choosing the right model depends on your needs for accuracy, speed, cost, and metadata. Here’s a comparison of the primary models you can use for transcription:

| Model | Accuracy & Robustness | Metadata Support | Cost & Speed | Ideal Use Cases |
|---|---|---|---|---|
| gpt-4o-transcribe | High accuracy, especially in noisy environments and with diverse accents. Lower Word Error Rate (WER) across multiple languages and acoustic conditions. | Limited metadata. | Generally cheaper and faster than Whisper. | Customer support, meeting transcriptions, real-time applications. |
| gpt-4o-mini-transcribe | Optimized for speed and cost, with slightly reduced accuracy compared to gpt-4o-transcribe. | Limited metadata. | More affordable and faster, suitable for applications with budget constraints. | Quick-response apps, live captioning, budget-sensitive scenarios. |
| whisper-1 | Robust performance across various languages and accents. | Provides detailed metadata, including word-level timestamps and segments. | Slightly higher cost and latency compared to GPT-4o models. | Applications requiring detailed metadata, such as language learning tools and detailed analytics. |

For most general-purpose transcription tasks, gpt-4o-transcribe offers a strong balance of speed, cost, and accuracy. If you need detailed metadata like word-level timestamps, whisper-1 is the better choice. For scenarios where speed and cost are the highest priority, gpt-4o-mini-transcribe is ideal.

Implementing the Transcription Request

With your environment and credentials ready, let's focus on making a transcription request. You'll use one of OpenAI's transcription models by creating an AudioClient, specifying your audio file, sending the file to the API, and displaying the transcription.

Let's break down the process into three key parts:

1. Create the Transcription Client

First, create an AudioClient for the transcription model, providing your credentials and any client options you set up previously. Replace "transcription-model" with your chosen model, such as "gpt-4o-transcribe" or "whisper-1":
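Here's a minimal sketch using the official OpenAI .NET library. Reading the key from an `OPENAI_API_KEY` environment variable is an assumption based on a typical setup; substitute whatever credentials and client options you configured earlier. The `using` directives below cover all three steps of this walkthrough:

```csharp
using System;
using System.IO;
using OpenAI.Audio;

// Create a client for your chosen transcription model.
// Replace "transcription-model" with "gpt-4o-transcribe",
// "gpt-4o-mini-transcribe", or "whisper-1".
AudioClient audioClient = new(
    "transcription-model",
    Environment.GetEnvironmentVariable("OPENAI_API_KEY"));
```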

This sets up the client that will handle communication with the transcription API.

2. Specify the Audio File

Next, specify the path to the audio file you want to transcribe. This example assumes your audio file is located in an Assets folder:
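For example (the `Assets` folder and `audio1.mp3` filename are placeholders; point this at your own file):

```csharp
// Build the path in an OS-independent way with Path.Combine,
// so it works on Windows, macOS, and Linux alike.
string audioFilePath = Path.Combine("Assets", "audio1.mp3");
```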

Note: You can also provide a video file in a supported container format (such as .mp4, .mpeg, or .webm) instead of an audio file. The API will automatically extract the audio track from the video and transcribe it, ignoring the video content. Just specify the path to your video file in place of the audio file.

This ensures the file path is constructed correctly, regardless of your operating system.

3. Perform Transcription and Print the Result

Finally, send the audio file to the API for transcription and print the resulting text:
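Continuing the sketch above (the `"Transcription: "` label in the output is just an illustrative choice):

```csharp
// Send the file to the model and read back the transcription.
AudioTranscription transcription = audioClient.TranscribeAudio(audioFilePath);

// Print the transcribed text.
Console.WriteLine($"Transcription: {transcription.Text}");
```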

The TranscribeAudio() method sends your audio file to the model and returns an AudioTranscription object. The transcribed text is available via the Text property.

If everything is set up correctly and the audio file exists at the specified path, you will see output like:
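(The line below is purely illustrative; your output will contain whatever is spoken in your audio file.)

```text
Transcription: Welcome to today's meeting. Let's begin with a quick project update.
```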

If you encounter issues, check your environment variables, verify the audio file path, and ensure the file exists. Errors often provide clues about missing files or incorrect credentials.
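If you want to fail fast, a small pre-flight check along these lines can help (reusing the `audioFilePath` variable and the `OPENAI_API_KEY` name assumed in the sketches above):

```csharp
// Optional sanity checks before calling the API.
if (string.IsNullOrEmpty(Environment.GetEnvironmentVariable("OPENAI_API_KEY")))
{
    Console.WriteLine("OPENAI_API_KEY is not set.");
}

if (!File.Exists(audioFilePath))
{
    Console.WriteLine($"Audio file not found: {audioFilePath}");
}
```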

Summary and Next Steps

In this lesson, you wrote code to send an audio file to OpenAI's transcription API and printed the transcribed text in your C# application. This is a foundational skill for any application involving speech-to-text capabilities.

You're now ready to practice making transcription requests and explore more features of OpenAI's transcription models. In the upcoming practice section, you'll reinforce these skills hands-on and gain confidence working with audio in C#. Excellent work getting here—transcribing audio with OpenAI's models opens up powerful new possibilities for your projects!
