Real-Time Microphone Transcription (Live Simulation)

In this lesson, we’re enhancing our transcription system to work like a live microphone transcription tool. Instead of recording the entire audio before transcribing, we now record short chunks (3 seconds each) and transcribe them one by one as they arrive, simulating a live transcription experience directly in the browser.


What You Will Learn

This unit covers:

  • How to capture short audio snippets (chunks) from the user's microphone in real time.
  • How to transcribe each audio chunk immediately after recording.
  • How to update the UI with live transcription results.
  • How to manage a recording session with duration limits and countdown timers.

Frontend: Simulating Live Microphone Transcription

We'll begin with public/app.js, where we configure how microphone input is handled in real time.

These constants are critical for timing and quality control (see the sketch after this list):

  • mimeType: This tells the MediaRecorder what format to use. audio/webm;codecs=opus specifies WebM format with the Opus codec, which is well-suited for audio and supported by Whisper.
  • CHUNK_DURATION: Each recording session will be sliced into 3-second pieces.
  • MAX_CHUNKS: Limits the session to 10 chunks (a roughly 30-second cap).
  • MAX_TIME_S: The total session length in seconds (chunk duration * number of chunks), used for the UI countdown.
  • chunkCount & remainingTime: Track session state and countdown for the user.
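
Here is one way these constants might be declared (a sketch; your exact values and variable names may differ):

```js
// Format passed to MediaRecorder; WebM + Opus is widely supported and Whisper-friendly.
const mimeType = 'audio/webm;codecs=opus';

const CHUNK_DURATION = 3000;                             // length of each chunk in milliseconds
const MAX_CHUNKS = 10;                                   // cap the session at 10 chunks
const MAX_TIME_S = (CHUNK_DURATION * MAX_CHUNKS) / 1000; // total session length in seconds (~30s)

let chunkCount = 0;             // how many chunks we've recorded so far
let remainingTime = MAX_TIME_S; // seconds left, shown in the countdown timer
```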

Managing the Chunk Loop
  • recordChunk() is the main function that performs all recording logic for a single audio segment. We'll break it down in a separate section below.
  • setInterval: Automatically runs recordChunk() every 3 seconds.
  • We also invoke recordChunk() immediately to avoid waiting for the first interval.
  • clearInterval(intervalId): Essential for stopping the session; otherwise, recording will continue indefinitely even if the user presses stop.
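
A minimal sketch of this loop; the wrapper names startChunkLoop and stopChunkLoop are illustrative:

```js
let intervalId = null;

function startChunkLoop() {
  recordChunk(); // record the first chunk immediately instead of waiting for the first interval
  intervalId = setInterval(recordChunk, CHUNK_DURATION); // then one chunk every 3 seconds
}

function stopChunkLoop() {
  clearInterval(intervalId); // without this, recording continues indefinitely
  intervalId = null;
}
```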

Countdown Timer

A simple helper that updates the visible timer on the screen using the remainingTime variable.
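
A sketch of such a helper, assuming a timer element with the id timer:

```js
function updateTimer() {
  // 'timer' is an assumed element id; adjust to match your HTML.
  document.getElementById('timer').textContent = `${remainingTime}s remaining`;
}
```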


Chunk Recording Logic

Now let’s dive into the heart of this simulation: the recordChunk() function.

This function is responsible for executing one full iteration of the record → upload → transcribe cycle. Every 3 seconds, it does the following:
  • Requests microphone access to capture a short audio stream.
  • Records exactly one chunk using the browser's MediaRecorder API.
  • Packages the audio data into a Blob for upload.
  • Sends the chunk to the backend, where it’s temporarily stored.
  • Initiates transcription by sending the uploaded file to the Whisper API.
  • Appends the returned text to the live transcript on the UI.

This structure enables us to transcribe small segments in near-real-time, giving users immediate feedback as they speak. By repeating this function on an interval, we simulate continuous live transcription, without needing a streaming connection. Let’s break it down:
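
The first step is requesting the microphone. A sketch of the call, which lives inside the async recordChunk() function (exact constraint support varies by browser):

```js
// Inside the async recordChunk() function:
const stream = await navigator.mediaDevices.getUserMedia({
  audio: {
    sampleRate: 44100,      // CD-quality sample rate
    channelCount: 1,        // mono (single channel)
    noiseSuppression: true, // reduce background noise
    echoCancellation: true, // remove speaker echo
  },
});
```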

This call is crucial: it asks the browser for access to the user’s microphone using navigator.mediaDevices.getUserMedia.

  • The audio object specifies a high-quality mono stream:
    • sampleRate: 44100: CD-quality audio.
    • channelCount: 1: Mono (single channel).
    • noiseSuppression: Reduces background noise.
    • echoCancellation: Removes speaker echo (common in browser mic recordings).
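
With the stream in hand, a MediaRecorder can capture exactly one chunk. A sketch, reusing the mimeType and CHUNK_DURATION constants from earlier:

```js
const recorder = new MediaRecorder(stream, { mimeType });
const chunks = [];

recorder.ondataavailable = (event) => chunks.push(event.data);
recorder.start();

// Stop after 3 seconds; the onstop handler (next section) uploads and transcribes.
setTimeout(() => recorder.stop(), CHUNK_DURATION);
```
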
When a Chunk Stops: Upload + Transcribe
  • Blob: A binary large object that packages all audio data into a single file.
  • FormData: Lets us send binary files over HTTP as a multipart form submission.
  • formData.append(): Adds the blob to the form data under the key 'audio', with a filename.
  • The audio blob is sent to the backend /recordings/upload route.
  • We retrieve the server-side file path of the uploaded chunk.

  • We pass the file path to the /transcribe endpoint.
  • Transcription text is returned and added to the live transcript in the UI.
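
Putting both steps together, the recorder's onstop handler might look like the sketch below; the response field names (filePath and text) and the chunk filename are assumptions, so match them to your own backend:

```js
recorder.onstop = async () => {
  // Package all recorded data for this chunk into a single file.
  const blob = new Blob(chunks, { type: mimeType });

  const formData = new FormData();
  formData.append('audio', blob, `chunk-${chunkCount}.webm`); // illustrative filename

  // Step 1: upload the chunk for temporary storage on the server.
  const uploadResponse = await fetch('/recordings/upload', {
    method: 'POST',
    body: formData,
  });
  const { filePath } = await uploadResponse.json(); // field name assumed

  // Step 2: ask the backend to transcribe the uploaded file.
  const transcribeResponse = await fetch('/transcribe', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ filePath }),
  });
  const { text } = await transcribeResponse.json(); // field name assumed
  appendTranscript(text); // helper shown below
};
```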

Limit & Cleanup
  • Increments counters and refreshes the session timer.
  • Automatically stops recording once max chunks are reached.
  • Cleans up the mic stream with getTracks().forEach(track => track.stop()).
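
Continuing the sketch, the bookkeeping at the end of each recordChunk() cycle could look like this:

```js
chunkCount++;
remainingTime -= CHUNK_DURATION / 1000; // 3 fewer seconds on the clock
updateTimer();

if (chunkCount >= MAX_CHUNKS) {
  stopChunkLoop(); // session cap reached: stop the interval
}

// Release the microphone so the browser's recording indicator turns off.
stream.getTracks().forEach((track) => track.stop());
```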

Utility: Append Transcript

A small helper appends each transcribed chunk to the page as it’s returned from the server.
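
A minimal version, assuming a container element with the id transcript:

```js
function appendTranscript(text) {
  // 'transcript' is an assumed element id; adjust to match your HTML.
  document.getElementById('transcript').textContent += ` ${text}`;
}
```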


UI Handlers: Start and Stop Buttons
  • Start: Begins a fresh session and resets all state and UI indicators.

  • Stop: Gracefully ends the session and restores UI defaults.
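
A sketch of both handlers, assuming buttons with the ids startBtn and stopBtn:

```js
document.getElementById('startBtn').addEventListener('click', () => {
  chunkCount = 0;             // fresh session state
  remainingTime = MAX_TIME_S;
  updateTimer();
  startChunkLoop();
});

document.getElementById('stopBtn').addEventListener('click', () => {
  stopChunkLoop();            // end the session gracefully
  remainingTime = MAX_TIME_S; // restore UI defaults
  updateTimer();
});
```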

Backend Logic (No Change Required)

The backend from the previous unit continues to work seamlessly:

  • /recordings/upload: stores each .webm chunk.
  • /transcribe: invokes transcribe() to convert uploaded audio into text using OpenAI’s Whisper API.
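
For reference, those routes might be shaped like the sketch below; this assumes Express with multer for file uploads, and your actual implementation from the previous unit may differ:

```js
const express = require('express');
const multer = require('multer');
const { transcribe } = require('./transcribe'); // wraps the Whisper API call

const app = express();
app.use(express.json());
const upload = multer({ dest: 'recordings/' });

// Store each uploaded .webm chunk and return its server-side path.
app.post('/recordings/upload', upload.single('audio'), (req, res) => {
  res.json({ filePath: req.file.path });
});

// Transcribe a previously uploaded chunk with Whisper.
app.post('/transcribe', async (req, res) => {
  const text = await transcribe(req.body.filePath);
  res.json({ text });
});

app.listen(3000);
```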

Summary

In this unit, you:

  • Simulated live microphone transcription using 3-second audio chunks.
  • Learned to manage a transcription session with time and chunk limits.
  • Processed and displayed each chunk’s transcript live in the browser.
  • Built a scalable transcription pipeline with clean UI feedback and Whisper API integration.

Next up: we’ll expand on this to support long-form recordings with advanced segmentation and context-aware processing.
