In this lesson, we're enhancing our transcription system to work like a live microphone transcription tool. Instead of recording the entire audio before transcribing, we now record short chunks (3 seconds each) and transcribe them one by one as they arrive, simulating a live transcription experience directly in the browser.
This unit covers:
- How to capture short audio snippets (chunks) from the user's microphone in real time.
- How to transcribe each audio chunk immediately after recording.
- How to update the UI with live transcription results.
- How to manage a recording session with duration limits and countdown timers.
We'll begin with `public/app.js`, where we configure how microphone input is handled in real time.
These constants are critical for timing and quality control:
- `mimeType`: tells the `MediaRecorder` what format to use. `audio/webm;codecs=opus` specifies the WebM format with the Opus codec, which is well-suited for audio and supported by Whisper.
- `CHUNK_DURATION`: each recording session will be sliced into 3-second pieces.
- `MAX_CHUNKS`: limits the session to 10 chunks (to simulate a ~30 s cap).
- `MAX_TIME_S`: converts chunk duration × number of chunks into seconds for UI display.
- `chunkCount` & `remainingTime`: track session state and the countdown for the user.
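A minimal sketch of these constants, with names and values taken from the description above:

```js
// Recording configuration for the live-transcription simulation.
const mimeType = 'audio/webm;codecs=opus'; // WebM container with the Opus codec
const CHUNK_DURATION = 3000;               // one chunk = 3 seconds (in ms)
const MAX_CHUNKS = 10;                     // cap the session at 10 chunks
const MAX_TIME_S = (CHUNK_DURATION * MAX_CHUNKS) / 1000; // ~30 s, shown in the UI

let chunkCount = 0;              // how many chunks we've recorded so far
let remainingTime = MAX_TIME_S;  // countdown displayed to the user
```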
- `recordChunk()`: the main function that performs all recording logic for a single audio segment. We discuss it in a separate section below.
- `setInterval`: automatically runs `recordChunk()` every 3 seconds.
- We also invoke `recordChunk()` immediately to avoid waiting for the first interval.
- `clearInterval(intervalId)`: essential for stopping the session; otherwise, recording will continue indefinitely even if the user presses stop.
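A minimal sketch of this session control flow; the `startSession()`/`stopSession()` wrapper names are assumptions for illustration:

```js
let intervalId = null;

function startSession() {
  chunkCount = 0;
  remainingTime = MAX_TIME_S;
  recordChunk();                                           // record the first chunk immediately
  intervalId = setInterval(recordChunk, CHUNK_DURATION);   // then one chunk every 3 seconds
}

function stopSession() {
  clearInterval(intervalId); // without this, recording would continue indefinitely
  intervalId = null;
}
```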
A simple helper that updates the visible timer on the screen using the `remainingTime` variable.
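For example, assuming the helper is called `updateTimer()` and the countdown lives in an element with the hypothetical id `timer`:

```js
function updateTimer() {
  // Reflect the current countdown value in the page.
  const timerEl = document.getElementById('timer');
  timerEl.textContent = `${remainingTime}s remaining`;
}
```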
Now let's dive into the heart of this simulation: the `recordChunk()` function.
This function is responsible for executing one full iteration of the record → upload → transcribe cycle. Every 3 seconds, it does the following:
- Requests microphone access to capture a short audio stream.
- Records exactly one chunk using the browser's `MediaRecorder` API.
- Packages the audio data into a `Blob` for upload.
- Sends the chunk to the backend, where it's temporarily stored.
- Initiates transcription by sending the uploaded file to the Whisper API.
- Appends the returned text to the live transcript in the UI.
This structure enables us to transcribe small segments in near real time, giving users immediate feedback as they speak. By repeating this function on an interval, we simulate continuous live transcription without needing a streaming connection. Let's break it down:
This line is crucial: it asks the browser for access to the user's microphone using `navigator.mediaDevices.getUserMedia`.
The `audio` object specifies a high-quality mono stream:
- `sampleRate: 44100`: CD-quality audio.
- `channelCount: 1`: mono (single channel).
- `noiseSuppression`: reduces background noise.
- `echoCancellation`: removes speaker echo (common in browser mic recordings).
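Roughly, the request inside `recordChunk()` (an async function) looks like this:

```js
// Ask the browser for a mono, noise-suppressed microphone stream.
const stream = await navigator.mediaDevices.getUserMedia({
  audio: {
    sampleRate: 44100,      // CD-quality sample rate
    channelCount: 1,        // mono (single channel)
    noiseSuppression: true, // reduce background noise
    echoCancellation: true  // remove speaker echo
  }
});
```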
- `Blob`: a binary large object that packages all audio data into a single file.
- `FormData`: simulates a form submission to send binary files over HTTP.
- `formData.append()`: adds the blob to the form data under the key `'audio'`, with a filename.
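A sketch of this record-and-package step, reusing the constants defined earlier; the chunk filename is illustrative:

```js
// Record exactly one chunk, then package it for upload.
const recorder = new MediaRecorder(stream, { mimeType });
const dataChunks = [];
recorder.ondataavailable = (e) => dataChunks.push(e.data);

recorder.start();
await new Promise((resolve) => {
  recorder.onstop = resolve;
  setTimeout(() => recorder.stop(), CHUNK_DURATION); // stop after 3 seconds
});

const blob = new Blob(dataChunks, { type: mimeType }); // one chunk's worth of audio
const formData = new FormData();
formData.append('audio', blob, `chunk-${chunkCount}.webm`); // filename is illustrative
```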
- The audio blob is sent to the backend `/recordings/upload` route.
- We retrieve the server-side file path of the uploaded chunk.
- We pass the file path to the `/transcribe` endpoint.
- Transcription text is returned and added to the live transcript in the UI.
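A sketch of the two requests; the response shapes (`{ filePath }` and `{ text }`) and the `appendTranscript()` helper are assumptions about this app's code:

```js
// Continuing inside recordChunk(): upload the chunk, then transcribe the stored file.
const uploadRes = await fetch('/recordings/upload', { method: 'POST', body: formData });
const { filePath } = await uploadRes.json(); // server-side path of the saved chunk

const transcribeRes = await fetch('/transcribe', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ filePath })
});
const { text } = await transcribeRes.json();
appendTranscript(text); // add the new text to the live transcript
```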
- Increments counters and refreshes the session timer.
- Automatically stops recording once the max chunk count is reached.
- Cleans up the mic stream with `getTracks().forEach(track => track.stop())`.
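Sketched out, this end-of-chunk bookkeeping might read (using the hypothetical `stopSession()` from earlier):

```js
// Continuing inside recordChunk(): update session state after each chunk.
chunkCount++;
remainingTime = MAX_TIME_S - chunkCount * (CHUNK_DURATION / 1000);
updateTimer();

if (chunkCount >= MAX_CHUNKS) {
  stopSession(); // hit the ~30 s cap, stop automatically
}

// Release the microphone so the browser's recording indicator turns off.
stream.getTracks().forEach((track) => track.stop());
```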
Adds each transcribed chunk as it’s returned from the server.
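For illustration, a minimal version of such a helper, assuming a container with the hypothetical id `transcript`:

```js
function appendTranscript(text) {
  // Append the newest chunk's text to the running transcript.
  const transcriptEl = document.getElementById('transcript');
  transcriptEl.textContent += ' ' + text.trim();
}
```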
- Starts a fresh session and resets all state and UI indicators.
- Gracefully ends a session and restores UI defaults.
The backend from the previous unit continues to work seamlessly:
- `/recordings/upload`: stores each `.webm` chunk.
- `/transcribe`: invokes `transcribe()` to convert the uploaded audio into text using OpenAI's Whisper API.
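For reference, a rough sketch of what those routes could look like; Express, multer, and the module path are assumptions here, not necessarily the previous unit's exact code:

```js
const express = require('express');
const multer = require('multer');
const { transcribe } = require('./transcription'); // module path is illustrative

const app = express();
app.use(express.json());
const upload = multer({ dest: 'recordings/' }); // multer writes each chunk to disk

app.post('/recordings/upload', upload.single('audio'), (req, res) => {
  // The .webm chunk is already saved; return its path for the next step.
  res.json({ filePath: req.file.path });
});

app.post('/transcribe', async (req, res) => {
  // transcribe() wraps the Whisper API call from the previous unit.
  const text = await transcribe(req.body.filePath);
  res.json({ text });
});
```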
In this unit, you:
- Simulated live microphone transcription using 3-second audio chunks.
- Learned to manage a transcription session with time and chunk limits.
- Processed and displayed each chunk’s transcript live in the browser.
- Built a scalable transcription pipeline with clean UI feedback and Whisper API integration.
Next up: we’ll expand on this to support long-form recordings with advanced segmentation and context-aware processing.
