Full Howler Transcribe App Integration – Interactive Transcription UI

Welcome to the final lesson of this course! So far, you’ve built a robust foundation:

  • In the first unit, you implemented audio playback using Howler.js and controlled it through backend routes.
  • In the second unit, you learned how to clip audio segments and transcribe them using the OpenAI Whisper API.

Now, it’s time to bring everything together into a seamless browser experience. In this lesson, you'll learn how to:

  • Track when the user starts and stops playback
  • Send the correct audio segment to the backend
  • Display the transcription result in the browser

Let’s build the full interactive workflow!


What You’ll Learn

By the end of this lesson, you’ll be able to:

  • Capture the start and stop time of playback using Howler.js
  • Send the segment info to your backend
  • Transcribe and display the result directly in the frontend

This is the final version of your app. After this, you’ll have a fully working client-server audio transcription tool.


Tracking Start and Stop Times

To capture an audio segment, we need to know when the user wants to begin and end recording. We’ll use Howler.js’s seek() method to get the current playback position in seconds.

📌 `startRecording()`

Explanation:

  • This function is triggered when the user clicks Start Recording.
  • It checks if the audio is playing and captures the current playback time.
  • This becomes the segment start.
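Your implementation from the earlier lessons may differ, but a minimal sketch could look like this (assuming a Howl instance named `sound` and a module-level `playbackStartSec` variable; both names are illustrative):

```typescript
import { Howl } from 'howler';

// Assumed to exist from the playback lessons: the Howl instance for the
// currently selected file (name is illustrative).
declare const sound: Howl;

// Module-level state: where the user wants the transcription segment to begin.
let playbackStartSec: number | null = null;

function startRecording(): void {
  // Only capture a start time if audio is actually playing.
  if (!sound.playing()) {
    return;
  }
  // Calling seek() with no arguments returns the current position in seconds.
  playbackStartSec = sound.seek() as number;
}
```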

📌 `stopRecordingAndTranscribe()`

Explanation: The stopRecordingAndTranscribe function is the final step in capturing and transcribing a user-selected audio segment. It coordinates the timing logic, validates the selected segment, sends it to the backend transcription route, and reports the result back to the user.

Let’s break it down:

  • When the user clicks Stop + Transcribe, the current playback position is used as the segment end.
  • We compute the duration and send the segment details to /transcribe.
  • Once the backend returns the text, it’s displayed in the UI.

1. Validating State and Duration

Before doing anything, the function checks whether playback is active and a recording has been started. Then, it calculates the duration of the recorded segment by comparing the current playback position with the saved start time.

This prevents transcription of zero-length or negative-duration segments.

2. Preparing the UI and Sending the Request

Once validated, the function displays a loading message and sends a POST request to /transcribe, passing the selected file path, segment start time, and duration.

This data enables the backend to clip the audio precisely and send it to the Whisper API.
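For reference, the request body could be shaped roughly like this; the field names below are assumptions for illustration, so match them to whatever your /transcribe route actually expects:

```typescript
// Assumed shape of the POST /transcribe request body (illustrative field
// names; align them with the backend route you built in the previous unit).
interface TranscribeRequest {
  filePath: string;     // path of the selected .mp3 file
  startSec: number;     // segment start time, in seconds
  durationSec: number;  // segment length, in seconds
}
```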

3. Displaying the Result

After receiving the response, the function updates the UI with the returned transcription (or a fallback message if it's empty), giving the user immediate feedback.

Finally, it resets playbackStartSec to prepare for the next session.

This method is the glue between playback tracking and the backend transcription system. It ensures segments are valid, initiates server communication, and neatly updates the interface with the result—making it a key part of the full app experience.
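Putting the three steps together, here is one possible sketch of the function. It reuses the assumed `sound`, `playbackStartSec`, and `TranscribeRequest` names from the earlier sketches; `selectedFile`, `lastKnownPosition`, the element id, and the response field name are likewise placeholders rather than fixed APIs:

```typescript
// Assumed to exist elsewhere in the app (illustrative names):
declare const selectedFile: string;     // file path chosen via /api/files
declare let lastKnownPosition: number;  // updated by the polling loop (next section)

async function stopRecordingAndTranscribe(): Promise<void> {
  const resultEl = document.getElementById('transcription-result');

  // 1. Validate state and duration.
  if (!sound.playing() || playbackStartSec === null) {
    return; // nothing is playing, or Start Recording was never clicked
  }
  const playbackEndSec = lastKnownPosition;
  const durationSec = playbackEndSec - playbackStartSec;
  if (durationSec <= 0) {
    return; // reject zero-length or negative segments
  }

  // 2. Prepare the UI and send the request.
  if (resultEl) resultEl.textContent = 'Transcribing...';
  const body: TranscribeRequest = {
    filePath: selectedFile,
    startSec: playbackStartSec,
    durationSec,
  };
  const response = await fetch('/transcribe', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  const data = await response.json();

  // 3. Display the result (or a fallback) and reset for the next segment.
  if (resultEl) {
    resultEl.textContent = data.transcription || 'No transcription returned.';
  }
  playbackStartSec = null;
}
```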


Tracking Playback Progress with lastKnownPosition

In the previous lesson, we started monitoring the current audio position using Howler.js. That logic continues here—but now it plays a critical role in accurately determining when a user stops recording.

But Where Does lastKnownPosition Come From?

The lastKnownPosition variable is continuously updated in real time during playback using Howler's seek() method. We set up this logic when the user presses Play:
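Here is a minimal sketch of that setup, again assuming the Howl instance is named `sound`; the handler names are illustrative:

```typescript
import { Howl } from 'howler';

declare const sound: Howl; // the Howl instance for the selected file

let lastKnownPosition = 0;
let positionPoller: number | undefined;

function onPlayClicked(): void {
  sound.play();

  // Poll the current position every 200 ms and keep the latest value.
  positionPoller = window.setInterval(() => {
    if (sound.playing()) {
      lastKnownPosition = sound.seek() as number;
    }
  }, 200);
}

function onStopClicked(): void {
  sound.stop();
  if (positionPoller !== undefined) {
    window.clearInterval(positionPoller);
  }
}
```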

The seek() function returns the current playback time (in seconds), but it only gives a snapshot—not continuous updates. So we manually poll seek() every 200ms and store the result in lastKnownPosition.

This means:

  • When the user presses Stop + Transcribe, lastKnownPosition holds the most recent timestamp.
  • This value is used as the segment end time, while playbackStartSec (recorded earlier) is the segment start.

Without this interval-based polling, we wouldn’t have an accurate playbackEndSec, leading to either zero-length segments or incorrect transcriptions.

Hooking It All Up

Here’s how the final button setup looks:
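This sketch assumes button ids such as start-recording-btn and stop-transcribe-btn (placeholders for your own markup) and reuses the handlers sketched earlier in this lesson:

```typescript
// Wire the buttons to the handlers sketched above. The ids are illustrative;
// use whatever ids your index.html actually defines.
document.getElementById('start-recording-btn')
  ?.addEventListener('click', startRecording);

document.getElementById('stop-transcribe-btn')
  ?.addEventListener('click', () => {
    void stopRecordingAndTranscribe();
  });
```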

In your index.html, make sure these buttons are visible (uncommented) and that the transcription result panel is styled correctly.


Final Workflow Recap

Let’s summarize everything that’s now working in your full app:

  • Select file: /api/files returns list of .mp3 files
  • Playback control: /audio/play, /pause, /stop updates playback state
  • Play audio: Howler.js streams audio in the browser
  • Start recording: Save timestamp using seek()
  • Stop and transcribe: Calculate duration, send to /transcribe
  • Backend: Clips the audio, sends it to Whisper, returns transcription
  • Frontend: Displays the result in real time

You now have a fully functioning browser-based audio transcription tool, powered by Howler.js, TypeScript, and OpenAI Whisper.


Summary

In this final lesson, you learned how to:

  • Track start and end times using Howler.js
  • Capture and transcribe just the part of the audio the user listened to
  • Build a smooth, interactive frontend experience for transcription

This completes your interactive app—and your learning journey for this course!

Fantastic work. You’ve built something powerful and practical. See you in the next adventure! 🛰️
