Generating a Video Summary

Welcome back! In the previous lessons, we explored transcribing videos with the Whisper API and downloading them from both Google Drive and LinkedIn. Building on those skills, we're now going to dig deeper into generating video summaries, an essential skill for turning lengthy transcriptions into concise, insightful content. This lesson takes you a step further by using OpenAI's API to create detailed yet succinct summaries on the fly.

What You'll Learn

Today, you will:

  • Generate summaries from video transcriptions.
  • Use the OpenAI API to create structured content summaries.
  • Understand the flow between frontend interaction and backend logic.
  • Design effective system and user prompts for consistent summarization.
Overview: The Summary Feature

Summarizing transcriptions involves distilling the core messages from extensive spoken content, ensuring key points are retained while unnecessary details are filtered out. When dealing with long videos or lectures, extracting the main themes allows you to quickly grasp the essentials without listening to every word. The OpenAI API facilitates this by leveraging advanced language models capable of understanding context and summarizing long texts.

This feature allows users to click an "Analyze" button on the frontend, triggering a backend flow that:

  1. Clips the first 30 seconds of the video.
  2. Transcribes that audio.
  3. Uses OpenAI’s GPT model to summarize the transcription.
  4. Returns both the raw transcription and the summary to the frontend for display.
Summarization Logic: contentAnalyzer.ts

This function communicates with OpenAI's API to transform the raw transcription into a well-formatted summary. Its key pieces are outlined below, followed by a sketch of the function itself.

  • OpenAI Client Initialization: new OpenAI() reads the API key from environment variables.
  • Function Input: Accepts text, the full transcript string.
  • Chat Completion Payload:
    • model: "gpt-4o": a fast, capable model well suited to summarization.
    • messages[]:
      • System Message: Sets expectations for the assistant, describing its role and formatting instructions.
      • User Message: Contains the transcription text prefaced with task instructions.
  • Return Value: The summary is accessed via response.choices[0].message.content.
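
Pulling these pieces together, here is a minimal sketch of what contentAnalyzer.ts might look like. The client initialization, the gpt-4o model, the system/user message split, and the response.choices[0].message.content return path come from the points above; the function name analyzeContent, the exact prompt wording, and the null fallback are assumptions for illustration.

```typescript
// contentAnalyzer.ts (illustrative sketch, not the exact course code)
import OpenAI from 'openai';

// new OpenAI() reads the API key from the OPENAI_API_KEY environment variable.
const openai = new OpenAI();

// The function name is an assumption; the input is the full transcript string.
export async function analyzeContent(text: string): Promise<string> {
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      {
        role: 'system',
        content:
          'You are a skilled summarizer. Start with a one-sentence overview, ' +
          'follow with key takeaways, and optionally include notable quotes ' +
          'or important details.',
      },
      {
        role: 'user',
        content:
          'Create a structured summary of the following transcription. ' +
          'Focus on the core message and maintain context.\n\n' + text,
      },
    ],
  });

  // The summary text is accessed via response.choices[0].message.content.
  return response.choices[0].message.content ?? '';
}
```

Keeping the formatting rules in the system message and passing only the transcript-specific instruction in the user message keeps the model's persona consistent across every request.
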
Understanding `messages: [...]` in `chat.completions.create`

The messages array is a critical component of any OpenAI chat-based interaction. It defines a conversational context that guides the model’s behavior. Each message in this array has two required keys plus one optional key:

  • role: Who is "speaking"—can be "system", "user", or "assistant".
  • content: What that role says to the model (instructions, input, etc.).
  • name (optional): Used in multi-user contexts.
role: "system" — Behavior Configuration
  • Purpose: The system message sets the rules and persona the model should adopt. Think of it as configuring the "mindset" or "job description" of the AI.
  • Content: This message tells the model it is a skilled summarizer, lists expected capabilities, and enforces a formatting style:
    • Start with a one-sentence overview
    • Follow with key takeaways
    • Optionally include quotes or important details
role: "user" — Task Instruction + Input
  • Purpose: The user message simulates what a human would ask the assistant to do. This includes both task instructions and data input (in this case, the transcription).
  • Content:
    • Directly tells the model to “Create a structured summary”.
    • Repeats key expectations (“focus on core message”, “maintain context”).
    • Embeds the actual transcript at the end.
In this function, we provide two messages, system and user, to control the model’s behavior and give it the transcript to summarize; the sketch below shows how these prompts might be filled in.
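
To make the two roles concrete, here is one way the messages array might be written. The wording is an assumption; the structure (persona and formatting rules in the system message, task instruction plus the embedded transcript in the user message) follows the description above.

```typescript
// Illustrative messages array; prompt wording is an assumption.
const transcriptionText = '...full transcript from the Whisper step...';

const messages = [
  {
    role: 'system' as const,
    // Persona plus formatting rules: overview first, then takeaways, then optional quotes.
    content: [
      'You are an expert content summarizer.',
      'Format every summary as follows:',
      '1. Start with a one-sentence overview.',
      '2. Follow with key takeaways as bullet points.',
      '3. Optionally include notable quotes or important details.',
    ].join('\n'),
  },
  {
    role: 'user' as const,
    // Task instruction first, then the raw transcript embedded at the end.
    content:
      'Create a structured summary of the transcription below. ' +
      'Focus on the core message and maintain context.\n\n' +
      transcriptionText,
  },
];
```
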
Frontend Trigger: app.js

On the client side, user interaction begins when the "Analyze" button is clicked. This event triggers the summary request pipeline. Let’s examine the complete logic:

  • analyzeBtn.addEventListener('click', async () => { ... }) attaches an event listener to the "Analyze" button. When clicked, it triggers an asynchronous function.
  • if (!currentVideoPath) checks whether a video has been loaded. If not, it calls showError() to alert the user. This ensures the user doesn’t trigger a backend call without any content to analyze.
  • updateLoadingState(true, ...) shows a loading spinner or message indicating the app is working. This improves UX and prevents confusion during longer waits.
  • analyzeBtn.disabled = true prevents multiple clicks while the current request is processing.
  • fetch('/transcribe', { ... }) sends a POST request to the backend endpoint (a sketch of the handler follows this list). It includes:
    • headers: specifying the content type as JSON.
    • body: a JSON string carrying the video’s file_path for the backend to process.
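
Here is a sketch of how that click handler might look in app.js. Everything named in the bullets above (analyzeBtn, currentVideoPath, showError, updateLoadingState, the /transcribe fetch) comes from the description; displayResults and the exact error handling are hypothetical placeholders.

```typescript
// app.js (illustrative sketch of the "Analyze" click handler)
// analyzeBtn, currentVideoPath, showError, updateLoadingState, and displayResults
// are assumed to be defined elsewhere in app.js; displayResults is hypothetical.
analyzeBtn.addEventListener('click', async () => {
  // Guard: nothing to analyze if no video has been loaded yet.
  if (!currentVideoPath) {
    showError('Please load a video before analyzing.');
    return;
  }

  updateLoadingState(true, 'Analyzing video...');
  analyzeBtn.disabled = true; // block duplicate requests while one is in flight

  try {
    const response = await fetch('/transcribe', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ file_path: currentVideoPath }),
    });
    if (!response.ok) throw new Error(`Request failed: ${response.status}`);

    // The backend returns both the raw transcription and the summary.
    const { transcription, summary } = await response.json();
    displayResults(transcription, summary);
  } catch (err) {
    showError(err instanceof Error ? err.message : 'Analysis failed.');
  } finally {
    updateLoadingState(false, '');
    analyzeBtn.disabled = false;
  }
});
```

Re-enabling the button and clearing the loading state in the finally block means the UI recovers whether the request succeeds or fails.
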
Backend Entry Point: transcribe.ts

This is the backend Express route responsible for handling transcription and summarization. Here's the full logic:

  • req.body.file_path: The frontend sends the video’s path in the request body. This is validated at the start.
  • path.join(process.cwd(), 'downloads', file_path) constructs the full path to the file on disk. It assumes videos are stored in a downloads folder.
  • File existence check: If the file doesn’t exist, a 404 error is returned.
  • Audio clipping with ffmpeg (see the sketch after this list):
    • The command extracts only the first 30 seconds of audio.
    • -ac 1: converts the audio to mono.
  • After clipping, the route transcribes the audio, passes the text to the summarizer, and returns both the transcription and the summary to the frontend.
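
Below is a sketch of how the /transcribe route might assemble these steps. The downloads path, the 404 check, the 30-second mono clip, and the combined transcription-plus-summary response follow the description above; the exact ffmpeg flags apart from -ac 1, the temporary file name, the whisper-1 call, and the error handling are assumptions.

```typescript
// transcribe.ts (illustrative sketch of the backend route)
import express from 'express';
import path from 'path';
import fs from 'fs';
import { execFile } from 'child_process';
import { promisify } from 'util';
import OpenAI from 'openai';
import { analyzeContent } from './contentAnalyzer'; // summarizer sketched earlier

const execFileAsync = promisify(execFile);
const openai = new OpenAI();
const router = express.Router();

router.post('/transcribe', async (req, res) => {
  const { file_path } = req.body;
  if (!file_path) {
    return res.status(400).json({ error: 'file_path is required' });
  }

  // Videos are assumed to live in a "downloads" folder at the project root.
  const videoPath = path.join(process.cwd(), 'downloads', file_path);
  if (!fs.existsSync(videoPath)) {
    return res.status(404).json({ error: 'Video not found' });
  }

  // Temporary clip name is an assumption.
  const audioPath = path.join(process.cwd(), 'downloads', 'clip.mp3');
  try {
    // Clip the first 30 seconds of audio as mono (-ac 1); the other flags
    // are assumptions about the course's exact ffmpeg command.
    await execFileAsync('ffmpeg', [
      '-y', '-i', videoPath, '-t', '30', '-vn', '-ac', '1', audioPath,
    ]);

    // Transcribe the clip with the Whisper API, then summarize the text.
    const transcription = await openai.audio.transcriptions.create({
      file: fs.createReadStream(audioPath),
      model: 'whisper-1',
    });
    const summary = await analyzeContent(transcription.text);

    // Return both the raw transcription and the summary to the frontend.
    res.json({ transcription: transcription.text, summary });
  } catch (err) {
    res.status(500).json({ error: 'Analysis failed' });
  } finally {
    fs.rmSync(audioPath, { force: true }); // clean up the temporary clip
  }
});

export default router;
```

Cleaning up the temporary audio clip in the finally block keeps the downloads folder from filling with intermediate files.
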
Lesson Summary

You’ve now completed the logic to turn transcription text into structured summaries. From clicking the frontend button to summarizing via OpenAI’s model and receiving a concise, formatted output—every step is clearly mapped out and modular. You’ve also learned how to create effective prompts, handle video clipping with ffmpeg, and display results seamlessly in the UI. This design is robust and extensible, paving the way for deeper analysis features in future lessons.
