Welcome back! You've mastered sequential workflows with prompt chaining and conditional workflows with intelligent routing. Now it's time to unlock dramatic performance improvements by learning parallel processing — executing multiple independent OpenAI API calls simultaneously instead of waiting for each one to complete.
In this lesson, you'll discover how to transform workflows that take minutes into operations that complete in seconds. You'll learn the difference between synchronous and asynchronous programming, master the asyncio library, and build a system that asks multiple questions to GPT-5 at the same time.
Before we dive into the technical details, let's clarify terms that often confuse beginners: synchronous and asynchronous.
Outside of programming, "synchronous" means "happening at the same time" — think synchronized swimming or clocks ticking together. In programming, though, "synchronous" means the opposite: operations are coordinated in sequence, not in parallel.
When we say "synchronous API calls," we mean calls that happen one after another, waiting for each to complete before starting the next. "Asynchronous" API calls, on the other hand, can be launched together and run concurrently — they don't wait for each other to finish.
This might seem backwards at first, but once you understand this distinction, the terms "synchronous" (sequential) and "asynchronous" (parallel) will make much more sense throughout this lesson.
Let's understand the high-level pattern we'll be implementing. This workflow has two distinct phases that work together to provide both speed and comprehensive results:
Phase 1: Parallel Research Gathering
- Launch multiple independent OpenAI API calls simultaneously
- Each call researches a different aspect of your topic (attractions, transportation, culture)
- All questions run concurrently, completing in roughly the time of the slowest individual request
- Results are collected and preserved in their original order
Phase 2: Sequential Result Synthesis
- Combine all parallel research into a single comprehensive dataset
- Send the aggregated information to GPT-5 with instructions for synthesis
- Generate a unified, actionable final result (like a complete travel guide)
- This sequential step ensures all information is properly integrated
This two-phase approach maximizes both efficiency and quality: you get the speed benefits of parallel processing for data gathering while maintaining coherent analysis through sequential aggregation. It's particularly powerful for research tasks, analysis workflows, and any scenario where you need to quickly gather diverse information and synthesize it into actionable insights.
When working with the OpenAI Responses API, you can choose between two client types: one for synchronous (step-by-step) operations and one for asynchronous (parallel) operations. The difference between them determines whether your program waits for each GPT-5 response before moving on or whether it can send multiple requests at once.
With the standard OpenAI client, each API call is synchronous — your code waits for a response before continuing. This is simple but can be slow if you have many independent tasks.
In contrast, the AsyncOpenAI client supports asynchronous operations. This means you can start several GPT-5 API calls at the same time, and your program will continue running while waiting for responses. This is ideal for running many independent tasks in parallel.
In summary:
- Use the synchronous client for simple, sequential workflows where each step depends on the previous one.
- Use the asynchronous client when you want to launch multiple independent GPT-5 API calls at once, dramatically improving performance for batch or parallel tasks.
The asyncio library provides an event loop that manages multiple operations simultaneously, switching between them efficiently rather than blocking on any single operation. The async keyword transforms a regular function into a coroutine that can be paused and resumed, while await pauses execution until an asynchronous operation completes.
This approach is particularly effective for I/O-bound operations like API calls, where much of the time is spent waiting for network responses. We keep reasoning effort minimal to further optimize response times.
To execute async functions, you need an event loop. asyncio.run() creates an event loop, runs your async function, and cleans up afterward. This is the standard entry point for async programs:
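A minimal sketch of this entry-point pattern, using asyncio.sleep as a stand-in for a real API call so it runs anywhere:

```python
import asyncio

async def main():
    # Any awaitable work goes here; asyncio.sleep simulates waiting
    # on a network response such as a GPT-5 API call.
    await asyncio.sleep(0.1)
    return "done"

# asyncio.run() creates the event loop, runs main() to completion,
# and tears the loop down afterward.
result = asyncio.run(main())
print(result)  # → done
```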
This pattern of wrapping your async code in a main() function and calling it with asyncio.run() is the standard approach for async programs. The asyncio.run() function handles all the event loop management automatically, making it the simplest way to execute async code.
The real power of async programming comes from running multiple operations concurrently. asyncio.gather() starts multiple coroutines simultaneously and waits for all of them to complete, returning results in the original order:
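A runnable sketch of asyncio.gather(), again using asyncio.sleep to simulate network-bound calls of different durations:

```python
import asyncio
import time

async def fetch(label: str, delay: float) -> str:
    # Simulates an I/O-bound operation like an API request.
    await asyncio.sleep(delay)
    return label

async def main():
    start = time.perf_counter()
    # All three coroutines run concurrently; gather() returns their
    # results in argument order, not completion order.
    results = await asyncio.gather(
        fetch("a", 0.3), fetch("b", 0.1), fetch("c", 0.2)
    )
    elapsed = time.perf_counter() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
print(results)  # ['a', 'b', 'c'] — original order preserved
print(elapsed)  # roughly 0.3s (the slowest call), not the 0.6s sum
```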
The key insight: while one API call waits for GPT-5's response, the event loop can initiate or continue processing other API calls. This transforms sequential waiting time into concurrent execution time.
Now that you understand the fundamentals, let's build the foundation of our parallel workflow by creating an async function specifically designed for OpenAI API calls. This function will handle individual questions while being optimized for concurrent execution.
The print statements help visualize when each question starts and completes, while returning a tuple of (question, answer) makes it easy to match responses back to their original questions when processing parallel results. The instructions parameter ensures consistent, focused responses from GPT-5.
With our async function ready, let's define the independent research questions that will form the parallel component of our workflow. Parallel processing shines when you have independent problems that don't rely on each other's answers:
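For example (the destination here, Tokyo, is an illustrative assumption; any city works):

```python
# Three independent research questions — none depends on another's answer.
questions = [
    "What are the top attractions to visit in Tokyo?",
    "How does public transportation work in Tokyo?",
    "What cultural customs should visitors to Tokyo know about?",
]
```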
These questions cover different aspects of travel planning (attractions, transportation, culture) and are completely independent of each other, making them perfect candidates for parallel execution.
Now let's put asyncio.gather() to work by creating multiple tasks that execute simultaneously. This is where the parallel magic happens:
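The pattern looks like this. To keep the sketch runnable without an API key, a stand-in ask_question simulates the GPT-5 call with asyncio.sleep; in the real workflow it would await the client's responses.create method instead.

```python
import asyncio

async def ask_question(question: str) -> tuple[str, str]:
    # Stand-in for the real GPT-5 call (would be an awaited API request).
    await asyncio.sleep(0.1)
    return question, f"(answer to: {question})"

async def run_parallel(questions: list[str]) -> list[tuple[str, str]]:
    # The comprehension only builds coroutine objects; nothing runs yet.
    tasks = [ask_question(q) for q in questions]
    # gather() starts them all and returns results in the original order.
    return await asyncio.gather(*tasks)

results = asyncio.run(run_parallel(["q1", "q2", "q3"]))
print(results[0])  # ('q1', '(answer to: q1)')
```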
The list comprehension creates coroutine objects representing work to be done, while asyncio.gather(*tasks) starts all coroutines simultaneously and returns results in the original order regardless of completion sequence. Each result is a tuple containing the question and its corresponding answer.
With all our parallel research complete, let's build the aggregation phase that synthesizes everything into a comprehensive result. This sequential step ensures all information is properly integrated:
The aggregation step continues to use minimal reasoning effort since we're simply synthesizing already-researched information into a concise guide.
Let's bring it all together into a complete workflow that demonstrates the full power of parallel processing followed by intelligent aggregation:
When you run this workflow, you'll see the power of parallel execution unfold in three distinct stages:
- Instant Launch: All three "🔄 Asking" messages appear immediately as the API calls fire off simultaneously
- Concurrent Completion: The "✅ Answered" messages arrive as GPT-5 finishes each response — often in a different order than they were asked, proving your requests are truly running in parallel
- Intelligent Synthesis: All this concurrent research gets woven together into a comprehensive travel guide that combines the speed benefits of parallel processing with thoughtful analysis
This visual progression clearly demonstrates how your requests execute concurrently rather than waiting for each other, transforming what could be a slow sequential process into a fast, efficient workflow that delivers both speed and quality.
This two-stage approach provides significant performance benefits while maintaining result quality. The parallel research phase completes in roughly the time of the slowest individual question, while the aggregation phase ensures all information is properly synthesized into a usable travel plan.
This pattern works well for any scenario where you need to:
- Research multiple independent topics quickly
- Aggregate diverse information into a unified result
- Balance speed with comprehensive analysis
The performance benefits are most significant when you have many independent research topics or when individual API calls have high latency.
You've mastered parallel processing patterns that transform slow sequential workflows into lightning-fast concurrent operations. The combination of parallel research gathering and sequential result synthesis provides both speed and quality, making it ideal for complex analysis tasks like travel planning, market research, or technical evaluations.
In the upcoming exercises, you'll apply these patterns to real-world scenarios and learn to handle the nuances of concurrent GPT-5 workflows. Remember: use parallel processing for independent research tasks, then aggregate results sequentially for comprehensive final analysis.
