Welcome back! You've mastered sequential workflows with prompt chaining and conditional workflows with intelligent routing. Now it's time to unlock dramatic performance improvements by learning parallel processing — executing multiple independent OpenAI API calls simultaneously instead of waiting for each one to complete.
In this lesson, you'll discover how to transform workflows that take minutes into operations that complete in seconds. You'll learn the difference between synchronous and asynchronous programming, master the asyncio library, and build a system that asks multiple questions to GPT-5 at the same time.
Before we dive into the technical details, let's clarify terms that often confuse beginners: synchronous and asynchronous.
Outside of programming, "synchronous" means "happening at the same time" — think synchronized swimming or clocks ticking together. In programming, though, "synchronous" means the opposite: operations are coordinated in sequence, not in parallel.
When we say "synchronous API calls," we mean calls that happen one after another, waiting for each to complete before starting the next. "Asynchronous" API calls, on the other hand, can be launched together and run concurrently — they don't wait for each other to finish.
This might seem backwards at first, but once you understand this distinction, the terms "synchronous" (sequential) and "asynchronous" (parallel) will make much more sense throughout this lesson.
Let's understand the high-level pattern we'll be implementing. This workflow has two distinct phases that work together to provide both speed and comprehensive results:
Phase 1: Parallel Research Gathering
- Launch multiple independent OpenAI API calls simultaneously
- Each call researches a different aspect of your topic (attractions, transportation, culture)
- All questions run concurrently, completing in roughly the time of the slowest individual request
- Results are collected and preserved in their original order
Phase 2: Sequential Result Synthesis
- Combine all parallel research into a single comprehensive dataset
- Send the aggregated information to GPT-5 with instructions for synthesis
- Generate a unified, actionable final result (like a complete travel guide)
- This sequential step ensures all information is properly integrated
This two-phase approach maximizes both efficiency and quality: you get the speed benefits of parallel processing for data gathering while maintaining coherent analysis through sequential aggregation. It's particularly powerful for research tasks, analysis workflows, and any scenario where you need to quickly gather diverse information and synthesize it into actionable insights.
When working with the OpenAI Responses API, you can choose between two client types: one for synchronous (step-by-step) operations and one for asynchronous (parallel) operations. The difference between them determines whether your program waits for each GPT-5 response before moving on or whether it can send multiple requests at once.
With the standard OpenAI client, each API call is synchronous — your code waits for a response before continuing. This is simple but can be slow if you have many independent tasks.
In contrast, the AsyncOpenAI client supports asynchronous operations. This means you can start several GPT-5 API calls at the same time, and your program will continue running while waiting for responses. This is ideal for running many independent tasks in parallel.
In summary:
- Use the synchronous client for simple, sequential workflows where each step depends on the previous one.
- Use the asynchronous client when you want to launch multiple independent GPT-5 API calls at once, dramatically improving performance for batch or parallel tasks.
The asyncio library provides an event loop that manages multiple operations simultaneously, switching between them efficiently rather than blocking on any single operation. The async keyword transforms a regular function into a coroutine that can be paused and resumed, while await pauses execution until an asynchronous operation completes.
This approach is particularly effective for I/O-bound operations like API calls, where much of the time is spent waiting for network responses. We keep reasoning effort minimal to further optimize response times.
To execute async functions, you need an event loop. asyncio.run() creates an event loop, runs your async function, and cleans up afterward. This is the standard entry point for async programs:
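A minimal sketch of this entry-point pattern, using asyncio.sleep as a stand-in for a real API call so it runs anywhere:

```python
import asyncio

async def main():
    # Any awaitable work goes here; asyncio.sleep simulates waiting
    # on a network response such as a GPT-5 API call.
    await asyncio.sleep(0.1)
    return "done"

# asyncio.run() creates the event loop, runs main() to completion,
# and tears the loop down afterward.
result = asyncio.run(main())
print(result)  # → done
```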
This pattern of wrapping your async code in a main() function and calling it with asyncio.run() is the standard approach for async programs. The asyncio.run() function handles all the event loop management automatically, making it the simplest way to execute async code.
The real power of async programming comes from running multiple operations concurrently. asyncio.gather() starts multiple coroutines simultaneously and waits for all of them to complete, returning results in the original order:
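A runnable sketch of asyncio.gather(), again using asyncio.sleep to simulate network-bound calls of different durations:

```python
import asyncio
import time

async def fetch(label: str, delay: float) -> str:
    # Simulates an I/O-bound operation like an API request.
    await asyncio.sleep(delay)
    return label

async def main():
    start = time.perf_counter()
    # All three coroutines run concurrently; gather() returns their
    # results in argument order, not completion order.
    results = await asyncio.gather(
        fetch("a", 0.3), fetch("b", 0.1), fetch("c", 0.2)
    )
    elapsed = time.perf_counter() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
print(results)  # ['a', 'b', 'c'] — original order preserved
print(elapsed)  # roughly 0.3s (the slowest call), not the 0.6s sum
```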
The key insight: while one API call waits for GPT-5's response, the event loop can initiate or continue processing other API calls. This transforms sequential waiting time into concurrent execution time.
Now that you understand the fundamentals, let's build the foundation of our parallel workflow by creating an async function specifically designed for OpenAI API calls. This function will handle individual questions while being optimized for concurrent execution.
The print statements help visualize when each question starts and completes, while returning a tuple of (question, answer) makes it easy to match responses back to their original questions when processing parallel results. The instructions parameter ensures consistent, focused responses from GPT-5.
With our async function ready, let's define the independent research questions that will form the parallel component of our workflow. Parallel processing shines when you have independent problems that don't rely on each other's answers:
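For example (the destination here, Tokyo, is an illustrative assumption; any city works):

```python
# Three independent research questions — none depends on another's answer.
questions = [
    "What are the top attractions to visit in Tokyo?",
    "How does public transportation work in Tokyo?",
    "What cultural customs should visitors to Tokyo know about?",
]
```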
These questions cover different aspects of travel planning (attractions, transportation, culture) and are completely independent of each other, making them perfect candidates for parallel execution.
Now let's put asyncio.gather() to work by creating multiple tasks that execute simultaneously. This is where the parallel magic happens:
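The pattern looks like this. To keep the sketch runnable without an API key, a stand-in ask_question simulates the GPT-5 call with asyncio.sleep; in the real workflow it would await the client's responses.create method instead.

```python
import asyncio

async def ask_question(question: str) -> tuple[str, str]:
    # Stand-in for the real GPT-5 call (would be an awaited API request).
    await asyncio.sleep(0.1)
    return question, f"(answer to: {question})"

async def run_parallel(questions: list[str]) -> list[tuple[str, str]]:
    # The comprehension only builds coroutine objects; nothing runs yet.
    tasks = [ask_question(q) for q in questions]
    # gather() starts them all and returns results in the original order.
    return await asyncio.gather(*tasks)

results = asyncio.run(run_parallel(["q1", "q2", "q3"]))
print(results[0])  # ('q1', '(answer to: q1)')
```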
The list comprehension creates coroutine objects representing work to be done, while asyncio.gather(*tasks) starts all coroutines simultaneously and returns results in the original order regardless of completion sequence. Each result is a tuple containing the question and its corresponding answer.
With all our parallel research complete, let's build the aggregation phase that synthesizes everything into a comprehensive result. This sequential step ensures all information is properly integrated:
The aggregation step continues to use minimal reasoning effort since we're simply synthesizing already-researched information into a concise guide.
Let's bring it all together into a complete workflow that demonstrates the full power of parallel processing followed by intelligent aggregation:
When you run this workflow, you'll see the power of parallel execution unfold in three distinct stages:
- Instant Launch: All three "🔄 Asking" messages appear immediately as the API calls fire off simultaneously
- Concurrent Completion: The "✅ Answered" messages arrive as GPT-5 finishes each response — often in a different order than they were asked, proving your requests are truly running in parallel
- Intelligent Synthesis: All this concurrent research gets woven together into a comprehensive travel guide that combines the speed benefits of parallel processing with thoughtful analysis
This visual progression clearly demonstrates how your requests execute concurrently rather than waiting for each other, transforming what could be a slow sequential process into a fast, efficient workflow that delivers both speed and quality.
This two-stage approach provides significant performance benefits while maintaining result quality. The parallel research phase completes in roughly the time of the slowest individual question, while the aggregation phase ensures all information is properly synthesized into a usable travel plan.
This pattern works well for any scenario where you need to:
- Research multiple independent topics quickly
- Aggregate diverse information into a unified result
- Balance speed with comprehensive analysis
The performance benefits are most significant when you have many independent research topics or when individual API calls have high latency.
You've mastered parallel processing patterns that transform slow sequential workflows into lightning-fast concurrent operations. The combination of parallel research gathering and sequential result synthesis provides both speed and quality, making it ideal for complex analysis tasks like travel planning, market research, or technical evaluations.
In the upcoming exercises, you'll apply these patterns to real-world scenarios and learn to handle the nuances of concurrent GPT-5 workflows. Remember: use parallel processing for independent research tasks, then aggregate results sequentially for comprehensive final analysis.
