Introduction & Context

Welcome back! In the previous lesson, you successfully converted your agent system to use async programming, enabling a single agent to handle multiple conversations concurrently. However, we also identified a critical bottleneck: function execution still happens sequentially. When any conversation needs to call a function, all other conversations must wait. In this lesson, we'll remove that bottleneck by parallelizing function execution. You'll learn how to transform synchronous function calls into async operations and execute multiple functions concurrently across different conversations, dramatically improving your system's efficiency.

Understanding the Function Execution Bottleneck

Let's take a closer look at why synchronous function execution creates a bottleneck in our concurrent agent system. In the previous lesson, we made the run() method async, which allows multiple conversations to progress while waiting for API responses. However, when GPT-5 requests function calls, our agent still processes those functions one at a time.

Here's what happens with the current synchronous approach. When conversation 1 needs to call sum_numbers(), the agent executes that function and waits for it to complete. During this wait, conversation 2 might also need to call multiply_numbers(), but it has to wait until conversation 1's function finishes. Even though both conversations are running concurrently, they share the same synchronous function execution path, creating a queue where functions execute one after another.

This becomes especially problematic when functions involve external operations. Imagine you have a function that makes an API call to a weather service, taking 2 seconds to complete. If three conversations each need to call this function, they'll wait 6 seconds in total, even though all three API calls could happen simultaneously. The synchronous nature of the function calls blocks the event loop, preventing other work from happening during these waits. The solution is to make function execution async, allowing multiple functions to run concurrently, which is exactly what we'll implement next.
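
To make the timing concrete, here is a small self-contained simulation of that scenario; fetch_weather and its two-second asyncio.sleep() are illustrative stand-ins for the real API call, not part of the agent code:

```python
import asyncio
import time

async def fetch_weather(city: str) -> str:
    await asyncio.sleep(2)  # stands in for a 2-second API call
    return f"Weather in {city}: sunny"

async def main():
    cities = ["Paris", "Tokyo", "Lima"]

    start = time.perf_counter()
    for city in cities:
        await fetch_weather(city)  # one call at a time
    print(f"Sequential: {time.perf_counter() - start:.1f}s")  # ~6.0s

    start = time.perf_counter()
    await asyncio.gather(*(fetch_weather(city) for city in cities))
    print(f"Concurrent: {time.perf_counter() - start:.1f}s")  # ~2.0s

asyncio.run(main())
```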

Running Synchronous Functions Asynchronously

Our function implementations, like sum_numbers() and multiply_numbers(), are regular synchronous Python functions. We could rewrite them all as async functions, but there's a simpler approach that works with existing synchronous code. Python's asyncio module provides a function called to_thread() that runs synchronous functions in a separate thread, preventing them from blocking the event loop.
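
As a quick standalone illustration, here is how a blocking function moves off the event loop with asyncio.to_thread(); slow_add is a hypothetical stand-in for any synchronous function:

```python
import asyncio
import time

def slow_add(a: int, b: int) -> int:
    time.sleep(1)  # a blocking, synchronous operation
    return a + b

async def main():
    # to_thread() runs slow_add in a worker thread, so the event
    # loop stays free to serve other coroutines during the sleep.
    result = await asyncio.to_thread(slow_add, 2, 3)
    print(result)  # 5

asyncio.run(main())
```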

Let's update the call_tool() method to use asyncio.to_thread(). First, we need to make the method itself async by adding the async keyword to its definition. Then, instead of directly calling the function, we'll wrap it with asyncio.to_thread() and use await:
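
The sketch below reconstructs the updated method from the walkthrough that follows; the self.tools name-to-function dictionary is an assumed detail of the agent class:

```python
import asyncio
import json

# Inside the agent class (the self.tools lookup dict is assumed):
async def call_tool(self, function_call):
    tool_fn = self.tools[function_call.name]

    # The arguments arrive as a JSON string; parse them into a dict.
    tool_input = json.loads(function_call.arguments)

    # Run the synchronous function in a separate thread so the event
    # loop can keep serving other conversations in the meantime.
    result = str(await asyncio.to_thread(tool_fn, **tool_input))

    # Wrap the result with the call_id so GPT-5 can match it to the
    # original function call request.
    return {
        "type": "function_call_output",
        "call_id": function_call.call_id,
        "output": result,
    }
```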

The key change is in the line result = str(await asyncio.to_thread(tool_fn, **tool_input)). The asyncio.to_thread() function takes the synchronous function and its arguments, runs it in a separate thread, and returns an awaitable. When we use await on this awaitable, the event loop can switch to other work while the function executes in its thread. This means that if multiple conversations need functions at the same time, they can all execute concurrently without blocking each other. Notice that we also added the async keyword to the method definition, which is required because we use await inside the method. Also note how we parse the function arguments with json.loads() to convert the JSON string into a Python dictionary, and how we return the result wrapped in a structure containing the call_id, which lets GPT-5 match the result with the original request. Now that our function execution is async, we need to update how we call it in the run() method.

Collecting Function Calls for Concurrent Execution

Now that call_tool() is async, we need to update how we call it in the run() method. Instead of immediately awaiting each function call, we'll create tasks for all function calls and then execute them concurrently. This approach allows multiple functions to run at the same time, even within a single conversation turn.

Let's look at the updated logic in the run() method. When GPT-5 requests function calls, we'll iterate through all the requests and create tasks for the regular functions while handling handoffs separately:
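
The sketch below reconstructs that logic from the description that follows; self.handoffs and the call_handoff helper are assumed names:

```python
# Inside run(), after receiving a response from GPT-5:
function_calls = [
    item for item in response.output if item.type == "function_call"
]

tasks = []
function_outputs = []

for function_call in function_calls:
    if function_call.name in self.handoffs:
        # Handoffs are awaited immediately: control may transfer to
        # another agent, and we must know the outcome before going on.
        success, result = await self.call_handoff(function_call)
        if success:
            return result  # the target agent produced the final answer
        function_outputs.append(result)  # record the error for GPT-5
    else:
        # Record the call in the history first, so GPT-5 can later
        # match the function result to its own request.
        messages.append(function_call)
        # Schedule the function without awaiting it yet.
        tasks.append(asyncio.create_task(self.call_tool(function_call)))
```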

The important change here is how we handle regular functions:

  • We create an empty tasks list to collect function calls for concurrent execution
  • We filter the response output to extract only the function call items
  • For each regular function (not handoffs), we first add the function call to the messages list—this is crucial because GPT-5 needs to see the function calls it made in the conversation history to understand the context when it later receives the function results
  • We then use asyncio.create_task(self.call_tool(function_call)) to schedule the function execution

Executing All Functions in Parallel

With all our function tasks collected in the tasks list, we can now execute them concurrently using asyncio.gather(). This function takes multiple awaitables and runs them in parallel, waiting for all of them to complete before returning their results:
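
Continuing the run() sketch from above, with the same assumed names:

```python
# Execute every collected function call concurrently.
if tasks:
    results = await asyncio.gather(*tasks)
    function_outputs.extend(results)

# Add all results (including any failed-handoff errors) back
# into the conversation history.
messages.extend(function_outputs)
```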

After collecting all the tasks, we check if the tasks list is not empty with if tasks:. This is important because if GPT-5 only requested handoffs and no regular functions, we don't want to call gather() with an empty list. When we have tasks, we use await asyncio.gather(*tasks) to execute them all concurrently. The asterisk unpacks the list, passing each task as a separate argument to gather(). The gather() function returns a list of results in the same order as the input tasks, and we use function_outputs.extend() to add all these results to our function_outputs list, which may already contain results from failed handoffs. Finally, we use messages.extend(function_outputs) to add all the function results back to the conversation. This approach means that if GPT-5 requests three function calls in a single turn, all three will execute at the same time instead of one after another, maximizing the efficiency of your system.

Complete Function Execution Flow

Let's examine the complete flow of how our agent now handles function execution, including the interaction between handoffs and regular functions. This is important because GPT-5 might request a mix of handoffs and regular function calls in the same response, and we need to handle both correctly:
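
Putting the pieces together, the whole loop might look like the sketch below; the AsyncOpenAI client usage, tool schemas, and helper names are assumptions based on the walkthrough:

```python
import asyncio

# Inside the agent class (client, tool schemas, and helpers assumed):
async def run(self, messages):
    while True:
        # Ask GPT-5 for the next step.
        response = await self.client.responses.create(
            model="gpt-5",
            input=messages,
            tools=self.tool_schemas,
        )

        function_calls = [
            item for item in response.output if item.type == "function_call"
        ]
        if not function_calls:
            return response.output_text  # plain answer: we're done

        tasks = []
        function_outputs = []
        for function_call in function_calls:
            if function_call.name in self.handoffs:
                # Await handoffs immediately; return on success.
                success, result = await self.call_handoff(function_call)
                if success:
                    return result
                function_outputs.append(result)
            else:
                # Log the call, then schedule it without awaiting.
                messages.append(function_call)
                tasks.append(
                    asyncio.create_task(self.call_tool(function_call))
                )

        # Run all regular functions in parallel, then feed the
        # results back so GPT-5 can continue the conversation.
        if tasks:
            function_outputs.extend(await asyncio.gather(*tasks))
        messages.extend(function_outputs)
```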

When GPT-5's response includes function calls, we first filter the output to extract all function call items. We then iterate through each function request. For handoffs, we immediately await the result because they transfer control to another agent, and we need to know if the handoff succeeded before we can continue. If it succeeds, we return the result from the target agent immediately. If it fails, we add the error message to function_outputs so GPT-5 can see what went wrong. For regular functions, we first append the function call details to the messages list, then create tasks without awaiting them, allowing us to collect all the function calls that can run in parallel. Once we've processed all function requests, we check if we have any regular function tasks and use asyncio.gather() to execute them all concurrently. The final messages.extend() adds all the results back to the conversation, allowing GPT-5 to see the outcomes and continue. Now let's see this parallel execution in action.

Observing Concurrent Function Execution

When we run our updated agent system with multiple concurrent conversations, the output demonstrates how functions from different conversations execute simultaneously:
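
A driver along these lines launches the two conversations side by side; the prompts and the Agent construction are illustrative:

```python
import asyncio

async def main():
    agent = Agent()  # constructor arguments omitted for brevity

    # Two independent conversations, started concurrently.
    answers = await asyncio.gather(
        agent.run([{"role": "user", "content": "Compute (2 + 3) * (4 * 4)."}]),
        agent.run([{"role": "user", "content": "Solve x^2 - 5x + 6 = 0."}]),
    )

    for answer in answers:
        print(answer)

asyncio.run(main())
```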

Notice how the function calls are interleaved between the two conversations. The first conversation solves the arithmetic problem (2 + 3) * (4 * 4), while the second finds the roots of the quadratic equation x^2 - 5x + 6 = 0. Both conversations make progress simultaneously, with their function calls executing concurrently rather than waiting for each other. Compare this to what would happen with sequential function execution, where all functions from conversation 1 would execute first, then all functions from conversation 2. The total time would be the sum of all function execution times. With parallel execution, the total time is closer to the longest single function execution time, since many functions run at the same time. The final responses show that both conversations completed successfully, maintaining the same quality of results while processing multiple conversations efficiently.

Summary & Practice Exercises

You've successfully removed the function execution bottleneck from your agent system by transforming synchronous function calls into async operations. You learned how to use asyncio.to_thread() to run blocking functions without blocking the event loop, how to collect multiple function calls into tasks with asyncio.create_task(), and how to execute them all concurrently using asyncio.gather(). The performance improvement is significant: instead of executing functions one at a time across all conversations, your agent now runs multiple functions simultaneously, which can dramatically reduce total processing time when you have many conversations or functions that involve external API calls. In the upcoming practice exercises, you'll implement concurrent function execution in different scenarios and explore how parallel execution scales with the number of conversations and functions.
