Welcome back! In the previous lesson, you successfully converted your agent system to use async programming, enabling a single agent to handle multiple conversations concurrently. However, we also identified a critical bottleneck: tool execution still happens sequentially. When any conversation needs to call a tool, all other conversations must wait. In this lesson, we'll remove that bottleneck by parallelizing tool execution. You'll learn how to transform synchronous tool functions into async operations and execute multiple tools concurrently across different conversations, dramatically improving your system's efficiency.
Let's take a closer look at why synchronous tool execution creates a bottleneck in our concurrent agent system. In the previous lesson, we made the run() method async, which allows multiple conversations to progress while waiting for API responses. However, when Claude requests tool usage, our agent still processes those tools one at a time.
Here's what happens with the current synchronous approach. When conversation 1 needs to call sum_numbers(), the agent executes that function and waits for it to complete. During this wait, conversation 2 might also need to call multiply_numbers(), but it has to wait until conversation 1's tool finishes. Even though both conversations are running concurrently, they share the same synchronous tool execution path, creating a queue where tools execute one after another.
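To make this concrete, here's a minimal sketch of what the current synchronous dispatch might look like (the self.tools dictionary and the content_item fields are illustrative, based on the agent structure from earlier lessons):

```python
# Synchronous version: each tool call blocks the event loop until it returns.
def call_tool(self, content_item):
    tool_fn = self.tools[content_item.name]        # look up the tool by name
    result = str(tool_fn(**content_item.input))    # blocks right here
    return {
        "type": "tool_result",
        "tool_use_id": content_item.id,
        "content": result,
    }
```

While tool_fn runs, nothing else on the event loop can make progress, including the other conversations' API calls.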
This becomes especially problematic when tools involve external operations. Imagine you have a tool that makes an API call to a weather service, taking 2 seconds to complete. If three conversations each need to call this tool, they'll wait 6 seconds in total, even though all three API calls could happen simultaneously. The synchronous nature of the tool functions blocks the event loop, preventing other work from happening during these waits. The solution is to make tool execution async, allowing multiple tools to run concurrently, which is exactly what we'll implement next.
Our tool functions, like sum_numbers() and multiply_numbers(), are regular synchronous Python functions. We could rewrite them all as async functions, but there's a simpler approach that works with existing synchronous code. Python's asyncio module provides a function called to_thread() that runs synchronous functions in a separate thread, preventing them from blocking the event loop.
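Here's a minimal, self-contained example of the idea. The fetch_weather function is a hypothetical stand-in for any blocking call, and asyncio.gather(), which we'll use in our agent shortly, simply runs the three awaitables together:

```python
import asyncio
import time

def fetch_weather(city: str) -> str:
    time.sleep(2)  # simulates a slow, blocking API call
    return f"Sunny in {city}"

async def main():
    start = time.perf_counter()
    # Each blocking call runs in its own worker thread, so the waits overlap.
    results = await asyncio.gather(
        asyncio.to_thread(fetch_weather, "Paris"),
        asyncio.to_thread(fetch_weather, "Tokyo"),
        asyncio.to_thread(fetch_weather, "Lima"),
    )
    elapsed = time.perf_counter() - start
    print(results, f"- took {elapsed:.1f}s")  # roughly 2 seconds, not 6

asyncio.run(main())
```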
Let's update the call_tool() method to use asyncio.to_thread(). First, we need to make the method itself async by adding the async keyword to its definition. Then, instead of directly calling the tool function, we'll wrap it with asyncio.to_thread() and use await:
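Here's a sketch of the updated method (the self.tools lookup and content_item fields are carried over from the synchronous version above):

```python
async def call_tool(self, content_item):
    tool_fn = self.tools[content_item.name]
    tool_input = content_item.input
    # Run the synchronous tool in a worker thread so the event loop stays free.
    result = str(await asyncio.to_thread(tool_fn, **tool_input))
    return {
        "type": "tool_result",
        "tool_use_id": content_item.id,
        "content": result,
    }
```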
The key change is in the line result = str(await asyncio.to_thread(tool_fn, **tool_input)). The asyncio.to_thread() function takes the synchronous tool function and its arguments, runs it in a separate thread, and returns an awaitable. When we await it, the event loop can switch to other work while the tool executes in its thread. This means that if multiple conversations need tools at the same time, they can all execute concurrently without blocking each other. Notice that we also added the async keyword to the method definition, which is required because we're using await inside the method.
Now that call_tool() is async, we need to update how we call it in the run() method. Instead of immediately awaiting each tool call, we'll create tasks for all tool calls and then execute them concurrently. This approach allows multiple tools to run at the same time, even within a single conversation turn.
Let's look at the updated logic in the run() method. When Claude requests tool usage, we'll iterate through all the tool use requests and create tasks for the regular tools while handling handoffs separately:
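Here's a sketch of that dispatch loop. The self.handoffs lookup and the (success, result) return shape of call_handoff() are assumptions about how handoffs are wired up, so adapt them to your own implementation:

```python
tool_results = []
tasks = []
for content_item in response.content:
    if content_item.type != "tool_use":
        continue
    if content_item.name in self.handoffs:
        # Handoffs transfer control, so resolve them immediately.
        success, result = await self.call_handoff(content_item)
        if success:
            return result            # the target agent takes over from here
        tool_results.append(result)  # surface the failure to Claude
    else:
        # Schedule regular tools now; we'll await them together later.
        tasks.append(asyncio.create_task(self.call_tool(content_item)))
```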
The important change here is how we handle regular tools. Instead of using await self.call_tool(content_item) directly, we use asyncio.create_task(self.call_tool(content_item)). The create_task() function schedules the coroutine to run on the event loop but doesn't wait for it to complete. It returns a task object that we can await later, and we collect all these task objects in the tasks list. Notice that we still handle handoffs immediately with await self.call_handoff() because handoffs transfer control to another agent, and we need to know the result before continuing. By creating tasks for all regular tools before awaiting any of them, we set up the foundation for concurrent execution, which we'll implement in the next section.
With all our tool tasks collected in the tasks list, we can now execute them concurrently using asyncio.gather(). This function takes multiple awaitables and runs them in parallel, waiting for all of them to complete before returning their results:
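Continuing the sketch, this goes right after the dispatch loop in run():

```python
if tasks:
    # Run every scheduled tool call concurrently; results come back
    # in the same order the tasks were created.
    tool_results.extend(await asyncio.gather(*tasks))
```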
After collecting all the tasks, we check if the tasks list is not empty with if tasks:. This is important because if Claude only requested handoffs and no regular tools, we don't want to call gather() with an empty list. When we have tasks, we use await asyncio.gather(*tasks) to execute them all concurrently. The asterisk unpacks the list, passing each task as a separate argument to gather(). The gather() function returns a list of results in the same order as the input tasks, and we use tool_results.extend() to add all these results to our tool_results list, which may already contain results from failed handoffs. This approach means that if Claude requests three tool calls in a single turn, all three will execute at the same time instead of one after another, maximizing the efficiency of your system.
Let's examine the complete flow of how our agent now handles tool execution, including the interaction between handoffs and regular tools. This is important because Claude might request a mix of handoffs and regular tool calls in the same response, and we need to handle both correctly:
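Putting the pieces together, the tool-handling portion of run() might look like this sketch (the stop_reason check follows the standard Anthropic Messages API pattern, while the handoff details remain assumptions as noted above):

```python
if response.stop_reason == "tool_use":
    tool_results = []
    tasks = []
    for content_item in response.content:
        if content_item.type != "tool_use":
            continue
        if content_item.name in self.handoffs:
            # Resolve handoffs immediately: control may leave this agent.
            success, result = await self.call_handoff(content_item)
            if success:
                return result
            tool_results.append(result)
        else:
            # Regular tools are scheduled here and executed together below.
            tasks.append(asyncio.create_task(self.call_tool(content_item)))
    if tasks:
        tool_results.extend(await asyncio.gather(*tasks))
    # Feed all results back so Claude can see the outcomes and continue.
    messages.append({"role": "user", "content": tool_results})
```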
When Claude's response includes tool usage, we iterate through each tool request. For handoffs, we immediately await the result because they transfer control to another agent, and we need to know if the handoff succeeded before we can continue. If it succeeds, we return the result from the target agent immediately. If it fails, we add the error message to tool_results so Claude can see what went wrong. For regular tools, we create tasks without awaiting them, allowing us to collect all the tool calls that can run in parallel. Once we've processed all tool requests, we check if we have any regular tool tasks and use asyncio.gather() to execute them all concurrently. The final messages.append() adds all the results back to the conversation, allowing Claude to see the outcomes and continue. Now let's see this parallel execution in action.
When we run our updated agent system with multiple concurrent conversations, the output demonstrates how tools from different conversations execute simultaneously:
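The exact wording, ordering, and agent names vary from run to run, but the output might look something like this illustrative transcript:

```
[Conversation 1] Tool call: sum_numbers(2, 3) -> 5
[Conversation 2] Tool call: multiply_numbers(5, 5) -> 25
[Conversation 1] Tool call: multiply_numbers(4, 4) -> 16
[Conversation 2] Tool call: multiply_numbers(2, 3) -> 6
[Conversation 1] Tool call: multiply_numbers(5, 16) -> 80
[Conversation 1] Final answer: (2 + 3) * (4 * 4) = 80
[Conversation 2] Final answer: the roots are x = 2 and x = 3
```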
Notice how the tool calls are interleaved between the two conversations. The first conversation solves the arithmetic problem (2 + 3) * (4 * 4), while the second finds the roots of the quadratic equation x^2 - 5x + 6 = 0. Both conversations make progress simultaneously, with their tool calls executing concurrently rather than waiting for each other. Compare this to what would happen with sequential tool execution, where all tools from conversation 1 would execute first, then all tools from conversation 2; the total time would be the sum of all tool execution times. With parallel execution, the total time is closer to the longest single tool execution time, since many tools run at the same time. The final responses show that both conversations completed successfully, maintaining the same quality of results while processing multiple conversations efficiently.
You've successfully removed the tool execution bottleneck from your agent system by transforming synchronous tool functions into async operations. You learned how to use asyncio.to_thread() to run blocking functions without blocking the event loop, how to collect multiple tool calls into tasks with asyncio.create_task(), and how to execute them all concurrently using asyncio.gather(). The performance improvement is significant: instead of executing tools one at a time across all conversations, your agent now runs multiple tools simultaneously, which can dramatically reduce total processing time when you have many conversations or tools that involve external API calls. In the upcoming practice exercises, you'll implement concurrent tool execution in different scenarios and explore how parallel execution scales with the number of conversations and tools.
