Welcome back! In the previous lesson, you successfully enabled your agent system to handle multiple concurrent conversations using Ruby threads. Each conversation runs in its own thread, allowing your system to manage many users simultaneously. However, there is still a critical bottleneck within each individual conversation: when Claude requests multiple tools in a single turn, those tools execute one after another.
In this lesson, we will remove that bottleneck by parallelizing tool execution within a single agent turn. You will learn how to use Ruby threads to execute multiple tool calls concurrently, dramatically improving your system's efficiency when Claude needs to perform several calculations or operations at once.
Let's examine why sequential tool execution creates a bottleneck within a single agent turn. Currently, when Claude requests multiple tools in one response, our agent processes them one at a time. This sequential approach works, but it is inefficient.
Here is what happens with the current sequential approach. When a user asks Claude to find the square roots of three different numbers, Claude might request three separate square_root tool calls in a single response. With sequential execution, the agent calls the first square_root function, waits for it to complete, then calls the second, waits again, and finally calls the third.
If each calculation takes 100 milliseconds, the total time is 300 milliseconds — even though these three calculations are completely independent and could happen simultaneously.
This becomes especially problematic when tools involve external operations. Imagine you have a tool that makes an HTTP request to a weather API, taking 2 seconds to complete. If Claude requests weather data for three different cities in one turn, sequential execution would take 6 seconds total. But since these are three independent network requests, they could all happen at the same time, reducing the total wait to just 2 seconds. The solution is to execute independent tool calls concurrently using Ruby threads, allowing multiple tools to run in parallel during a single agent turn. Let's see how to implement this.
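To see the difference concretely, here is a minimal, self-contained sketch. The `fake_tool` helper and its 100-millisecond delay are illustrative stand-ins, not part of the agent code:

```ruby
# Simulate a tool call that takes ~100 ms, like a slow calculation or API hit.
def fake_tool(n)
  sleep 0.1
  Math.sqrt(n)
end

inputs = [16, 25, 36]

# Sequential: each call waits for the previous one (~300 ms total).
t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
sequential = inputs.map { |n| fake_tool(n) }
seq_time = Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0

# Concurrent: one thread per call, all sleeping at once (~100 ms total).
t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
threads = inputs.map { |n| Thread.new { fake_tool(n) } }
concurrent = threads.map(&:value)
conc_time = Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0

puts format("sequential: %.2fs, concurrent: %.2fs", seq_time, conc_time)
```

The results are identical either way; only the elapsed time changes, because the threads spend their sleep time overlapping instead of queuing.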
Our tool functions, like sum_numbers and multiply_numbers, are regular synchronous Ruby methods. We do not need to change them at all. Instead, we will change how the agent calls these tools. Rather than executing each tool sequentially, we will spawn a separate Ruby thread for each tool call and let them all run concurrently.
Let's look at how this works in the run method. When Claude's response includes tool usage, we iterate through all the tool use requests and separate them into regular tools and handoffs:
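The real block objects come from the API client, but the shape of the separation step can be sketched with stand-in structs. The `handoff_` name prefix is an assumption about how this codebase marks handoff tools:

```ruby
# Stand-in for Claude's content blocks; the real objects come from the API client.
Block = Struct.new(:type, :name, :input)

response_content = [
  Block.new(:text, nil, nil),
  Block.new(:tool_use, "square_root", { "number" => 16 }),
  Block.new(:tool_use, "handoff_to_math_agent", {}),
  Block.new(:tool_use, "square_root", { "number" => 25 })
]

# Split the tool_use blocks: regular tools can run concurrently,
# while a handoff transfers control and is handled on its own.
tool_uses   = []
handoff_use = nil

response_content.each do |block|
  next unless block.type == :tool_use

  if block.name.start_with?("handoff_")
    handoff_use = block
  else
    tool_uses << block
  end
end

puts tool_uses.map(&:name).inspect
puts handoff_use.name
```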
The key insight here is that we are collecting all the regular tool calls into the tool_uses array before executing any of them. This allows us to process them all at once instead of one at a time. We handle handoffs separately because they transfer control to another agent — we need to know if a handoff succeeds before continuing. Regular tools, however, are independent operations that can run concurrently. Once we have separated the tool uses, we can execute them in parallel using threads.
With all regular tool calls collected in the tool_uses array, we can now execute them concurrently using Ruby threads. We will create one thread for each tool call and then wait for all threads to complete:
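The core of the change is only two lines. The `call_tool` method below is a stand-in that just sleeps and computes, so the sketch runs on its own; the real one dispatches to the actual tool functions:

```ruby
# Stand-in for the agent's call_tool method; the real one dispatches by name.
def call_tool(tu)
  sleep 0.1                           # simulate the tool doing work
  Math.sqrt(tu[:input]["number"])
end

tool_uses = [
  { name: "square_root", input: { "number" => 16 } },
  { name: "square_root", input: { "number" => 25 } },
  { name: "square_root", input: { "number" => 36 } }
]

# Spawn one thread per tool call; each starts running immediately.
tool_threads = tool_uses.map { |tu| Thread.new { call_tool(tu) } }

# Thread#value blocks until that thread finishes and returns its result,
# so mapping over all the threads gathers every result in the original order.
tool_results = tool_threads.map(&:value)

puts tool_results.inspect
```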
This is where the magic happens. For each tool use object in tool_uses, we create a new thread with Thread.new { call_tool(tu) }. Each thread immediately starts executing its call_tool method in parallel with all the others. The map operation returns an array of thread objects, which we store in tool_threads.
Then we use tool_threads.map(&:value) to wait for all threads to complete and collect their results. The value method on a thread blocks until that thread finishes executing and returns its result. By mapping over all the thread objects, we wait for every tool to complete and gather the results into a single array, preserving the original order.
After executing all regular tools concurrently, we need to handle any handoff requests. Handoffs transfer control to another agent, so they require special treatment:
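A sketch of that branch follows. The `call_handoff` here is a stand-in that only knows one agent and fails for any other name; the real method would invoke the target agent:

```ruby
# A single known agent; anything else triggers the failure path.
AGENTS = { "math_agent" => ->(_messages) { "math agent response" } }

# Stand-in for call_handoff: returns [success, result].
def call_handoff(handoff_use, messages)
  target = AGENTS[handoff_use[:target]]
  return [false, "Error: agent '#{handoff_use[:target]}' does not exist"] unless target

  [true, target.call(messages)]
end

messages = []
success, handoff_result = call_handoff({ target: "billing_agent" }, messages)

unless success
  # Feed the failure back so Claude can see what went wrong and adjust.
  messages << { role: "user", content: handoff_result }
end

puts handoff_result
```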
If Claude requested a handoff (stored in handoff_use), we call call_handoff with the handoff request and the current conversation messages. This method attempts to transfer control to another specialized agent. It returns two values: a boolean success indicating whether the handoff worked, and the handoff_result containing either the response from the target agent or an error message.
If the handoff succeeds (success is true), we immediately return the result from the target agent, ending this agent's involvement in the conversation. If the handoff fails — for example, if the target agent doesn't exist — we add the error message to the conversation so Claude can see what went wrong and adjust its approach.
Let's examine the complete flow of how our agent handles tool execution, including the interaction between concurrent tool calls and handoffs:
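Putting the pieces together, the tool-handling phase can be sketched as a single method. The `call_tool` and `call_handoff` definitions below are minimal placeholders so the sketch runs on its own; the real agent's versions dispatch to actual tools and agents:

```ruby
# Minimal stand-ins so the sketch runs on its own.
def call_tool(tu)
  Math.sqrt(tu[:input])
end

def call_handoff(handoff_use, _messages)
  [false, "Error: no agent named '#{handoff_use[:name]}'"]
end

def handle_tools(tool_blocks, messages)
  tool_uses   = []
  handoff_use = nil

  # 1. Separate regular tools from an optional handoff request.
  tool_blocks.each do |block|
    if block[:name].start_with?("handoff_")
      handoff_use = block
    else
      tool_uses << block
    end
  end

  # 2. Run regular tools concurrently, then wait for all results (in order).
  tool_results = tool_uses
                 .map { |tu| Thread.new { call_tool(tu) } }
                 .map(&:value)

  # 3. Handle the handoff last; on success it ends this agent's turn.
  if handoff_use
    success, handoff_result = call_handoff(handoff_use, messages)
    return handoff_result if success
    messages << { role: "user", content: handoff_result }
  end

  tool_results
end

messages = []
results = handle_tools(
  [{ name: "square_root", input: 49 }, { name: "handoff_math", input: nil }],
  messages
)
puts results.inspect
```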
When Claude's response includes tool usage, we first iterate through all the tool requests and separate them into two categories: regular tools go into tool_uses, and any handoff request is stored in handoff_use. This separation is crucial because regular tools can run in parallel, but handoffs transfer control.
For all regular tools, we create a thread for each one with Thread.new { call_tool(tu) }. These threads start executing immediately and run concurrently. We then collect all the thread objects and call map(&:value) to wait for every thread to finish and gather their results. This is a synchronization point — we will not proceed until all tools have completed — but the tools themselves run in parallel, so the total wait time is minimized.
When we run our agent with a request that triggers multiple tool calls, the output demonstrates how tools execute simultaneously:
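We can't reproduce the real agent's log output here, but this small demo shows the same pattern: all three calls log their start lines before any of them finishes. The `Queue` is used because it is thread-safe, so the threads can record lines without a mutex:

```ruby
log = Queue.new   # Queue (Thread::Queue) is thread-safe

threads = [16, 25, 36].map do |n|
  Thread.new do
    log << "square_root(#{n}) started"
    sleep 0.1                                  # simulate the tool's work
    log << "square_root(#{n}) -> #{Math.sqrt(n)}"
  end
end
threads.each(&:join)

lines = []
lines << log.pop until log.empty?
puts lines
```

On a typical run, the three "started" lines appear together before any result line, and their relative order can vary from run to run, just like the agent's logs.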
Notice how all three square_root tool calls appear in rapid succession. This happens because they are executing in separate threads concurrently. The log lines may even interleave or appear in slightly different orders on different runs, depending on thread scheduling. The key observation is that all three tool calls start essentially at the same time — rather than waiting for each to complete before starting the next.
The final response shows that Claude successfully processed all three results and presented them in a clean, organized format. The agent maintained the same quality of results while executing the tools much faster than sequential execution would have allowed. If each square_root calculation took 100 milliseconds, sequential execution would require 300 milliseconds total, but concurrent execution completes in just over 100 milliseconds — the time of the slowest single operation.
This performance improvement becomes even more dramatic with tools that involve network requests or database queries. Imagine a tool that fetches data from an external API, taking two seconds per call. If Claude requests this tool three times in one turn, sequential execution would take six seconds, but concurrent execution would take only about two. The efficiency gains scale with the number of independent tools requested in a single turn.
You have successfully parallelized tool execution within your Ruby agent system by using threads to run multiple tool calls concurrently. You learned how to separate tool uses from handoffs, spawn a thread for each independent tool call using Thread.new, and synchronize all threads using map(&:value) to collect results.
The performance improvement is significant: instead of executing tools one at a time, your agent now runs multiple tools simultaneously, reducing the total time to roughly the duration of the slowest single tool.
To practice these concepts, try the following exercises:
- Multiple Tool Types: Modify `main.rb` to ask a question that requires different tool types in one turn, such as "Calculate `(5 + 3)` and then find the square root of `64`." Observe how both `sum_numbers` and `square_root` execute concurrently in the logs.
- Error Handling: Request a calculation that will cause an error, like "Divide `10` by `0` and also multiply two numbers." Verify that the error in one tool (division by zero) does not prevent the other tools from completing successfully, and that Claude receives both results.
