Welcome to the first lesson of Parallelizing Claude Agentic Systems in Ruby! In this lesson, you'll learn how to use Ruby's threading capabilities to run multiple concurrent conversations with Claude. You've already seen the Agent class that can handle conversations, use tools, and hand off control to other specialized agents.
Now we're going to make your agent system handle multiple conversations simultaneously using Ruby threads. By the end, you'll have an agent that can process several independent requests at once without waiting for each to complete sequentially, dramatically improving the efficiency of your workflows.
Let's start by understanding why we need concurrency in the first place. When you make a regular API call to Anthropic's Claude, your program stops and waits for the response. This is called "blocking" behavior. If you need to have three separate conversations with Claude, your program handles them one at a time: start conversation 1, wait for all responses, finish conversation 1, then start conversation 2, and so on.
This sequential approach wastes time. While your program waits for Claude to respond to conversation 1, it could be starting conversation 2 or 3. Network calls and API processing take time, but your CPU sits idle during these waits.
Concurrent execution using Ruby threads solves this problem. When you create a thread for an API call, your program can continue creating more threads and starting other conversations while waiting for responses. Think of it like a restaurant: a sequential waiter takes one order, goes to the kitchen, waits for the food, delivers it, and only then takes the next order. A concurrent approach is like having multiple orders in flight — while one meal is being prepared, other orders are being taken and other meals are being delivered.
This concurrent approach is particularly effective for I/O-bound operations like API calls, network requests, and database queries, where most of the time is spent waiting rather than computing. For CPU-heavy tasks, threads won't provide the same benefit, since Ruby threads may not run truly in parallel for CPU-bound work (on CRuby, the Global VM Lock allows only one thread to execute Ruby code at a time). For network operations like API calls to Claude, however, the lock is released during blocking I/O, so threads let you overlap the waiting time.
Before we dive into concurrent execution, let's understand the key parts of the Agent class that make concurrent conversations possible. The agent uses the standard Anthropic::Client from the anthropic gem:
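The constructor itself isn't reproduced in this lesson text, so here is a hedged sketch of what it plausibly looks like; the client: injection parameter and the instance-variable names are assumptions, made so the example can run without the gem installed:

```ruby
# Hypothetical sketch of the Agent's setup. In the real class, @client is an
# Anthropic::Client from the anthropic gem; client: is accepted here only so
# this sketch stays runnable without the gem.
class Agent
  def initialize(client: nil, tools: {}, tool_schemas: [])
    @client = client             # the shared API client used by #run
    @tools = tools               # tool name => callable
    @tool_schemas = tool_schemas # JSON schemas advertised to Claude
  end
end
```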
The run method processes a conversation synchronously, making API calls and handling tool use in a loop:
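The method body isn't shown here either; as a hedged sketch (the create_message call and the response Hash shape are assumptions, not the anthropic gem's actual API), the loop is roughly:

```ruby
# Hypothetical sketch of a synchronous run loop. The client is assumed to
# respond to `create_message` and return a Hash with :stop_reason and
# :content; the real Agent uses the anthropic gem's own request/response types.
class Agent
  def run(messages)
    loop do
      # Blocks here until the API responds
      response = @client.create_message(messages: messages)
      return response unless response[:stop_reason] == "tool_use"

      # Record the assistant turn, run each requested tool to completion,
      # then feed the results back as the next user message
      messages << { role: "assistant", content: response[:content] }
      tool_results = response[:content]
                     .select { |b| b[:type] == "tool_use" }
                     .map do |b|
                       { type: "tool_result", tool_use_id: b[:id],
                         content: @tools.fetch(b[:name]).call(b[:input]) }
                     end
      messages << { role: "user", content: tool_results }
    end
  end
end
```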
This run method is synchronous — it blocks while waiting for each API response. However, by wrapping multiple run calls in separate threads, we can have multiple conversations in flight simultaneously.
Now let's look at how to set up multiple concurrent conversations. The provided main.rb shows the pattern. First, we define multiple prompts that we want to process:
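The exact wording of the prompts isn't reproduced here; judging by the topics discussed later in the lesson, they resemble:

```ruby
# Hypothetical prompts; the real main.rb's wording may differ, but the lesson
# refers to compound interest, derivative, and equation-solving questions
prompts = [
  "What is the compound interest on $1,000 at 5% per year for 10 years?",
  "What is the derivative of x^3 + 2x?",
  "Solve for x: 2x + 7 = 21"
]
```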
Next, we create a single agent instance that will handle all conversations:
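As a one-liner (with a hypothetical stand-in definition so the snippet runs on its own):

```ruby
# Stand-in definition so this snippet is self-contained; in the real main.rb
# the Agent class is loaded beforehand (e.g. via require_relative)
Agent = Class.new unless defined?(Agent)

agent = Agent.new  # no tools: or tool_schemas: configured
```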
This agent doesn't have any tools or tool schemas configured in this example — it will rely purely on Claude's built-in mathematical reasoning capabilities. If you wanted to provide specific tools (like calculator functions), you would pass them via the tools: and tool_schemas: parameters as supported by the Agent class.
With our agent and prompts ready, we can now create threads to run multiple conversations concurrently:
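A runnable sketch of this step, with a hypothetical StubAgent (its sleep stands in for API latency) replacing the real agent:

```ruby
# StubAgent is a hypothetical stand-in for the real Agent; its sleep mimics
# the time spent waiting on a Claude API response
class StubAgent
  def run(messages)
    sleep 0.2
    "answer to: #{messages.first[:content]}"
  end
end

agent = StubAgent.new
prompts = ["compound interest question", "derivative question", "equation question"]

# One thread per prompt; each thread starts executing as soon as Thread.new returns
threads = prompts.map do |prompt|
  Thread.new { agent.run([{ role: "user", content: prompt }]) }
end
```

Because the three sleeps overlap, joining these threads takes roughly 0.2 seconds rather than 0.6.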
This code creates one thread for each prompt using Thread.new. Inside each thread's block, we call agent.run() with a single user message. The map operation returns an array of thread objects, but importantly, the threads have already started executing at this point.
The key insight here is that each thread runs its own conversation independently. While one thread waits for Claude's response to the compound interest question, another thread can be waiting for the derivative question, and a third can be waiting for the equation solving question. Ruby's thread scheduler handles switching between threads efficiently.
To collect the results, we wait for all threads to complete:
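A self-contained illustration of this step, with trivial thread bodies standing in for agent.run:

```ruby
# Thread#value joins the thread, then returns the block's return value;
# map preserves the original creation order regardless of finish order
threads = [3, 1, 2].map { |n| Thread.new { sleep n / 100.0; n * 10 } }
results = threads.map(&:value)
# results == [30, 10, 20]: creation order, not completion order
```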
The Thread#value method blocks until the thread completes and returns the value returned by the thread's block (in this case, the result of agent.run()). By calling threads.map(&:value), we wait for all threads to finish and collect all their results in order. This is the synchronization point where the main thread waits for all conversations to complete.
Running this concurrent agent system is straightforward: it's an ordinary Ruby script that you launch with ruby main.rb.
When Ruby executes this script, it:
1. Loads the required classes and creates the agent
2. Creates and immediately starts all threads (in prompts.map { Thread.new { ... } })
3. Continues to the threads.map(&:value) line, which blocks waiting for all threads
4. Once all threads complete, displays the results and exits
The concurrency happens automatically between steps 2 and 3. While the main thread is blocked at threads.map(&:value), Ruby's thread scheduler switches between the worker threads, allowing them to make progress on their API calls.
This is different from a purely sequential approach where you might have:
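A sequential sketch for contrast, again using a stand-in lambda (an assumption) in place of the real blocking API call:

```ruby
require "benchmark"  # stdlib; used only to show the elapsed time

# slow_call is a hypothetical stand-in for agent.run's blocking API wait
slow_call = ->(prompt) { sleep 0.1; "answer to #{prompt}" }
prompts = ["q1", "q2", "q3"]

results = nil
elapsed = Benchmark.realtime do
  # Each call runs to completion before the next begins
  results = prompts.map { |p| slow_call.call(p) }
end
# elapsed is roughly 0.3 s (three waits in series), versus roughly 0.1 s threaded
```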
In the sequential version, each conversation would complete fully before the next one starts, wasting time during network I/O waits.
When you run the concurrent version, all three results are printed together once every thread has finished.
The key observation is that all three conversations happen concurrently, even though the output is displayed sequentially (because we wait for all threads to complete before displaying results). If we added timing information, you'd see that all three conversations complete in roughly the time of the slowest single conversation, since the wait times overlap.
If the agent were configured with tools, you might see tool call logs from different conversations intermixed in the output.
This interleaving demonstrates that the conversations are truly running concurrently — tool calls from different threads are being processed as each thread makes progress.
Now that we've seen concurrent execution in action, let's reflect on what we've achieved and what limitations remain. By using Ruby threads to wrap multiple agent.run() calls, we've enabled parallel processing of independent conversations. When one conversation is blocked waiting for an API response, other conversations can make progress. This overlapping of I/O wait time is the primary benefit we gain.
For our example with three math problems, all three conversations happen simultaneously. Instead of taking 3× the time of a single conversation, they complete in roughly the time of the slowest conversation, since the wait times overlap.
However, there are important limitations to understand:
Within a Single Conversation: Inside one call to agent.run(), operations are sequential. When Claude requests multiple tool uses in a single response, the agent processes them one at a time:
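A sketch of that sequential tool handling; the block shapes and the tool registry below are illustrative assumptions, not the Agent class's actual internals:

```ruby
# Hypothetical tool registry and a response requesting two tool calls
tools = {
  "add"    => ->(input) { input["a"] + input["b"] },
  "square" => ->(input) { input["x"] ** 2 }
}

response_content = [
  { type: "tool_use", id: "t1", name: "add",    input: { "a" => 2, "b" => 3 } },
  { type: "tool_use", id: "t2", name: "square", input: { "x" => 4 } }
]

tool_results = response_content
               .select { |b| b[:type] == "tool_use" }
               .map do |block|
                 # Each tool runs to completion before the next one starts
                 { type: "tool_result", tool_use_id: block[:id],
                   content: tools.fetch(block[:name]).call(block[:input]) }
               end
# tool_results carries 5 and then 16, in request order
```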
If Claude requests five tool calls in one turn, they execute sequentially within that conversation, even though other conversations are running in parallel threads.
Handoff Blocking: If an agent transfers control to another agent via a handoff, the current thread blocks completely while the target agent runs its conversation.
Thread Safety Considerations: In our example, we're sharing a single Agent instance and its @client across multiple threads. The safety of this approach depends on whether the gem's client implementation is thread-safe. The client may use connection pooling, or it may require separate instances per thread. For production use, you might want to create separate agent instances per thread or verify the thread-safety guarantees of the underlying client.
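One conservative pattern is a fresh agent per thread, sketched here with a hypothetical StubAgent in place of the real Agent:

```ruby
# StubAgent stands in for the real Agent; the point is structural: each thread
# constructs its own instance, so no client object is shared across threads
class StubAgent
  def run(messages)
    "done: #{messages.first[:content]}"
  end
end

prompts = ["a", "b", "c"]
threads = prompts.map do |prompt|
  Thread.new do
    StubAgent.new.run([{ role: "user", content: prompt }])  # fresh agent per thread
  end
end
results = threads.map(&:value)
```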
You've successfully learned how to use Ruby threads to run multiple concurrent conversations with Claude. The key pattern is:
- Create multiple threads using Thread.new { agent.run(...) }
- Each thread runs an independent conversation
- Collect results with threads.map(&:value) to wait for completion
This threading approach allows you to overlap I/O wait time across multiple conversations, dramatically improving throughput when processing multiple independent requests.
In the upcoming practice, you will apply these concepts by building a system that processes a collection of diverse prompts in parallel.
