Welcome to the first lesson of Parallelizing Claude Agentic Systems in Ruby! In this lesson, you'll learn how to use Ruby's threading capabilities to run multiple concurrent conversations with Claude. You've already seen the Agent class that can handle conversations, use tools, and hand off control to other specialized agents.
Now we're going to make your agent system handle multiple conversations simultaneously using Ruby threads. By the end, you'll have an agent that can process several independent requests at once without waiting for each to complete sequentially, dramatically improving the efficiency of your workflows.
Let's start by understanding why we need concurrency in the first place. When you make a regular API call to Anthropic's Claude, your program stops and waits for the response. This is called "blocking" behavior. If you need to have three separate conversations with Claude, your program handles them one at a time: start conversation 1, wait for all responses, finish conversation 1, then start conversation 2, and so on.
This sequential approach wastes time. While your program waits for Claude to respond to conversation 1, it could be starting conversation 2 or 3. Network calls and API processing take time, but your CPU sits idle during these waits.
Concurrent execution using Ruby threads solves this problem. When you create a thread for an API call, your program can continue creating more threads and starting other conversations while waiting for responses. Think of it like a restaurant: a sequential waiter takes one order, goes to the kitchen, waits for the food, delivers it, and only then takes the next order. A concurrent approach is like having multiple orders in flight — while one meal is being prepared, other orders are being taken and other meals are being delivered.
This concurrent approach is particularly effective for I/O-bound operations like API calls, network requests, and database queries, where most of the time is spent waiting rather than computing. For CPU-heavy tasks, threads won't provide the same benefit, since Ruby threads may not run truly in parallel for CPU-bound work (on CRuby, the Global VM Lock allows only one thread to execute Ruby code at a time). For network operations like API calls to Claude, however, the lock is released during blocking I/O, so threads let you overlap the waiting time.
Before we dive into concurrent execution, let's understand the key parts of the Agent class that make concurrent conversations possible. The agent uses the standard Anthropic::Client from the anthropic gem:
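The constructor itself isn't reproduced in this lesson text, so here is a hedged sketch of what it plausibly looks like; the client: injection parameter and the instance-variable names are assumptions, made so the example can run without the gem installed:

```ruby
# Hypothetical sketch of the Agent's setup. In the real class, @client is an
# Anthropic::Client from the anthropic gem; client: is accepted here only so
# this sketch stays runnable without the gem.
class Agent
  def initialize(client: nil, tools: {}, tool_schemas: [])
    @client = client             # the shared API client used by #run
    @tools = tools               # tool name => callable
    @tool_schemas = tool_schemas # JSON schemas advertised to Claude
  end
end
```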
The run method processes a conversation synchronously, making API calls and handling tool use in a loop:
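The method body isn't shown here either; as a hedged sketch (the create_message call and the response Hash shape are assumptions, not the anthropic gem's actual API), the loop is roughly:

```ruby
# Hypothetical sketch of a synchronous run loop. The client is assumed to
# respond to `create_message` and return a Hash with :stop_reason and
# :content; the real Agent uses the anthropic gem's own request/response types.
class Agent
  def run(messages)
    loop do
      # Blocks here until the API responds
      response = @client.create_message(messages: messages)
      return response unless response[:stop_reason] == "tool_use"

      # Record the assistant turn, run each requested tool to completion,
      # then feed the results back as the next user message
      messages << { role: "assistant", content: response[:content] }
      tool_results = response[:content]
                     .select { |b| b[:type] == "tool_use" }
                     .map do |b|
                       { type: "tool_result", tool_use_id: b[:id],
                         content: @tools.fetch(b[:name]).call(b[:input]) }
                     end
      messages << { role: "user", content: tool_results }
    end
  end
end
```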
This run method is synchronous — it blocks while waiting for each API response. However, by wrapping multiple run calls in separate threads, we can have multiple conversations in flight simultaneously.
Now let's look at how to set up multiple concurrent conversations. The provided main.rb shows the pattern. First, we define multiple prompts that we want to process:
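The exact wording of the prompts isn't reproduced here; judging by the topics discussed later in the lesson, they resemble:

```ruby
# Hypothetical prompts; the real main.rb's wording may differ, but the lesson
# refers to compound interest, derivative, and equation-solving questions
prompts = [
  "What is the compound interest on $1,000 at 5% per year for 10 years?",
  "What is the derivative of x^3 + 2x?",
  "Solve for x: 2x + 7 = 21"
]
```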
Next, we create a single agent instance that will handle all conversations:
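As a one-liner (with a hypothetical stand-in definition so the snippet runs on its own):

```ruby
# Stand-in definition so this snippet is self-contained; in the real main.rb
# the Agent class is loaded beforehand (e.g. via require_relative)
Agent = Class.new unless defined?(Agent)

agent = Agent.new  # no tools: or tool_schemas: configured
```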
This agent doesn't have any tools or tool schemas configured in this example — it will rely purely on Claude's built-in mathematical reasoning capabilities. If you wanted to provide specific tools (like calculator functions), you would pass them via the tools: and tool_schemas: parameters as supported by the Agent class.
With our agent and prompts ready, we can now create threads to run multiple conversations concurrently:
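A runnable sketch of this step, with a hypothetical StubAgent (its sleep stands in for API latency) replacing the real agent:

```ruby
# StubAgent is a hypothetical stand-in for the real Agent; its sleep mimics
# the time spent waiting on a Claude API response
class StubAgent
  def run(messages)
    sleep 0.2
    "answer to: #{messages.first[:content]}"
  end
end

agent = StubAgent.new
prompts = ["compound interest question", "derivative question", "equation question"]

# One thread per prompt; each thread starts executing as soon as Thread.new returns
threads = prompts.map do |prompt|
  Thread.new { agent.run([{ role: "user", content: prompt }]) }
end
```

Because the three sleeps overlap, joining these threads takes roughly 0.2 seconds rather than 0.6.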
This code creates one thread for each prompt using Thread.new. Inside each thread's block, we call agent.run() with a single user message. The map operation returns an array of thread objects, but importantly, the threads have already started executing at this point.
The key insight here is that each thread runs its own conversation independently. While one thread waits for Claude's response to the compound interest question, another thread can be waiting for the derivative question, and a third can be waiting for the equation solving question. Ruby's thread scheduler handles switching between threads efficiently.
To collect the results, we wait for all threads to complete:
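A self-contained illustration of this step, with trivial thread bodies standing in for agent.run:

```ruby
# Thread#value joins the thread, then returns the block's return value;
# map preserves the original creation order regardless of finish order
threads = [3, 1, 2].map { |n| Thread.new { sleep n / 100.0; n * 10 } }
results = threads.map(&:value)
# results == [30, 10, 20]: creation order, not completion order
```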
The Thread#value method blocks until the thread completes and returns the value returned by the thread's block (in this case, the result of agent.run()). By calling threads.map(&:value), we wait for all threads to finish and collect all their results in order. This is the synchronization point where the main thread waits for all conversations to complete.
Running this concurrent agent system is straightforward: it's an ordinary Ruby script that you launch with ruby main.rb.
When Ruby executes this script, it:
1. Loads the required classes and creates the agent
2. Creates and immediately starts all threads (in prompts.map { Thread.new { ... } })
3. Continues to the threads.map(&:value) line, which blocks waiting for all threads
4. Once all threads complete, displays the results and exits
The concurrency happens automatically between steps 2 and 3. While the main thread is blocked at threads.map(&:value), Ruby's thread scheduler switches between the worker threads, allowing them to make progress on their API calls.
This is different from a purely sequential approach where you might have:
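A sequential sketch for contrast, again using a stand-in lambda (an assumption) in place of the real blocking API call:

```ruby
require "benchmark"  # stdlib; used only to show the elapsed time

# slow_call is a hypothetical stand-in for agent.run's blocking API wait
slow_call = ->(prompt) { sleep 0.1; "answer to #{prompt}" }
prompts = ["q1", "q2", "q3"]

results = nil
elapsed = Benchmark.realtime do
  # Each call runs to completion before the next begins
  results = prompts.map { |p| slow_call.call(p) }
end
# elapsed is roughly 0.3 s (three waits in series), versus roughly 0.1 s threaded
```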
In the sequential version, each conversation would complete fully before the next one starts, wasting time during network I/O waits.
When you run the concurrent version, all three results are printed together once every thread has finished.
The key observation is that all three conversations happen concurrently, even though the output is displayed sequentially (because we wait for all threads to complete before displaying results). If we added timing information, you'd see that all three conversations complete in roughly the time of the slowest single conversation, since the wait times overlap.
If the agent were configured with tools, you might see tool call logs from different conversations intermixed in the output.
This interleaving demonstrates that the conversations are truly running concurrently — tool calls from different threads are being processed as each thread makes progress.
Now that we've seen concurrent execution in action, let's reflect on what we've achieved and what limitations remain. By using Ruby threads to wrap multiple agent.run() calls, we've enabled parallel processing of independent conversations. When one conversation is blocked waiting for an API response, other conversations can make progress. This overlapping of I/O wait time is the primary benefit we gain.
For our example with three math problems, all three conversations happen simultaneously. Instead of taking 3× the time of a single conversation, they complete in roughly the time of the slowest conversation, since the wait times overlap.
However, there are important limitations to understand:
Within a Single Conversation: Inside one call to agent.run(), operations are sequential. When Claude requests multiple tool uses in a single response, the agent processes them one at a time:
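A sketch of that sequential tool handling; the block shapes and the tool registry below are illustrative assumptions, not the Agent class's actual internals:

```ruby
# Hypothetical tool registry and a response requesting two tool calls
tools = {
  "add"    => ->(input) { input["a"] + input["b"] },
  "square" => ->(input) { input["x"] ** 2 }
}

response_content = [
  { type: "tool_use", id: "t1", name: "add",    input: { "a" => 2, "b" => 3 } },
  { type: "tool_use", id: "t2", name: "square", input: { "x" => 4 } }
]

tool_results = response_content
               .select { |b| b[:type] == "tool_use" }
               .map do |block|
                 # Each tool runs to completion before the next one starts
                 { type: "tool_result", tool_use_id: block[:id],
                   content: tools.fetch(block[:name]).call(block[:input]) }
               end
# tool_results carries 5 and then 16, in request order
```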
If Claude requests five tool calls in one turn, they execute sequentially within that conversation, even though other conversations are running in parallel threads.
Handoff Blocking: If an agent transfers control to another agent via a handoff, the current thread blocks completely while the target agent runs its conversation.
Thread Safety Considerations: In our example, we're sharing a single Agent instance and its @client across multiple threads. The safety of this approach depends on whether the gem's client implementation is thread-safe. The client may use connection pooling, or it may require separate instances per thread. For production use, you might want to create separate agent instances per thread or verify the thread-safety guarantees of the underlying client.
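One conservative pattern is a fresh agent per thread, sketched here with a hypothetical StubAgent in place of the real Agent:

```ruby
# StubAgent stands in for the real Agent; the point is structural: each thread
# constructs its own instance, so no client object is shared across threads
class StubAgent
  def run(messages)
    "done: #{messages.first[:content]}"
  end
end

prompts = ["a", "b", "c"]
threads = prompts.map do |prompt|
  Thread.new do
    StubAgent.new.run([{ role: "user", content: prompt }])  # fresh agent per thread
  end
end
results = threads.map(&:value)
```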
You've successfully learned how to use Ruby threads to run multiple concurrent conversations with Claude. The key pattern is:
- Create multiple threads using Thread.new { agent.run(...) }
- Each thread runs an independent conversation
- Collect results with threads.map(&:value) to wait for completion
This threading approach allows you to overlap I/O wait time across multiple conversations, dramatically improving throughput when processing multiple independent requests.
In the upcoming practice, you will apply these concepts by building a system that processes a collection of diverse prompts in parallel.
