Introduction & Goals

Welcome to the first lesson of Parallelizing Claude Agentic Systems in Python! In the previous courses, you built a solid foundation by creating an agent class that can handle conversations, use tools, and even hand off control to other specialized agents. You've also seen how the AsyncAnthropic client works. Now we're going to take your agent system to the next level by converting it to handle multiple concurrent conversations using async/await patterns. By the end, you'll have an agent that can juggle several API calls at once without blocking, dramatically improving the efficiency of your workflows.

Why Async Matters for Agent Systems

Let's start by understanding why we need async in the first place. When you make a regular API call to Anthropic's Claude, your program stops and waits for the response. This is called "blocking" behavior. If you need to have three separate conversations with Claude, your program handles them one at a time: start conversation 1, wait for each of its responses, finish conversation 1, then start conversation 2, and so on.

This sequential approach wastes time. While your program waits for Claude to respond to conversation 1, it could be starting conversation 2 or 3. Network calls and API processing take time, but your CPU sits idle during these waits.

Async programming solves this problem. When you make an async API call, your program can continue doing other work while waiting for the response. Think of it like a restaurant: a synchronous waiter takes one order, goes to the kitchen, waits for the food, delivers it, and only then takes the next order. An async waiter takes multiple orders, sends them all to the kitchen, and delivers each meal as it becomes ready. The kitchen (Claude's API) processes multiple requests in parallel, and your program efficiently manages all of them.

This async approach is particularly effective for I/O-bound operations like API calls, network requests, and database queries, where most of the time is spent waiting rather than computing. For CPU-heavy tasks that require intense computation, the event loop can still get blocked, and you might need to consider multiprocessing or separate threads instead.

For agent systems, this means a single agent can manage multiple conversations simultaneously, or you can run several agents in parallel. This becomes especially powerful when agents need to make multiple tool calls or coordinate with other agents through handoffs. Let's start by making the necessary changes to your agent class.

Switching to AsyncAnthropic Client

The first change we'll make is in your agent's initialization. You've seen the AsyncAnthropic client before, and switching to it is straightforward. In your __init__ method, we'll simply replace the regular Anthropic() client with AsyncAnthropic():
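Here's a minimal sketch of the updated __init__. The attribute names (tools, tool_schemas, handoffs) follow the earlier lessons; adjust them to match your own agent:

```python
from anthropic import AsyncAnthropic


class Agent:
    def __init__(self, tools=None, tool_schemas=None, handoffs=None):
        # The only change from the synchronous version: AsyncAnthropic()
        # instead of Anthropic(). It reads ANTHROPIC_API_KEY from the environment.
        self.client = AsyncAnthropic()
        self.tools = tools or {}          # maps tool names to Python functions
        self.tool_schemas = tool_schemas or []
        self.handoffs = handoffs or {}    # maps handoff tool names to other agents
```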

This single-line change tells your agent to use the async version of the Anthropic client. The AsyncAnthropic client has the same interface as the regular client, but its methods return "awaitables" instead of immediate results. This means you'll need to use the await keyword when calling its methods, which we'll cover in the next section.

Making the run Method Async

Now that your agent has an async client, we need to update the run method to work asynchronously. This involves two key changes: adding the async keyword to the method definition and using await when making API calls. Here's how we'll update the run method:
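The sketch below shows one way the updated method might look. The call_tool helper, the handoffs mapping, and the model name are illustrative stand-ins for what you built in the earlier lessons:

```python
async def run(self, user_input):
    # Accept either a raw prompt string or a prepared message list
    # (handoffs pass in a cleaned-up history rather than a new prompt).
    if isinstance(user_input, str):
        messages = [{"role": "user", "content": user_input}]
    else:
        messages = list(user_input)

    while True:
        # await suspends this coroutine while the request is in flight,
        # so the event loop can progress other conversations meanwhile.
        response = await self.client.messages.create(
            model="claude-3-5-sonnet-latest",  # illustrative model name
            max_tokens=1024,
            tools=self.tool_schemas,
            messages=messages,
        )

        if response.stop_reason != "tool_use":
            return response.content[0].text

        messages.append({"role": "assistant", "content": response.content})
        tool_results = []
        for block in response.content:
            if block.type != "tool_use":
                continue
            if block.name in self.handoffs:
                # A handoff calls another agent's run(), which is now
                # async too, so it must also be awaited.
                return await self.call_handoff(self.handoffs[block.name], messages)
            # Tool execution itself is still synchronous for now.
            result = self.call_tool(block.name, block.input)
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": str(result),
            })
        messages.append({"role": "user", "content": tool_results})
```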

The async def syntax at the beginning tells Python this is an asynchronous function. Inside the method, we use await before self.client.messages.create(). This is where the magic happens. When your code hits the await keyword, it tells Python: "This operation will take some time. While we're waiting for the API response, feel free to do other work." Behind the scenes, Python's event loop can now switch to other tasks. If you have multiple agent conversations running, the event loop can start processing another conversation while this one waits for Claude's response. Notice that we also use await when calling call_handoff, because handoffs involve calling another agent's run() method, which is now also async.

Handling Async Handoffs

When your agent hands off control to another agent, it needs to wait for that agent to complete its work. Since the other agent's run() method is now async, we'll also need to make the call_handoff method async:
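A sketch of the updated method; the message-cleanup step mirrors what you built in the handoff lesson:

```python
async def call_handoff(self, target_agent, messages):
    # Keep only plain-text turns, dropping tool-use blocks that the
    # target agent shouldn't see (as in the earlier handoff lesson).
    clean_messages = [m for m in messages if isinstance(m["content"], str)]

    # Awaiting run() lets the event loop service other conversations
    # while the target agent does its work.
    return await target_agent.run(clean_messages)
```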

The key change here is adding async to the method definition and using await when calling target_agent.run(clean_messages). This allows the handoff to happen asynchronously. If the target agent needs to make multiple API calls or use tools, your original agent doesn't sit idle waiting. The event loop can switch to other work while the handoff completes. Now that your agent is fully async, let's see how we can run multiple conversations in parallel.

Setting Up for Concurrent Execution

Before we run multiple conversations concurrently, we need to set up our imports and tools. The key addition here is importing asyncio, which provides the tools for running async code:
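Here's a sketch with two simple math tools. The tool functions and schemas are illustrative; yours from the earlier lessons may differ:

```python
import asyncio

from anthropic import AsyncAnthropic


def add(a: float, b: float) -> float:
    """Add two numbers."""
    return a + b


def multiply(a: float, b: float) -> float:
    """Multiply two numbers."""
    return a * b


# JSON schemas that describe the tools to Claude.
TOOL_SCHEMAS = [
    {
        "name": "add",
        "description": "Add two numbers.",
        "input_schema": {
            "type": "object",
            "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
            "required": ["a", "b"],
        },
    },
    {
        "name": "multiply",
        "description": "Multiply two numbers.",
        "input_schema": {
            "type": "object",
            "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
            "required": ["a", "b"],
        },
    },
]
```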

Creating and Running Concurrent Tasks

With our tools ready, we can now define an async main function that creates and runs multiple conversations concurrently:
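A sketch of main(), assuming the Agent class and the tools defined above:

```python
async def main():
    prompts = [
        "What is (17 + 25) * 3?",
        "Find the roots of x^2 - 5x + 6 = 0.",
    ]

    agent = Agent(
        tools={"add": add, "multiply": multiply},
        tool_schemas=TOOL_SCHEMAS,
    )

    # Each call to run() creates a coroutine object; nothing executes yet.
    tasks = [agent.run(prompt) for prompt in prompts]

    # gather() runs all the coroutines concurrently and returns their
    # results in the same order as the inputs.
    results = await asyncio.gather(*tasks)

    for prompt, result in zip(prompts, results):
        print(f"Prompt: {prompt}\nResult: {result}\n")
```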

We start by creating a list of prompts that we want to process concurrently, then initialize our agent with the necessary tools and schemas. The key part is creating a list of tasks by calling agent.run() for each prompt. Notice that we don't use await here yet: each call returns a coroutine object (a promise of future work) but doesn't start executing immediately. We then pass them all to await asyncio.gather(*tasks). The gather() function runs our coroutines concurrently, waits for all of them to complete, and returns their results in the same order as the input tasks. While one conversation waits for an API response, another can make progress.

Entry Point for Async Execution

Finally, we need an entry point that creates the event loop and runs our async main function:
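With the main() function above, the entry point is just:

```python
if __name__ == "__main__":
    # Creates the event loop, runs main() to completion, then cleans up.
    asyncio.run(main())
```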

The asyncio.run(main()) function creates an event loop, runs our main() function, and handles cleanup when everything completes. This is the standard way to start an async Python program.

Observing Concurrent Execution

When you run this code, you'll see output that demonstrates the concurrent execution:
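The exact output depends on your tools and on how call_tool logs its invocations, but with the sketch above it might look something like this (values illustrative):

```
Calling tool: add({'a': 17, 'b': 25})
Calling tool: multiply({'a': 5, 'b': 5})
Calling tool: multiply({'a': 42, 'b': 3})
Calling tool: multiply({'a': 4, 'b': 6})
Prompt: What is (17 + 25) * 3?
Result: (17 + 25) * 3 = 126

Prompt: Find the roots of x^2 - 5x + 6 = 0.
Result: The roots are x = 2 and x = 3.
```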

Notice how the tool calls from both conversations are interleaved. This shows that both conversations are running concurrently. The agent switches between them as it waits for API responses, making efficient use of time. The first conversation solves the arithmetic problem while the second finds the roots of the quadratic equation, and both complete much faster than if they ran sequentially.

Understanding Parallelization Benefits and Remaining Bottlenecks

Now that we've seen concurrent execution in action, let's reflect on what we've achieved and what still needs improvement. By converting our agent to use async/await, we've enabled a single agent instance to manage multiple conversations simultaneously. We can now process different user requests in parallel without creating multiple agent instances. In our example above, one agent handles both math problems at the same time, switching between them efficiently while waiting for API responses.

But what would have happened if we tried to run multiple conversations with our original, synchronous agent? Within a single thread, they would execute strictly sequentially: the synchronous Anthropic client blocks while waiting for each response, wasting time that could be spent progressing other conversations. You could work around this with separate threads or processes, but that brings extra overhead and coordination complexity that async avoids by interleaving everything on a single thread.

However, there's an important limitation in our current implementation. Notice in the output how tool calls execute one at a time, even when they come from different conversations:
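In the hypothetical trace above, for example, each tool call runs to completion before the next one starts, even when consecutive calls belong to different conversations:

```
Calling tool: add({'a': 17, 'b': 25})      <- conversation 1 finishes this call...
Calling tool: multiply({'a': 5, 'b': 5})   <- ...before conversation 2's call begins
```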

These tool calls execute sequentially because our call_tool() method is synchronous. When one conversation needs to run a tool, all other conversations must wait. This becomes a bottleneck when we have many tool calls or when tools involve external API calls. In the next lesson, we'll parallelize tool execution, allowing our agent to make multiple tool calls simultaneously across different conversations.

Summary & Exercises

You've successfully converted your agent system to use async/await patterns by switching to AsyncAnthropic(), adding async and await to the run() and call_handoff() methods, and using asyncio.gather() to run multiple conversations in parallel. This async foundation is crucial for building more advanced parallel agent systems, and in the upcoming practice exercises, you'll implement concurrent agent workflows and explore different patterns for coordinating multiple agents.
