Concurrent Agent Conversations

Introduction & Goals

Welcome to the first lesson of Parallelizing Gemini Agentic Systems in Python! In the previous courses, you built an Agent class that can hold conversations, use tools, and hand off control to other specialized agents through Gemini's API. In this lesson, you'll convert that agent to handle multiple concurrent conversations using async/await. By the end, you'll have an agent that can keep several API calls in flight at once instead of blocking on each one, which shortens the total run time of workflows that spend most of their time waiting on the network.

Why Async Matters for Agent Systems

Let's start by understanding why we need async in the first place. When you make a regular API call to Gemini, your program stops and waits for the response. This is called "blocking" behavior. If you need three separate conversations with Gemini, your program handles them one at a time: it runs conversation 1 to completion, then starts conversation 2, and so on.

This sequential approach wastes time. Network calls and API processing take time, and your CPU sits idle while it waits. During the wait for conversation 1, the program could already be starting conversation 2 or 3.

Async programming solves this problem. When you make an async API call, your program can continue doing other work while waiting for the response. Think of it like a restaurant: a synchronous waiter takes one order, goes to the kitchen, waits for the food, delivers it, and only then takes the next order. An async waiter takes multiple orders, sends them all to the kitchen, and delivers each meal as it becomes ready. The kitchen (Gemini's API) processes multiple requests in parallel, and your program manages all of them.

This approach pays off for I/O-bound operations such as API calls, network requests, and database queries, where most of the time is spent waiting rather than computing. CPU-heavy tasks still block the event loop and may require multiprocessing or separate threads.

For agent systems, this means a single agent can manage multiple conversations simultaneously, or you can run several agents in parallel. This becomes especially powerful when agents need to make multiple tool calls or coordinate with other agents through handoffs. Let's start by making the necessary changes to your Agent class.
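The waiter analogy can be tried out in a few lines before any Gemini code is involved. The following is a minimal sketch, not part of the lesson's agent: fake_api_call is a hypothetical stand-in that uses asyncio.sleep() to imitate network latency, and the three conversation names are made up for illustration.

```python
import asyncio
import time

NAMES = ("conversation 1", "conversation 2", "conversation 3")


async def fake_api_call(name: str, delay: float) -> str:
    """Hypothetical stand-in for a network request: waits without using the CPU."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def run_sequentially(delay: float) -> float:
    """Await each call before starting the next one; returns elapsed seconds."""
    start = time.perf_counter()
    for name in NAMES:
        await fake_api_call(name, delay)
    return time.perf_counter() - start


async def run_concurrently(delay: float) -> float:
    """Start all calls together and wait for all of them; returns elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(fake_api_call(name, delay) for name in NAMES))
    return time.perf_counter() - start


async def compare(delay: float = 0.2):
    return await run_sequentially(delay), await run_concurrently(delay)


sequential, concurrent = asyncio.run(compare())
print(f"sequential: {sequential:.2f}s, concurrent: {concurrent:.2f}s")
```

With a 0.2 second delay, the sequential version should take roughly three times as long as the concurrent one, because the three waits overlap instead of adding up.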

Wrapping Synchronous Calls with asyncio.to_thread()

The call our agent has used so far, client.models.generate_content from the google.genai client, is synchronous: it blocks until the response arrives. The SDK also offers an async interface under client.aio, but in this lesson we wrap the synchronous call with Python's asyncio.to_thread() function, because the same technique works for any blocking code, including the tools we convert later. asyncio.to_thread() runs synchronous code in a separate thread, allowing the event loop to continue processing other tasks while waiting for the result. This pattern is often referred to as a synchronous wrapper, as it allows us to "wrap" blocking code so it can be used within an asynchronous workflow without stopping the entire program. Here's how we use it in the run method:

```python
async def run(self, input_messages):
    messages = []
    for m in input_messages:
        if isinstance(m["content"], str):
            messages.append({"role": m["role"], "parts": [{"text": m["content"]}]})
        else:
            messages.append({"role": m["role"], "parts": m["content"]})

    turn = 0
    while turn < self.max_turns:
        turn += 1
        response = await asyncio.to_thread(
            client.models.generate_content,
            model=self.model_name,
            contents=messages,
            config=self._build_config(),
        )
        # ... rest of the method
```

The key line is await asyncio.to_thread(client.models.generate_content, model=..., contents=messages, config=...). Instead of calling client.models.generate_content() directly, which would block the entire event loop, we wrap it with asyncio.to_thread(). This tells Python: "Run this synchronous function in a separate thread, and let me know when it's done."

While that thread waits for Gemini's response, the event loop can switch to other tasks, like processing another conversation. The await keyword is crucial here. It tells Python that this operation will take time and that the event loop should be free to do other work. When the API call completes, Python automatically resumes execution at this point with the response.
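To see that the event loop really stays free while a wrapped call is blocked, here is a small standalone sketch. It assumes nothing about Gemini: blocking_call is a hypothetical stand-in for a synchronous SDK call, and heartbeat is an illustrative coroutine that can only make progress if the loop is not blocked.

```python
import asyncio
import threading
import time


def blocking_call(label: str, delay: float) -> str:
    """Hypothetical stand-in for a synchronous SDK call."""
    time.sleep(delay)  # blocks whichever thread runs it
    return f"{label} handled in {threading.current_thread().name}"


async def heartbeat(ticks: list, interval: float, count: int) -> None:
    """Appends a tick every `interval` seconds; needs a responsive event loop."""
    for _ in range(count):
        await asyncio.sleep(interval)
        ticks.append("tick")


async def demo():
    ticks = []
    start = time.perf_counter()
    # The blocking call runs in a worker thread while heartbeat runs on the loop.
    result, _ = await asyncio.gather(
        asyncio.to_thread(blocking_call, "request", 0.3),
        heartbeat(ticks, 0.1, 3),
    )
    return result, ticks, time.perf_counter() - start


result, ticks, elapsed = asyncio.run(demo())
print(result)
print(f"{len(ticks)} ticks in {elapsed:.2f}s")
```

If blocking_call were invoked directly inside a coroutine, the heartbeat could not tick until it returned, and the two waits would add up instead of overlapping.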

Making the run Method Async

Now, let's look at the complete async run method. The method signature changes to include the async keyword, and we use await for any operations that might take time:

```python
async def run(self, input_messages):  # Changed: Added 'async' keyword
    messages = []
    for m in input_messages:
        if isinstance(m["content"], str):
            messages.append({"role": m["role"], "parts": [{"text": m["content"]}]})
        else:
            messages.append({"role": m["role"], "parts": m["content"]})

    turn = 0
    while turn < self.max_turns:
        turn += 1
        response = await asyncio.to_thread(
            client.models.generate_content,
            model=self.model_name,
            contents=messages,
            config=self._build_config(),
        )  # Changed: Wrapped with asyncio.to_thread() and added 'await'

        # Use parts directly, not to_dict()
        messages.append({
            "role": "model",
            "parts": response.candidates[0].content.parts
        })

        tool_results = []
        tasks = []
        for name, args in self._iter_function_calls(response):
            if name == "handoff":
                ok, res = await self._call_handoff(args, messages)  # Changed: Added 'await'
                if ok:
                    return res
                else:
                    tool_results.append(self._function_response_part("handoff", res))
            else:
                tasks.append(asyncio.create_task(self._call_tool(name, args)))

        if tasks:
            tool_results.extend(await asyncio.gather(*tasks))

        if tool_results:
            messages.append({"role": "user", "parts": tool_results})
        else:
            return messages, self._extract_text(response)

    raise Exception("Max turns reached")
```

The async def keyword at the beginning tells Python this is an asynchronous function. Inside the method, we use await before asyncio.to_thread() when making the API call. This is the important step: when your code hits the await keyword, it tells Python's event loop that it can switch to other work while waiting for Gemini's response. Notice that we also use await when calling self._call_handoff() because handoffs involve calling another agent's run() method, which is now also async. The method handles Gemini's parts-based response structure, extracting function calls and building appropriate response messages.

Handling Async Tool Calls

Next, we need to update our tool execution to be asynchronous. Let's modify the _call_tool method:

```python
async def _call_tool(self, name, args):  # Changed: Added 'async' keyword
    print(f"🔧 Tool called: {name}({args})")
    try:
        fn = self.tools[name]
    except KeyError:
        result = f"Error: Function {name} not found"
        print(f"❌ Error: {result}")
        return self._function_response_part(name, result)

    try:
        result = await asyncio.to_thread(fn, **args)  # Changed: Wrapped with asyncio.to_thread() and added 'await'
        print(f"✅ Result: {result}")
        return self._function_response_part(name, str(result))
    except Exception as e:
        result = f"Error: {str(e)}"
        print(f"❌ Error: {result}")
        return self._function_response_part(name, result)
```

The key change here is using the synchronous wrapper pattern again: wrapping the tool function call with await asyncio.to_thread(fn, **args). Even though our math functions execute quickly, this pattern is important for tools that might make external API calls or perform I/O operations. By using asyncio.to_thread(), we ensure that even synchronous tools don't block the event loop.

In the run method, we create tasks for multiple tool calls and use asyncio.gather() to execute them concurrently:

```python
tasks = []
for name, args in self._iter_function_calls(response):
    if name == "handoff":
        ok, res = await self._call_handoff(args, messages)
        # ... handle handoff
    else:
        tasks.append(asyncio.create_task(self._call_tool(name, args)))

if tasks:
    tool_results.extend(await asyncio.gather(*tasks))
```

This means if Gemini requests multiple tool calls in a single response, they can all execute concurrently rather than sequentially.
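The create_task() plus gather() pattern can be exercised without an agent or an API key. The sketch below is an illustration under stated assumptions: slow_add and slow_divide are hypothetical tools that sleep to imitate I/O, TOOLS is a made-up registry, and call_tool returns a plain dict instead of the lesson's _function_response_part() result.

```python
import asyncio
import time


def slow_add(a, b):
    time.sleep(0.2)  # imitate a slow, blocking tool
    return a + b


def slow_divide(a, b):
    time.sleep(0.2)
    return a / b


# Hypothetical tool registry, playing the role of self.tools
TOOLS = {"slow_add": slow_add, "slow_divide": slow_divide}


async def call_tool(name, args):
    """Same error-handling shape as _call_tool, but returns a plain dict."""
    try:
        fn = TOOLS[name]
    except KeyError:
        return {"name": name, "result": f"Error: Function {name} not found"}
    try:
        result = await asyncio.to_thread(fn, **args)
        return {"name": name, "result": str(result)}
    except Exception as e:
        return {"name": name, "result": f"Error: {e}"}


async def run_tool_calls(calls):
    # create_task schedules each call right away; gather collects results in order.
    tasks = [asyncio.create_task(call_tool(name, args)) for name, args in calls]
    return await asyncio.gather(*tasks)


calls = [
    ("slow_add", {"a": 2, "b": 3}),
    ("slow_divide", {"a": 1, "b": 0}),  # raises inside the tool
    ("missing", {}),                    # not registered
]
start = time.perf_counter()
results = asyncio.run(run_tool_calls(calls))
elapsed = time.perf_counter() - start
for r in results:
    print(r)
print(f"elapsed: {elapsed:.2f}s")
```

Because errors are converted into result strings inside call_tool, one failing tool does not cancel the others, and the two 0.2 second tools finish in roughly 0.2 seconds total rather than 0.4.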

Handling Async Handoffs

When your agent hands off control to another agent, it needs to wait for that agent to complete its work. Since the other agent's run() method is now async, we need to make the _call_handoff method async as well:

```python
async def _call_handoff(self, args, messages):  # Changed: Added 'async' keyword
    agent_name = args.get("name")
    reason = args.get("reason", "No reason provided")
    print(f"🔄 Handoff to: {agent_name}")
    print(f"📝 Reason: {reason}")

    try:
        target = next(a for a in self.handoffs if a.name == agent_name)
    except StopIteration:
        result = f"Handoff failed: Agent '{agent_name}' not found. Available agents: {[agent.name for agent in self.handoffs]}"
        print(f"❌ {result}")
        return False, result

    # Convert messages back to input format for target agent
    clean_messages = []
    for msg in messages:
        if msg["role"] == "user":
            if isinstance(msg.get("parts"), list):
                # Extract text from parts
                text_parts = []
                for part in msg["parts"]:
                    if isinstance(part, dict):
                        if "text" in part:
                            text_parts.append(part["text"])
                        elif "function_response" in part:
                            text_parts.append("Tool execution results received")
                    elif hasattr(part, "text"):
                        text_parts.append(part.text)
                if text_parts:
                    clean_messages.append({"role": "user", "content": " ".join(text_parts)})
            elif isinstance(msg.get("content"), str):
                clean_messages.append({"role": "user", "content": msg["content"]})
        elif msg["role"] == "model" or msg["role"] == "assistant":
            # Extract text from model parts
            text_content = self._extract_text_from_parts(msg.get("parts", []))
            if text_content:
                # Gemini accepts the roles "user" and "model", and run() passes
                # the role through unchanged, so store "model" here.
                clean_messages.append({"role": "model", "content": text_content})

    # Remove the last model message (the one that contains the handoff call)
    if clean_messages and clean_messages[-1]["role"] == "model":
        clean_messages = clean_messages[:-1]

    result = await target.run(clean_messages)  # Changed: Added 'await' keyword
    return True, result
```

The key change here is adding async to the method definition and using await when calling target.run(clean_messages). This allows the handoff to happen asynchronously. While the target agent makes its own API calls or uses tools, the event loop is free to advance other conversations instead of sitting idle.

The method also handles the complexity of converting Gemini's parts-based message format back into a clean format for the target agent. It extracts text from various part types and removes the handoff function call itself before passing messages to the target agent.
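The message conversion is easier to follow on a small example. The function below is a simplified, hypothetical sketch of that step, not the method itself: it only handles dict-based parts, keeps each message's role unchanged, and skips function-call and function-response parts instead of replacing them with placeholder text.

```python
def to_plain_messages(messages):
    """Flatten parts-based messages into {"role", "content"} dicts (simplified)."""
    plain = []
    for msg in messages:
        # Keep only the text parts; function calls and responses are skipped here.
        texts = [
            part["text"]
            for part in msg.get("parts", [])
            if isinstance(part, dict) and "text" in part
        ]
        if texts:
            plain.append({"role": msg["role"], "content": " ".join(texts)})

    # Drop the trailing model message, which is the turn that requested the handoff.
    if plain and plain[-1]["role"] == "model":
        plain.pop()
    return plain


history = [
    {"role": "user", "parts": [{"text": "What is 2 + 2?"}]},
    {
        "role": "model",
        "parts": [
            {"text": "Let me hand this off."},
            {"function_call": {"name": "handoff", "args": {"name": "math_assistant"}}},
        ],
    },
]

print(to_plain_messages(history))
# [{'role': 'user', 'content': 'What is 2 + 2?'}]
```

The target agent therefore starts from the user's original request rather than from the handoff request itself.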

Setting Up for Concurrent Execution

Before we run multiple conversations concurrently, we need to set up our imports and tools. The key addition here is importing asyncio, which provides the tools for running async code:

```python
import asyncio
import json

from agent import Agent
from functions import (
    sum_numbers,
    multiply_numbers,
    subtract_numbers,
    divide_numbers,
    power,
    square_root
)

with open('schemas.json', 'r') as f:
    tool_schemas = json.load(f)

tools = {
    "sum_numbers": sum_numbers,
    "multiply_numbers": multiply_numbers,
    "subtract_numbers": subtract_numbers,
    "divide_numbers": divide_numbers,
    "power": power,
    "square_root": square_root
}
```

Creating and Running Concurrent Tasks

With our tools ready, we can now define an async main function that creates and runs multiple conversations concurrently:

```python
async def main():
    prompts = [
        "Solve this: (2 + 3) * (4*4)",
        "Find the roots of x^2 - 5x + 6 = 0",
    ]

    agent = Agent(
        name="math_assistant",
        system_prompt="You are a helpful math assistant.",
        tools=tools,
        tool_schemas=tool_schemas,
        max_turns=15
    )

    # Create a list of coroutines explicitly
    tasks = [agent.run([{"role": "user", "content": p}]) for p in prompts]

    # Await all tasks
    results = await asyncio.gather(*tasks)

    for idx, (_, result) in enumerate(results, start=1):
        print(f"\n=== run {idx} ===")
        print(result)
```

We start by creating a list of prompts that we want to process concurrently, then initialize our agent with the necessary tools and schemas. The key part is building the tasks list by calling agent.run() for each prompt. Notice that we don't use await here yet: each call returns a coroutine object (a promise of future work) and nothing executes until it is awaited or scheduled.

Then we use asyncio.gather(*tasks) with await. The gather() function schedules all the coroutines and runs them concurrently. It waits for all of them to complete and returns their results in the same order as the inputs, regardless of which one finishes first. While one conversation waits for an API response, another conversation can make progress.
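Two details of this pattern are worth checking for yourself: calling an async function does not start it, and gather() returns results in input order rather than completion order. The sketch below demonstrates both with a hypothetical answer coroutine; the prompts and delays are made up for illustration.

```python
import asyncio


async def answer(prompt: str, delay: float, started: list) -> str:
    """Hypothetical conversation: records that it started, then waits."""
    started.append(prompt)
    await asyncio.sleep(delay)
    return f"answer to {prompt!r}"


async def main():
    started = []
    # Calling answer() only builds coroutine objects; no code inside them runs yet.
    coros = [answer("slow", 0.2, started), answer("fast", 0.05, started)]
    nothing_started_yet = started == []

    # gather() schedules both; "fast" finishes first, but order follows the inputs.
    results = await asyncio.gather(*coros)
    return nothing_started_yet, results


nothing_started_yet, results = asyncio.run(main())
print(nothing_started_yet)  # True
print(results)              # ["answer to 'slow'", "answer to 'fast'"]
```

By default, the first exception raised by any coroutine propagates out of gather(). Passing return_exceptions=True returns exceptions as ordinary items in the result list instead, which can be useful when one failed conversation should not hide the results of the others.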

Entry Point for Async Execution

Finally, we need an entry point that creates the event loop and runs our async main function:

```python
if __name__ == "__main__":
    asyncio.run(main())
```

The asyncio.run(main()) function creates an event loop, runs our main() function, and handles cleanup when everything completes. This is the standard way to start an async Python program.
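One practical caveat: asyncio.run() is meant to be called once, from synchronous code. Calling it where an event loop is already running (inside another coroutine, or in environments such as Jupyter notebooks that run their own loop) raises a RuntimeError, and in those cases you would write await main() instead. The following sketch shows the behavior with a trivial stand-in for main(); nested_attempt is a hypothetical helper for the demonstration.

```python
import asyncio


async def main() -> str:
    await asyncio.sleep(0)
    return "finished"


async def nested_attempt() -> str:
    """Tries to call asyncio.run() while a loop is already running."""
    coro = main()
    try:
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # the coroutine never ran; close it to avoid a warning
        return f"RuntimeError: {exc}"
    return "no error"


print(asyncio.run(main()))            # normal entry point
print(asyncio.run(nested_attempt()))  # nested call is rejected
```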

Observing Concurrent Execution

When we run this code, you'll see output that demonstrates the concurrent execution:

```text
🔧 Tool called: sum_numbers({'a': 2, 'b': 3})
🔧 Tool called: multiply_numbers({'a': 4, 'b': 4})
🔧 Tool called: multiply_numbers({'a': 5, 'b': 16})
🔧 Tool called: power({'base': -5, 'exponent': 2})
🔧 Tool called: multiply_numbers({'a': 4, 'b': 1})
🔧 Tool called: multiply_numbers({'a': 1, 'b': 6})
🔧 Tool called: multiply_numbers({'a': 4, 'b': 6})
🔧 Tool called: subtract_numbers({'a': 25, 'b': 24})
🔧 Tool called: square_root({'number': 1})
🔧 Tool called: sum_numbers({'a': 5, 'b': 1})
🔧 Tool called: subtract_numbers({'a': 5, 'b': 1})
🔧 Tool called: divide_numbers({'a': 6, 'b': 2})

=== run 1 ===
80

=== run 2 ===
The roots are x = 2 and x = 3.
```

The tool calls from both conversations arrive in a single stream: the first three belong to the arithmetic problem, and the rest belong to the quadratic equation. The exact ordering varies from run to run, because each conversation advances whenever its own API response comes back. Both results are printed only after gather() has collected them. Since the conversations overlap while they wait on the API, the total time is closer to the duration of the longer conversation than to the sum of the two.
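When several conversations share one output stream, it can be hard to tell which log line belongs to which run. One option, sketched here as an assumption rather than as part of the lesson's Agent class, is to tag each conversation with a contextvars.ContextVar. Each task created by gather() gets its own copy of the context, and asyncio.to_thread() propagates that context into the worker thread. The names run_id, tool, and conversation are hypothetical, and the tools only sleep.

```python
import asyncio
import contextvars
import time

# Hypothetical label identifying the conversation a log line belongs to
run_id = contextvars.ContextVar("run_id", default="?")


def tool(name: str, delay: float, log: list) -> None:
    """Stand-in tool: sleeps, then logs its name with the current run label."""
    time.sleep(delay)
    log.append(f"[run {run_id.get()}] {name}")


async def conversation(label: str, steps: list, log: list) -> None:
    run_id.set(label)  # visible only inside this task's context
    for name, delay in steps:
        await asyncio.to_thread(tool, name, delay, log)


async def main() -> list:
    log = []
    await asyncio.gather(
        conversation("1", [("sum_numbers", 0.05), ("multiply_numbers", 0.10)], log),
        conversation("2", [("power", 0.10), ("square_root", 0.10)], log),
    )
    return log


log = asyncio.run(main())
print("\n".join(log))
```

Within each run the lines keep their order, while lines from different runs typically interleave depending on timing.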

Understanding Parallelization Benefits and Remaining Bottlenecks

Now that we've seen concurrent execution in action, let's reflect on what we've achieved and what patterns we're using. By converting our agent to use async/await with asyncio.to_thread(), we've enabled a single agent instance to manage multiple conversations simultaneously. We can now process different user requests concurrently without creating multiple agent instances. In our example above, one agent handles both math problems at the same time, switching between them while waiting for API responses.

What would have happened if we ran these conversations using our original synchronous code? In a standard single-threaded script, they would execute one after another, waiting for each API response before starting the next conversation. While one could manually manage a pool of threads or processes to achieve concurrency, asyncio provides a structured way to handle these I/O-bound tasks. By using asyncio.to_thread(), we delegate the blocking API calls to worker threads, allowing our program to wait on several responses at once without getting stuck.

The same applies inside a single conversation. Each tool call is scheduled with asyncio.create_task(), runs in a worker thread through asyncio.to_thread(fn, **args), and is collected with asyncio.gather(). When Gemini requests several calculations in one response, they all start without waiting for each other to complete.

However, there's still room for optimization in how we coordinate multiple agents and handle complex workflows. In the next lesson, we'll explore patterns for orchestrating multiple specialized agents working together, allowing different agents to process different parts of a complex task simultaneously.
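One bottleneck to keep in mind is how many blocking calls run at once. asyncio.to_thread() uses the event loop's default thread pool, which has a bounded number of workers, and an API may also enforce its own rate limits. A common way to cap concurrency explicitly is an asyncio.Semaphore. The sketch below is an assumption-laden illustration, not part of the lesson's agent: blocking_request is a hypothetical stand-in for an API call, and the limit of 2 is arbitrary.

```python
import asyncio
import threading
import time

in_flight = 0  # calls currently running
peak = 0       # highest number of simultaneous calls observed
lock = threading.Lock()


def blocking_request(i: int) -> int:
    """Hypothetical blocking call that tracks how many run at the same time."""
    global in_flight, peak
    with lock:
        in_flight += 1
        peak = max(peak, in_flight)
    time.sleep(0.1)
    with lock:
        in_flight -= 1
    return i * i


async def limited_call(semaphore: asyncio.Semaphore, i: int) -> int:
    # At most `limit` coroutines are inside this block at any moment.
    async with semaphore:
        return await asyncio.to_thread(blocking_request, i)


async def main(limit: int = 2) -> list:
    semaphore = asyncio.Semaphore(limit)
    return await asyncio.gather(*(limited_call(semaphore, i) for i in range(4)))


results = asyncio.run(main())
print(results)  # [0, 1, 4, 9]
print(f"peak concurrency: {peak}")
```

With the semaphore in place, the four requests run in two batches, so the peak never exceeds the limit while the results still come back in input order.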

Summary & Exercises

You've successfully converted your agent system to use async/await patterns by wrapping synchronous Gemini API calls with asyncio.to_thread(), adding async and await to the run(), _call_tool(), and _call_handoff() methods, and using asyncio.gather() to run multiple conversations concurrently. This async foundation is crucial for building more advanced parallel agent systems, and in the upcoming practice exercises, you'll implement concurrent agent workflows and explore different patterns for coordinating multiple agents.