Building Autonomous Gemini Agents

Introduction & Overview

Throughout this course, you've learned the fundamentals of integrating tools with Gemini: defining tool schemas, understanding Gemini's tool use responses, and executing single tool requests. However, the approach you've used so far handles only one tool call per conversation turn. That is sufficient for simple tasks, but many real-world problems require multiple sequential steps, and often the number and nature of those steps can't be determined in advance.

In this lesson, we'll work together to transform Gemini from a single-turn tool user into an autonomous agent capable of iterative problem-solving. We'll build an agent class that can call tools, analyze results, decide what to do next, and continue this process until complex multi-step tasks are completed. This marks a shift from reactive tool usage to proactive, iterative problem-solving, mirroring how humans approach complex challenges.

The Action-Feedback Loop Concept

Before we start coding, let's understand how autonomous agents operate through action-feedback loops. In this pattern, each tool execution provides information that influences the next decision. This iterative process mirrors human problem-solving: we take an action, observe the result, decide what to do next, and repeat until we reach our goal.

The action-feedback loop consists of four key phases that repeat until task completion:

- Decision Phase: Gemini analyzes the current situation and determines the next action, which may include calling one or more tools.
- Action Phase: Our agent executes the requested tool(s) based on Gemini's instructions.
- Feedback Phase: The results from the tool execution(s) are captured and added to the conversation history.
- Evaluation Phase: Gemini reviews the new information and decides whether the task is complete or additional steps are needed, and the loop continues.

This loop structure enables complex problem-solving because each iteration builds upon previous results. For example, when solving a quadratic equation, Gemini might first calculate the discriminant, then use that result to determine if real solutions exist, then calculate the square root of the discriminant, and finally compute the two solutions. The key insight is that Gemini doesn't need to plan all steps in advance: it can adapt its approach based on intermediate results, just like a human mathematician working through a problem.

Now, let's start building our agent class to make this iterative process possible.
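
The four phases can be tried out without any API calls. In the minimal sketch below, a hand-written decide function stands in for Gemini (an assumption made purely for illustration; the real agent asks the model), and the names TOOLS, decide, and solve are hypothetical. It shows how the next step depends on the results gathered so far: a negative discriminant ends the loop early.

```python
import math

# Hypothetical tools; in the real agent these are the registered Python functions.
TOOLS = {
    "discriminant": lambda a, b, c: b * b - 4 * a * c,
    "square_root": lambda number: math.sqrt(number),
}


def decide(a, b, c, history):
    """Stand-in for the model: choose the next step from the results so far."""
    if "discriminant" not in history:
        return ("call", "discriminant", {"a": a, "b": b, "c": c})
    if history["discriminant"] < 0:
        return ("finish", "no real solutions")
    if "square_root" not in history:
        return ("call", "square_root", {"number": history["discriminant"]})
    root = history["square_root"]
    return ("finish", sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)]))


def solve(a, b, c, max_turns=10):
    history = {}  # feedback accumulated across iterations
    for _ in range(max_turns):
        decision = decide(a, b, c, history)      # decision phase
        if decision[0] == "finish":              # evaluation: task is complete
            return decision[1]
        _, name, args = decision
        history[name] = TOOLS[name](**args)      # action phase, then feedback
    raise RuntimeError("Max turns reached")


print(solve(2, -7, 3))  # two real roots
print(solve(1, 0, 1))   # negative discriminant, so the loop stops early
```

The decide function never sees a full plan; it only inspects history, which is the same position Gemini is in when it reads the conversation so far.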

Building Our Agent Class Foundation

Let's begin by creating the foundation of our autonomous agent using Gemini. We'll use the Google Gen AI Python SDK (the google-genai package, imported with from google import genai) and configure it to use the "models/gemini-flash-latest" model. We'll also set up tool integration using Gemini's tool-calling API. Here's the initial class definition and constructor:

```python
from google import genai
from google.genai import types


class Agent:
    BASE_SYSTEM_PROMPT = (
        "You are an autonomous agent that can take multiple tool-calling steps when helpful. "
        "The user only sees your response when you stop using tools, not your tool usage or reasoning steps. "
        "When you provide your answer without calling tools, make it complete and standalone.\n"
        "Additional instructions:\n"
    )

    def __init__(
        self,
        name,
        system_prompt="You are a helpful assistant.",
        model="models/gemini-flash-latest",
        tools=None,
        tool_schemas=None,
        max_turns=10,
        api_key=None,
    ):
        # Pass api_key only when given; otherwise the SDK reads it from the environment
        client_kwargs = {}
        if api_key is not None:
            client_kwargs["api_key"] = api_key
        self.client = genai.Client(**client_kwargs)

        self.name = name
        self.model = model
        self.system_prompt = self.BASE_SYSTEM_PROMPT + system_prompt
        self.max_turns = max_turns

        # Avoid shared mutable defaults and protect against external mutation
        self.tools = {} if tools is None else dict(tools)  # name -> Python function
        self.tool_schemas = [] if tool_schemas is None else list(tool_schemas)  # list of tool schemas
```

Key design decisions:

- BASE_SYSTEM_PROMPT: Tells Gemini that it may take several tool-calling steps and that the user sees only the final, tool-free response, so that response must be complete and standalone.
- Constructor parameters:
  - name: Identifier for the agent.
  - system_prompt: Customizable for different domains; it is appended to BASE_SYSTEM_PROMPT.
  - model: Specifies the Gemini model.
  - tools and tool_schemas: A dictionary mapping tool names to Python functions and a list of schemas; both are copied to avoid shared state.
  - max_turns: Caps the number of loop iterations to prevent infinite loops.
  - api_key: Allows explicit configuration of the API key, though the SDK will automatically look for the GOOGLE_API_KEY environment variable if it is not provided.

This structure prepares us to implement the core functionality for autonomous, multi-step tool use with Gemini.

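To see why the constructor copies tools and tool_schemas instead of storing them directly, here is a minimal sketch of the same pattern outside the class. The helper make_registry is hypothetical and exists only for this illustration.

```python
def make_registry(tools=None):
    # Same pattern as the constructor: a fresh dict by default, a copy otherwise.
    return {} if tools is None else dict(tools)


shared = {"sum_numbers": lambda a, b: a + b}
first = make_registry(shared)
second = make_registry(shared)

# Registering an extra tool on one registry leaves the others untouched.
first["power"] = lambda base, exponent: base ** exponent

print(sorted(first))
print(sorted(second))
print(sorted(shared))
```

Without the dict(...) copy, both registries would be the same object as shared, and a tool added for one agent would silently appear in every other agent built from the same dictionary.
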
Adding Helper Methods for State Management

As our agent works through complex problems, we need to manage conversation state properly. Let's add two essential helper methods for Gemini's message and response structure:

```python
def _extract_text(self, parts):
    # A response's content is a list of parts; keep only those carrying text
    # (function-call parts have no text)
    return "".join(getattr(part, "text", None) or "" for part in parts)

def _build_request_args(self, messages):
    # Prepare the keyword arguments for client.models.generate_content
    config = types.GenerateContentConfig(
        system_instruction=self.system_prompt,
        # Leave tools unset when no schemas were provided
        tools=self.tool_schemas or None,
        max_output_tokens=8000,
    )
    return {"model": self.model, "contents": messages, "config": config}
```

Explanation:

- _extract_text: Combines all text parts from Gemini's response, ensuring we return a clean, readable final response. It reads each part's text attribute, because the parts returned by the SDK are objects rather than dictionaries.
- _build_request_args: Centralizes how we construct API requests. The system prompt, tool schemas, and output token limit go into a GenerateContentConfig, which the SDK expects as the config argument, and tools are left unset when no schemas are provided.

These helpers keep our orchestration logic clean and focused.
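
As a quick offline check of the text-extraction idea, the sketch below uses SimpleNamespace objects as stand-ins for response parts. This is an assumption made for illustration: it presumes parts expose text and function_call attributes, in the way the tool-detection code later in this lesson reads them with getattr. The standalone function extract_text is hypothetical.

```python
from types import SimpleNamespace


def extract_text(parts):
    # Join the text of every part that has some; function-call parts are skipped.
    return "".join(getattr(p, "text", None) or "" for p in parts)


# Stand-in parts: two with text, one carrying only a function call.
parts = [
    SimpleNamespace(text="The roots are "),
    SimpleNamespace(text=None, function_call=SimpleNamespace(name="power", args={})),
    SimpleNamespace(text="3 and 0.5."),
]

print(extract_text(parts))
```

Using getattr with a default keeps the helper tolerant of parts that lack a text attribute or carry None in it.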

Implementing Tool Execution

Now, let's add the method that handles individual tool executions within our agent loop, using Gemini's tool call conventions:

```python
def call_tool(self, tool_call):
    # tool_call is a response part whose function_call holds the name and arguments
    tool_name = tool_call.function_call.name
    tool_args = tool_call.function_call.args or {}
    print(f"🔧 Tool called: {tool_name}({tool_args})")

    if tool_name not in self.tools:
        # Report unknown tools back to the model instead of raising
        result = f"Error: Tool {tool_name} not found"
    else:
        try:
            result = str(self.tools[tool_name](**tool_args))
        except Exception as e:
            # Return the failure as text so the model can react to it
            result = f"Error: {e}"

    # Gemini expects tool responses as function_response parts
    return {
        "function_response": {
            "name": tool_name,
            "response": {"result": result}
        }
    }
```

Explanation:

- Extracts the tool name and arguments from Gemini's tool call structure.
- Executes the tool with error handling. The lookup is checked separately from the call, so a KeyError raised inside a tool is reported as that tool's error rather than as a missing tool.
- Returns the result in Gemini's expected format for tool responses.

Because errors are returned as text instead of raised, a failed tool call does not stop the agent loop, and Gemini can see the error and try a different step.
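
The error-handling behaviour can be exercised without calling Gemini. The sketch below is a standalone variant of the method, with SimpleNamespace objects standing in for tool-call parts; the helper fake_call and the tool name cube_root are hypothetical and used only for this illustration.

```python
from types import SimpleNamespace


def divide_numbers(a, b):
    return a / b


TOOLS = {"divide_numbers": divide_numbers}


def call_tool(tools, part):
    name = part.function_call.name
    args = part.function_call.args or {}
    if name not in tools:
        result = f"Error: Tool {name} not found"
    else:
        try:
            result = str(tools[name](**args))
        except Exception as e:
            result = f"Error: {e}"
    return {"function_response": {"name": name, "response": {"result": result}}}


def fake_call(name, args):
    # Stand-in for a response part that carries a function call.
    return SimpleNamespace(function_call=SimpleNamespace(name=name, args=args))


ok = call_tool(TOOLS, fake_call("divide_numbers", {"a": 12, "b": 4}))
failed = call_tool(TOOLS, fake_call("divide_numbers", {"a": 1, "b": 0}))
missing = call_tool(TOOLS, fake_call("cube_root", {"number": 8}))

for outcome in (ok, failed, missing):
    print(outcome["function_response"]["response"]["result"])
```

All three calls return a function_response dictionary, so the loop can feed successes and failures back to the model in the same way.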

Building the Core Loop - Part 1: Understanding Stateless Design

Gemini's API is stateless: the model does not remember earlier calls, so each request must include the full conversation history. Our agent follows the same principle and stores no conversation state between runs. Instead, run receives the entire message history as input and works on a copy:

```python
def run(self, input_messages):
    # Work on a shallow copy so the caller's list is not modified
    messages = [dict(m) for m in input_messages]
```

This design allows the same agent instance to handle multiple independent conversations and gives you full control over context management.
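
It helps to know exactly what this copy protects. The minimal sketch below uses plain dictionaries only and shows that appending to the working list or replacing a top-level key leaves the caller's messages alone, while the nested parts lists are still shared. Using copy.deepcopy is shown as one option when full isolation is wanted; that is a suggestion for plain-data messages, not something the lesson's agent does.

```python
import copy

original = [{"role": "user", "parts": [{"text": "Solve 2x² - 7x + 3 = 0"}]}]

# Same copy as in run(): a new list holding new top-level dicts.
working = [dict(m) for m in original]
working.append({"role": "model", "parts": [{"text": "x = 3 or x = 0.5"}]})
working[0]["role"] = "changed"

print(len(original), original[0]["role"])  # caller's list and roles are untouched

# The copy is shallow: each "parts" list is still the caller's object.
print(working[0]["parts"] is original[0]["parts"])

# A deep copy also duplicates the nested lists.
isolated = copy.deepcopy(original)
print(isolated[0]["parts"] is original[0]["parts"])
```

Since run only appends new messages and never edits existing parts, the shallow copy is enough for the loop in this lesson.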

Building the Core Loop - Part 2: Setting Up the Iteration

Now, let's add the basic loop structure for iterative problem-solving with Gemini:

```python
def run(self, input_messages):
    messages = [dict(m) for m in input_messages]
    turn = 0

    while turn < self.max_turns:
        turn += 1

        # Call Gemini's generate_content API through the SDK client
        response = self.client.models.generate_content(
            **self._build_request_args(messages)
        )

        # Gemini returns a response with 'candidates', each with 'content'
        candidate = response.candidates[0]
        content = candidate.content

        # Add Gemini's response to the conversation history
        messages.append({
            "role": "model",
            "parts": content.parts
        })
```

Each iteration represents one action-feedback cycle, and the turn counter prevents infinite loops.

Building the Core Loop - Part 3: Handling Tool Calls

Gemini signals tool calls via the content.parts structure, where a part may have a function_call. Let's handle tool calls accordingly:

```python
def run(self, input_messages):
    # ... previous code ...

        # Check for tool calls in the response
        tool_calls = [
            part for part in content.parts
            if getattr(part, "function_call", None)
        ]

        if tool_calls:
            tool_results = []  # list of function_response parts
            for tool_call in tool_calls:
                tool_result = self.call_tool(tool_call)
                tool_results.append(tool_result)

            # Add tool results as a user message in Gemini's format
            messages.append({
                "role": "user",
                "parts": tool_results
            })
```

This ensures all tool calls are executed and their results are fed back into the conversation.

Building the Core Loop - Part 4: Managing Flow Control

When Gemini doesn't request any tool calls, it means the agent has reached a final answer. Let's extract the answer and return it:

```python
def run(self, input_messages):
    # ... previous code ...

        if tool_calls:
            # ... handle tool calls ...
            continue
        else:
            # No tool calls: extract and return the final answer
            response_text = self._extract_text(content.parts)
            return messages, response_text

    # If max turns reached, raise an exception
    raise Exception("Max turns reached")
```

This structure ensures the agent returns the final answer and the full conversation history, or raises an exception if the loop limit is reached.

Complete Run Method

Here's the complete run method for our Gemini agent:

```python
def run(self, input_messages):
    messages = [dict(m) for m in input_messages]
    turn = 0

    while turn < self.max_turns:
        turn += 1

        # Ask Gemini for the next step, given the full history
        response = self.client.models.generate_content(
            **self._build_request_args(messages)
        )
        candidate = response.candidates[0]
        content = candidate.content

        messages.append({
            "role": "model",
            "parts": content.parts
        })

        # Detect tool calls
        tool_calls = [
            part for part in content.parts
            if getattr(part, "function_call", None)
        ]

        if tool_calls:
            tool_results = []  # list of function_response parts
            for tool_call in tool_calls:
                tool_result = self.call_tool(tool_call)
                tool_results.append(tool_result)

            messages.append({
                "role": "user",
                "parts": tool_results
            })
            continue
        else:
            response_text = self._extract_text(content.parts)
            return messages, response_text

    raise Exception("Max turns reached")
```
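
One way to check the loop's control flow without network access is to drive it with scripted replies. The sketch below is a compact, standalone re-statement of the loop, not the Agent class itself: FakeModel, part, and run_loop are hypothetical names, and SimpleNamespace objects only imitate the candidates/content/parts shape used above.

```python
from types import SimpleNamespace


def part(text=None, call=None):
    """Build a stand-in part with either text or a (name, args) function call."""
    fc = SimpleNamespace(name=call[0], args=call[1]) if call else None
    return SimpleNamespace(text=text, function_call=fc)


class FakeModel:
    """Hypothetical stub: replays scripted replies instead of calling Gemini."""

    def __init__(self, scripted_replies):
        self.replies = list(scripted_replies)

    def generate(self, messages):
        content = SimpleNamespace(parts=self.replies.pop(0))
        return SimpleNamespace(candidates=[SimpleNamespace(content=content)])


def run_loop(model, tools, input_messages, max_turns=10):
    messages = [dict(m) for m in input_messages]
    for _ in range(max_turns):
        content = model.generate(messages).candidates[0].content
        messages.append({"role": "model", "parts": content.parts})
        calls = [p for p in content.parts if getattr(p, "function_call", None)]
        if not calls:
            # No tool calls: this reply is the final answer.
            return messages, "".join(p.text or "" for p in content.parts)
        results = []
        for p in calls:
            fc = p.function_call
            results.append({"function_response": {
                "name": fc.name,
                "response": {"result": str(tools[fc.name](**fc.args))},
            }})
        messages.append({"role": "user", "parts": results})
    raise Exception("Max turns reached")


model = FakeModel([
    [part(call=("multiply_numbers", {"a": 4, "b": 2}))],  # turn 1: tool call
    [part(text="4 times 2 is 8.")],                       # turn 2: final answer
])
tools = {"multiply_numbers": lambda a, b: a * b}
history, answer = run_loop(
    model, tools, [{"role": "user", "parts": [{"text": "What is 4 * 2?"}]}]
)
print(answer)
print(len(history))  # user, model, tool results, model
```

A scripted stub like this makes it possible to test the turn limit and the history layout deterministically before spending API calls.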

Testing Our Autonomous Agent

Let's test our Gemini-based agent on a quadratic equation. We'll use the same math tools and schemas as before. Make sure GOOGLE_API_KEY is set as an environment variable so the SDK can pick it up without the key appearing in your code.

```python
import json

from agent import Agent
from functions import (
    sum_numbers,
    multiply_numbers,
    subtract_numbers,
    divide_numbers,
    power,
    square_root,
)

# Load the schemas from JSON file
with open('schemas.json', 'r') as f:
    tool_schemas = json.load(f)

# Map each tool name to the Python function that implements it
tools = {
    "sum_numbers": sum_numbers,
    "multiply_numbers": multiply_numbers,
    "subtract_numbers": subtract_numbers,
    "divide_numbers": divide_numbers,
    "power": power,
    "square_root": square_root
}

# Create the Gemini agent
# The SDK automatically uses the GOOGLE_API_KEY environment variable
agent = Agent(
    name="math_assistant",
    system_prompt="You are a helpful math assistant.",
    tools=tools,
    tool_schemas=tool_schemas,
    max_turns=15
)

# Initialize conversation with user message
messages = [{
    "role": "user",
    "parts": [{"text": "Solve this equation: 2x² - 7x + 3 = 0"}]
}]

# Run the agent
messages, result = agent.run(messages)

# Display the response
print("\nFinal response:")
print(result)
```

Sample output:

```text
🔧 Tool called: power({'base': -7, 'exponent': 2})
🔧 Tool called: multiply_numbers({'a': 4, 'b': 2})
🔧 Tool called: multiply_numbers({'a': 8, 'b': 3})
🔧 Tool called: subtract_numbers({'a': 49, 'b': 24})
🔧 Tool called: square_root({'number': 25})
🔧 Tool called: multiply_numbers({'a': 2, 'b': 2})
🔧 Tool called: sum_numbers({'a': 7, 'b': 5})
🔧 Tool called: divide_numbers({'a': 12, 'b': 4})
🔧 Tool called: subtract_numbers({'a': 7, 'b': 5})
🔧 Tool called: divide_numbers({'a': 2, 'b': 4})

Final response:
The solutions to the equation 2x² - 7x + 3 = 0 are:

**x = 3** and **x = 0.5** (or x = 1/2)

You can verify these solutions by substituting back into the original equation:
- For x = 3: 2(3)² - 7(3) + 3 = 18 - 21 + 3 = 0 ✓
- For x = 0.5: 2(0.5)² - 7(0.5) + 3 = 0.5 - 3.5 + 3 = 0 ✓
```

Our Gemini agent systematically applies the quadratic formula, making multiple tool calls and building upon previous results, just like a human mathematician.
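
The functions module is not shown in this lesson, so the sketch below is only a guess at what the six tools might look like, with parameter names taken from the sample output above. It then replays the same ten calls in plain Python to confirm that the sequence really is the quadratic formula applied to a = 2, b = -7, c = 3.

```python
import math


# Hypothetical tool implementations; the real functions module may differ.
def sum_numbers(a, b):
    return a + b


def subtract_numbers(a, b):
    return a - b


def multiply_numbers(a, b):
    return a * b


def divide_numbers(a, b):
    return a / b


def power(base, exponent):
    return base ** exponent


def square_root(number):
    return math.sqrt(number)


# Replay of the ten tool calls from the sample output, in order.
b_squared = power(base=-7, exponent=2)                    # b²
four_a = multiply_numbers(a=4, b=2)                       # 4a
four_ac = multiply_numbers(a=four_a, b=3)                 # 4ac
discriminant = subtract_numbers(a=b_squared, b=four_ac)   # b² - 4ac
root = square_root(number=discriminant)                   # √(b² - 4ac)
two_a = multiply_numbers(a=2, b=2)                        # 2a
x1 = divide_numbers(a=sum_numbers(a=7, b=root), b=two_a)       # (-b + √D) / 2a
x2 = divide_numbers(a=subtract_numbers(a=7, b=root), b=two_a)  # (-b - √D) / 2a

print(x1, x2)
```

Each intermediate value feeds a later call, which is why the agent needs the results of earlier turns in its history before it can issue the next request.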

Summary & Practice Preparation

Together, we've built an autonomous agent using Gemini that can tackle complex, multi-step problem-solving tasks. Our agent class manages conversation state, tool execution, and iterative decision-making in a reusable structure that can handle problems requiring many sequential operations. This architecture enables Gemini to operate as a true autonomous agent: it can assess situations, make decisions, execute tools, learn from results, and continue iterating until complex tasks are completed. This is a significant advancement from simple tool usage to intelligent, adaptive problem-solving. In the upcoming practice exercises, you'll implement your own autonomous agents, experiment with different system prompts and tool combinations, and tackle increasingly complex multi-step problems. You'll gain hands-on experience with debugging and optimization techniques for production agent systems, building on the solid foundation we've created together.