Task Routing with Gemini

Introduction & Context

Welcome back! In the previous lesson, you learned how to use prompt chaining to connect multiple Gemini API calls in a linear sequence, breaking down complex tasks into manageable steps. This approach works well when you know exactly what sequence of operations is needed. However, real-world applications often face unpredictable requests that require different types of expertise. Instead of building separate chains for every possible request, routing workflows use Gemini to analyze incoming requests and intelligently direct them to the right specialist — such as a math expert, a writing expert, or a code expert. This lesson will show you how to build a flexible routing system that maintains high-quality responses by leveraging specialized system prompts for each domain, all using Gemini and its API.

Workflow Design at a Glance

The routing workflow follows a clean two-step pattern that's more dynamic than the linear chains you've used before. When a user submits a request, your system first sends it to a router — a Gemini model instance with a specialized prompt designed to classify the request type. The router analyzes the content and returns a decision about which specialist should handle it. Once you have the routing decision, you send the original user request to a second Gemini model instance configured with the appropriate specialist prompt. This specialist focuses entirely on providing the best possible response within their domain of expertise, whether that's solving equations, crafting stories, or debugging code. The key insight here is that you're using the same Gemini API for both calls, but with completely different models and prompts that give each instance a distinct role and expertise. The router uses a fast, lightweight model optimized for classification, while the specialist uses a more capable model to deliver high-quality domain responses. This separation of concerns makes your system both more reliable and easier to extend with new specialist types.
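
The two-step pattern described above can be sketched without any API calls. In this minimal illustration, `fake_router`, `make_specialist`, `SPECIALISTS`, and `handle` are hypothetical stand-ins: the keyword rules replace the router model and the placeholder functions replace the specialist model, so only the control flow matches the real workflow.

```python
from typing import Callable, Dict


def fake_router(request: str) -> str:
    """Hypothetical stand-in for the router model: simple keyword rules."""
    text = request.lower()
    if any(word in text for word in ("solve", "equation", "calculate")):
        return "math_specialist"
    if any(word in text for word in ("bug", "function", "code")):
        return "code_specialist"
    return "writing_specialist"


def make_specialist(name: str) -> Callable[[str], str]:
    """Build a placeholder specialist that labels its reply with its own name."""
    def respond(request: str) -> str:
        return f"[{name}] handling: {request}"
    return respond


# One handler per route name; a real system would call Gemini with a
# specialist system prompt here instead.
SPECIALISTS: Dict[str, Callable[[str], str]] = {
    name: make_specialist(name)
    for name in ("math_specialist", "writing_specialist", "code_specialist")
}


def handle(request: str) -> str:
    route = fake_router(request)        # Step 1: classify the request
    return SPECIALISTS[route](request)  # Step 2: pass the ORIGINAL request on


print(handle("Write me a short story about robots"))
```

The point of the sketch is the shape of `handle`: the router only produces a label, and the specialist receives the untouched user request.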

Crafting the Router Prompt

The router prompt is the critical component that determines how accurately your system classifies incoming requests. Unlike open-ended prompts for creative tasks, router prompts need to be strict and constrained to ensure reliable, parseable output. Your router prompt should establish Gemini's role as a decision-maker and provide clear, unambiguous options. Here's an effective approach that limits the router to exactly three choices:

```python
router_prompt = """
You are a task router. Analyze user requests and return JSON only.

Available specialists:
- math_specialist: For mathematical calculations, equations, and numerical problems
- writing_specialist: For creative writing, essays, stories, and text composition
- code_specialist: For programming questions, code review, and technical implementation

Return JSON in this exact shape:
{"route": "math_specialist" | "writing_specialist" | "code_specialist"}
"""
```

The phrase "return JSON only" is crucial because it discourages Gemini from adding extra text that would complicate parsing. You want a strict, machine-readable response with a single route field. The explicit list of specialist names with their descriptions helps Gemini understand the boundaries between categories and reduces ambiguous classifications.

Notice how each specialist description focuses on a clear, distinct domain. Mathematical problems are clearly different from creative writing, which is clearly different from programming tasks. This separation reduces edge cases where the router might struggle to choose between specialists.
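
One way to keep the prompt's option list and the code's set of valid routes from drifting apart is to generate both from a single dictionary. The following is a minimal sketch under that assumption; `SPECIALIST_DESCRIPTIONS` and `build_router_prompt` are hypothetical names, and the descriptions are copied from the prompt above.

```python
from typing import Dict

# Single source of truth for route names and their descriptions.
SPECIALIST_DESCRIPTIONS: Dict[str, str] = {
    "math_specialist": "For mathematical calculations, equations, and numerical problems",
    "writing_specialist": "For creative writing, essays, stories, and text composition",
    "code_specialist": "For programming questions, code review, and technical implementation",
}


def build_router_prompt(descriptions: Dict[str, str]) -> str:
    """Render the router prompt from the route -> description mapping."""
    options = "\n".join(f"- {name}: {desc}" for name, desc in descriptions.items())
    shape = " | ".join(f'"{name}"' for name in descriptions)
    return (
        "You are a task router. Analyze user requests and return JSON only.\n\n"
        f"Available specialists:\n{options}\n\n"
        "Return JSON in this exact shape:\n"
        f'{{"route": {shape}}}'
    )


router_prompt = build_router_prompt(SPECIALIST_DESCRIPTIONS)
allowed_routes = set(SPECIALIST_DESCRIPTIONS)  # same names the prompt advertises

print(router_prompt)
```

Adding a fourth specialist then means adding one dictionary entry, and both the prompt text and the validation set pick it up.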

Defining Specialist System Prompts
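
A minimal sketch of what specialist system prompts might look like, assuming one prompt per route name that the router can return. The prompt wording and the `specialist_prompts` dictionary name are illustrative assumptions, not the course's exact text.

```python
# Hypothetical specialist system prompts, keyed by the router's route names.
specialist_prompts = {
    "math_specialist": (
        "You are a mathematics specialist. Solve problems step by step, "
        "show your work, and state the final answer clearly."
    ),
    "writing_specialist": (
        "You are a writing specialist. Produce clear, engaging prose that "
        "matches the tone and format the user asks for."
    ),
    "code_specialist": (
        "You are a programming specialist. Provide correct, well-commented "
        "code and briefly explain the key design decisions."
    ),
}

# Quick sanity check: every route has a non-empty prompt.
for name, prompt in specialist_prompts.items():
    print(f"{name}: {len(prompt)} characters")
```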

Preparing and Sending the Router Request

To begin the routing workflow, you first prepare the user request and send it to the router. The router uses a strict JSON output contract so your code can parse and validate the decision safely. Below is an example using the Google Gen AI Python SDK (google.genai). Make sure you have installed the package (pip install google-genai) and set up your API key as described in the Gemini API documentation.

```python
from google import genai
from google.genai import types
import os
import json

# Set up Gemini API credentials
API = os.environ["GOOGLE_API_KEY"]
BASE = os.environ.get("GOOGLE_BASE_URL", "").rstrip("/")

client_kwargs = {"api_key": API}
if BASE:
    client_kwargs["http_options"] = types.HttpOptions(base_url=BASE)
client = genai.Client(**client_kwargs)

# Use a fast model for routing and a capable model for specialist responses
router_model = "models/gemini-flash-latest"
specialist_model = "models/gemini-pro-latest"

# Router prompt (strict JSON output)
router_prompt = """
You are a task router. Analyze the user's request and return JSON only.

Available specialists:
- math_specialist: For mathematical calculations, equations, and numerical problems
- writing_specialist: For creative writing, essays, stories, and text composition
- code_specialist: For programming questions, code review, and technical implementation

Return JSON in this exact shape:
{"route": "math_specialist" | "writing_specialist" | "code_specialist"}
"""

# Example user request
user_request = "Write me a short story about robots"

# Step 1: Route the request
router_response = client.models.generate_content(
    model=router_model,
    contents=user_request,
    config=types.GenerateContentConfig(
        system_instruction=router_prompt,
        max_output_tokens=50,
        temperature=0.0,
    ),
)
```

Here, you set up the Gemini client, specify the models, and send the user request with the router prompt as the system instruction. The max_output_tokens parameter keeps the router's response concise, since you expect only a small JSON object. Setting temperature to 0.0 makes the output as repeatable as possible, reducing the chance of unexpected responses.

Note on Routing Determinism and Trade-offs: Using temperature=0.0 is recommended for routing because it makes the model's output as deterministic as possible: Gemini will almost always return the same result for the same input, which matters when you need reliable, parseable route values. If you raise the temperature above zero, responses become more variable; that can occasionally help with ambiguous or borderline cases, but it also risks less predictable output (for example, extra text or inconsistent specialist names), which makes parsing and downstream handling more error-prone. Similarly, keeping max_output_tokens low encourages the model to return a compact JSON object. If you want an optional explanation for debugging, include an "explanation" field in the JSON schema and raise the token limit accordingly, but keep the response machine-readable and validate the route against an allowlist.
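
The optional "explanation" field mentioned above could be handled as in the following sketch. It assumes the router prompt has been extended to ask for `{"route": ..., "explanation": ...}`; the `parse_decision` helper, the `DEFAULT_ROUTE` constant, and the sample payload are hypothetical and exist only for illustration.

```python
import json
from typing import Tuple

ALLOWED_ROUTES = {"math_specialist", "writing_specialist", "code_specialist"}
DEFAULT_ROUTE = "writing_specialist"  # assumed fallback, matching the lesson's default


def parse_decision(raw_text: str) -> Tuple[str, str]:
    """Return (route, explanation); fall back to DEFAULT_ROUTE on bad input."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError:
        return DEFAULT_ROUTE, ""
    if not isinstance(data, dict):
        return DEFAULT_ROUTE, ""

    route = data.get("route")
    explanation = data.get("explanation")
    # The route is validated against the allowlist; the explanation is
    # debugging text only and never influences which specialist runs.
    if not isinstance(route, str) or route not in ALLOWED_ROUTES:
        route = DEFAULT_ROUTE
    return route, explanation if isinstance(explanation, str) else ""


# Hypothetical router output used only to exercise the parser.
sample = '{"route": "math_specialist", "explanation": "The request asks to solve an equation."}'
route, explanation = parse_decision(sample)
print(route)
print(explanation)
```

Keeping the explanation out of the routing decision means a verbose or odd explanation can never change which specialist handles the request.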

Extracting and Handling the Router's Response

After receiving the router's response, parse the JSON, validate the route against an allowlist, and fall back safely if needed.

```python
import json

allowed_routes = {"math_specialist", "writing_specialist", "code_specialist"}

# .text can be None when the model returns no text, so default to ""
raw_text = (router_response.text or "").strip()

try:
    data = json.loads(raw_text)
    # Guard against valid JSON that is not an object or has a non-string route
    route = data.get("route") if isinstance(data, dict) else None
    specialist_choice = route.strip() if isinstance(route, str) else ""
except json.JSONDecodeError:
    specialist_choice = ""

if specialist_choice not in allowed_routes:
    specialist_choice = "writing_specialist"  # safe default fallback

print(f"Router decision: {specialist_choice}")
```

The print statement is useful for debugging and verifying that the router is making the correct classification. For the example request "Write me a short story about robots", you should see:

```text
Router decision: writing_specialist
```

This confirms that the router correctly identified the request as a creative writing task.
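
Even with a "JSON only" instruction, a model may occasionally wrap its JSON in a Markdown code fence, which would make `json.loads` fail and trigger the fallback. The sketch below shows one possible pre-processing step under that assumption; `strip_code_fence` is a hypothetical helper and the sample string is made up for illustration.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable

# Matches text that is entirely one fenced block, with an optional "json" tag.
FENCE_RE = re.compile(
    rf"^{FENCE}(?:json)?\s*(.*?)\s*{FENCE}$",
    re.DOTALL | re.IGNORECASE,
)


def strip_code_fence(text: str) -> str:
    """Return the fenced block's contents, or the stripped text if unfenced."""
    text = text.strip()
    match = FENCE_RE.match(text)
    return match.group(1) if match else text


# Hypothetical router output wrapped in a fence.
fenced = FENCE + 'json\n{"route": "code_specialist"}\n' + FENCE
data = json.loads(strip_code_fence(fenced))
print(data["route"])
```

Calling `strip_code_fence` on the raw router text before `json.loads` leaves plain JSON untouched, so it can be added without changing the behavior for well-formed responses.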

Mapping the Router Decision to a Specialist Prompt
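
A minimal sketch of this mapping step, assuming the specialist prompts live in a dictionary keyed by route name. The abbreviated prompt texts, the `select_specialist_prompt` helper, and the "Step 2" label are assumptions; only the `specialist_choice` and `specialist_prompt` variable names come from the surrounding lesson code.

```python
# Abbreviated, hypothetical prompts keyed by the router's route names.
specialist_prompts = {
    "math_specialist": "You are a mathematics specialist. Solve problems step by step.",
    "writing_specialist": "You are a writing specialist. Produce clear, engaging prose.",
    "code_specialist": "You are a programming specialist. Provide correct, commented code.",
}

DEFAULT_ROUTE = "writing_specialist"  # assumed fallback, matching the router handling


def select_specialist_prompt(route: str) -> str:
    """Look up the system prompt for a route, falling back to the default."""
    return specialist_prompts.get(route, specialist_prompts[DEFAULT_ROUTE])


# Step 2: Map the router decision to a specialist prompt
specialist_choice = "writing_specialist"  # would come from the parsed router response
specialist_prompt = select_specialist_prompt(specialist_choice)
print(specialist_prompt)
```

A dictionary lookup keeps this step free of if/elif chains, so adding a specialist does not require touching the selection logic.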

Sending the User Request to the Specialist

After selecting the correct specialist prompt, you send the original user request to the chosen specialist. The specialist uses their domain expertise to generate the final response. Use system_instruction to set specialist behavior and keep the user request clean.

```python
# Step 3: Send user request to the chosen specialist
specialist_response = client.models.generate_content(
    model=specialist_model,
    contents=user_request,
    config=types.GenerateContentConfig(
        system_instruction=specialist_prompt,
        max_output_tokens=2048,
        temperature=0.7,
    ),
)
```

Notice that you send the original user_request to the specialist, not the router's response. The router's job is purely classification; the specialist needs to see the actual user question to provide a helpful answer. This separation keeps the workflow clean and ensures the specialist has all the context needed to respond effectively.

Extracting and Displaying the Specialist's Response

Finally, extract the specialist's response and display it. This is the answer that will be returned to the user.

```python
# Extract and display the specialist's response
final_response = specialist_response.candidates[0].content.parts[0].text.strip()
print("Specialist Response:")
print(final_response)
```

When you run this code with the robot story request, you should see output like:

```text
Specialist Response:
The Last Dance

Maya pressed her palm against the maintenance panel, and ARIA-7's chest compartment glowed softly before opening with a gentle hiss. Inside, circuits hummed like a mechanical heartbeat, but something was wrong: several nodes flickered erratically, their light growing dimmer with each pulse...
```

This demonstrates that the writing specialist correctly understood the creative writing request and produced an engaging story opening rather than trying to solve it as a math problem or write code.
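
Indexing straight into `candidates[0].content.parts[0]` raises an exception if the response has no candidates or no parts. A more defensive extraction could look like the sketch below. The `extract_text` helper is hypothetical, and the `SimpleNamespace` objects are stand-ins that only mimic the attribute names used in the extraction line above; they are not real SDK response objects.

```python
from types import SimpleNamespace


def extract_text(response) -> str:
    """Join the text of all parts in the first candidate, or return ""."""
    candidates = getattr(response, "candidates", None) or []
    if not candidates:
        return ""
    content = getattr(candidates[0], "content", None)
    parts = getattr(content, "parts", None) or []
    # Skip parts that carry no text (missing attribute, None, or empty string)
    texts = [part.text for part in parts if getattr(part, "text", None)]
    return "".join(texts).strip()


# Stand-in objects shaped like the attributes the lesson code reads.
fake_response = SimpleNamespace(
    candidates=[
        SimpleNamespace(
            content=SimpleNamespace(
                parts=[SimpleNamespace(text="The Last "), SimpleNamespace(text="Dance\n")]
            )
        )
    ]
)
empty_response = SimpleNamespace(candidates=[])

print(repr(extract_text(fake_response)))
print(repr(extract_text(empty_response)))
```

Returning an empty string instead of raising lets the calling code decide how to handle a missing answer, for example by retrying or showing a fallback message.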

Summary and Prep for Practice

You've successfully learned to implement intelligent task routing that uses Gemini to classify requests and direct them to specialized prompts optimized for specific domains. Your routing workflow follows a clean two-step pattern: analyze the request with a fast router model to get a classification decision, then send the original request to a capable specialist model for the final response. The routing patterns you've learned here form the foundation for much more sophisticated AI workflows. As you continue through this course, you'll see how these routing concepts extend to dynamic workflows and complex agent behaviors that can handle real-world business problems with multiple types of expertise working together.