Welcome back! In the previous lesson, you learned how to use prompt chaining to connect multiple Gemini API calls in a linear sequence, breaking down complex tasks into manageable steps. This approach works well when you know exactly what sequence of operations is needed.
However, real-world applications often face unpredictable requests that require different types of expertise. Instead of building separate chains for every possible request, routing workflows use Gemini to analyze incoming requests and intelligently direct them to the right specialist — such as a math expert, a writing expert, or a code expert.
This lesson will show you how to build a flexible routing system that maintains high-quality responses by leveraging specialized system prompts for each domain, all using Gemini and its API.
The routing workflow follows a clean two-step pattern that's more dynamic than the linear chains you've used before. When a user submits a request, your system first sends it to a router — a Gemini model instance with a specialized prompt designed to classify the request type. The router analyzes the content and returns a decision about which specialist should handle it.
Once you have the routing decision, you send the original user request to a second Gemini model instance configured with the appropriate specialist prompt. This specialist focuses entirely on providing the best possible response within their domain of expertise, whether that's solving equations, crafting stories, or debugging code.
The key insight here is that you're using the same Gemini API for both calls, but with completely different models and prompts that give each instance a distinct role and expertise. The router uses a fast, lightweight model optimized for classification, while the specialist uses a more capable model to deliver high-quality domain responses. This separation of concerns makes your system both more reliable and easier to extend with new specialist types.
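The two-step pattern can be sketched independently of any SDK. In this minimal sketch, `handle_request`, `classify`, and `respond` are hypothetical names; the two callables stand in for the router call and the specialist call described above.

```python
from typing import Callable


def handle_request(
    user_request: str,
    classify: Callable[[str], str],
    respond: Callable[[str, str], str],
) -> str:
    """Run the two-step routing pattern with injected model calls."""
    # Step 1: the router sees the request and returns only a route label.
    route = classify(user_request)
    # Step 2: the specialist chosen by that label answers the ORIGINAL request.
    return respond(route, user_request)
```

Passing the two calls in as functions keeps the control flow easy to test with stand-ins before any real model is involved.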
The router prompt is the critical component that determines how accurately your system classifies incoming requests. Unlike open-ended prompts for creative tasks, router prompts need to be strict and constrained to ensure reliable, parseable output.
Your router prompt should establish Gemini's role as a decision-maker and provide clear, unambiguous options. Here's an effective approach that limits the router to exactly three choices:
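One possible wording, written as a Python constant, is sketched below. The exact text and the route labels `math`, `writing`, and `code` are assumptions chosen to match the three specialists this lesson describes.

```python
# Hypothetical router prompt; the wording and the labels
# "math", "writing", and "code" are assumptions for illustration.
ROUTER_PROMPT = """You are a routing assistant. Decide which specialist should handle the user's request.

Available specialists:
- "math": calculations, equations, and other mathematical problems
- "writing": stories, essays, and other creative writing tasks
- "code": programming, debugging, and software questions

Return JSON only, in exactly this form: {"route": "<specialist name>"}
Do not add explanations or any other text."""
```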
The phrase "return JSON only" is crucial because it discourages Gemini from adding extra text that would complicate parsing. You want a strict, machine-readable response with a single route field, although your code should still validate the result because a prompt instruction alone cannot guarantee the format. The explicit list of specialist names with their descriptions helps Gemini understand the boundaries between categories and reduces ambiguous classifications.
Notice how each specialist description focuses on clear, distinct domains. Mathematical problems are clearly different from creative writing, which is clearly different from programming tasks. This separation reduces edge cases where the router might struggle to choose between specialists.
Each specialist needs a focused prompt that establishes their expertise and approach. Unlike general-purpose prompts, specialist prompts should be concise and role-specific to maximize performance within their domain.
Your math specialist prompt should emphasize precision and clarity in mathematical communication, the writing specialist prompt should focus on creativity and engaging communication, and the code specialist prompt should emphasize technical accuracy and practical implementation.
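The three specialist prompts could be sketched as the constants below. The wording is an assumption written to match the emphasis described for each specialist, and the constant names are hypothetical.

```python
# Hypothetical specialist prompts; the wording is illustrative only.
MATH_SPECIALIST_PROMPT = (
    "You are a mathematics specialist. Solve problems precisely, "
    "show each step of your work, and state the final answer clearly."
)

WRITING_SPECIALIST_PROMPT = (
    "You are a creative writing specialist. Write engaging, vivid prose "
    "with a clear voice that suits the reader's request."
)

CODE_SPECIALIST_PROMPT = (
    "You are a programming specialist. Provide technically accurate, "
    "working code with brief explanations of how to use it."
)
```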
These prompts are intentionally brief because they need to work with a wide variety of requests within their domain. A math specialist might handle anything from basic arithmetic to complex calculus, so the prompt focuses on the general approach rather than specific techniques. This specialization improves both quality and consistency compared to using a generic "helpful assistant" prompt for all request types.
Once you've crafted effective specialist prompts, you can use them across different projects and routing systems. They become reliable building blocks for more complex AI workflows.
To begin the routing workflow, you first prepare the user request and send it to the router. The router uses a strict JSON output contract so your code can parse and validate the decision safely.
Below is an example using the Google Gen AI Python SDK (google.genai). Make sure you have installed the package (pip install google-genai) and set up your API key as described in the Gemini API documentation.
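A minimal sketch of that router call follows. The model names, the token limit of 50, the abbreviated `ROUTER_PROMPT`, and the helper names `make_client` and `ask_router` are assumptions for illustration; substitute whichever lightweight and capable Gemini models you have access to.

```python
# Assumed model names: any fast model for routing, any stronger model for answers.
ROUTER_MODEL = "gemini-2.5-flash-lite"
SPECIALIST_MODEL = "gemini-2.5-flash"

# Abbreviated placeholder for the router prompt described earlier.
ROUTER_PROMPT = (
    "Classify the user's request as math, writing, or code. "
    'Return JSON only: {"route": "<specialist name>"}'
)


def make_client():
    """Create a Gemini client; the SDK reads the API key from the environment."""
    from google import genai  # imported here so this module loads without the SDK

    return genai.Client()


def ask_router(client, user_request: str) -> str:
    """Send the request to the router model and return its raw text reply."""
    response = client.models.generate_content(
        model=ROUTER_MODEL,
        contents=user_request,
        config={
            "system_instruction": ROUTER_PROMPT,
            "temperature": 0.0,       # keep the classification as repeatable as possible
            "max_output_tokens": 50,  # assumed limit; the reply is a tiny JSON object
        },
    )
    # response.text may be empty if the model returned no text; normalize to "".
    return response.text or ""
```

Taking the client as a parameter means the function can be exercised with a stand-in object in tests, while production code would pass `make_client()`.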
Here, you set up the Gemini client, specify the models, and construct the message payload. The max_output_tokens parameter keeps the router's response concise, since you expect only a short JSON object naming the specialist.
Setting temperature to 0.0 is recommended for routing because it makes the model's output as deterministic as possible: Gemini will almost always return the same result for the same input, which is crucial when you need a reliable, parseable routing decision.
After receiving the router's response, parse the JSON, validate against an allowlist, and fall back safely if needed.
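The parsing step might look like the sketch below. The route labels, the `general` fallback label, and the names `ALLOWED_ROUTES` and `parse_route` are assumptions; the JSON shape follows the single route field described earlier.

```python
import json

ALLOWED_ROUTES = {"math", "writing", "code"}  # assumed specialist labels
FALLBACK_ROUTE = "general"                    # assumed label for the generic assistant


def parse_route(raw_text: str) -> str:
    """Parse the router's JSON reply and return a validated route label."""
    try:
        data = json.loads(raw_text)
    except (json.JSONDecodeError, TypeError):
        # Not valid JSON (or not a string at all): fall back safely.
        return FALLBACK_ROUTE
    if not isinstance(data, dict):
        return FALLBACK_ROUTE
    route = data.get("route")
    if isinstance(route, str) and route.strip().lower() in ALLOWED_ROUTES:
        return route.strip().lower()
    # Valid JSON, but the label is missing or not on the allowlist.
    return FALLBACK_ROUTE


route = parse_route('{"route": "writing"}')
print(f"Router decision: {route}")  # prints: Router decision: writing
```

Checking against an allowlist means an unexpected label is treated the same way as malformed JSON, so downstream code only ever sees a known value.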
Printing the parsed route is useful for debugging and for verifying that the router is making the correct classification. For the example request "Write me a short story about robots", the printed route should be the writing specialist, which confirms that the router correctly identified the request as a creative writing task.
Once you have the router's decision, you need to map it to the appropriate specialist prompt. This ensures that the user request is handled by the expert best suited for the task.
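A minimal sketch of that mapping, assuming the route labels `math`, `writing`, and `code` and abbreviated placeholder prompts (the function name `select_specialist_prompt` is hypothetical):

```python
# Abbreviated placeholder prompts; the wording is illustrative only.
MATH_SPECIALIST_PROMPT = "You are a mathematics specialist. Show your steps clearly."
WRITING_SPECIALIST_PROMPT = "You are a creative writing specialist. Write engaging prose."
CODE_SPECIALIST_PROMPT = "You are a programming specialist. Provide accurate, working code."
GENERAL_PROMPT = "You are a helpful assistant."


def select_specialist_prompt(route: str) -> str:
    """Map a validated route label to the matching specialist prompt."""
    if route == "math":
        return MATH_SPECIALIST_PROMPT
    elif route == "writing":
        return WRITING_SPECIALIST_PROMPT
    elif route == "code":
        return CODE_SPECIALIST_PROMPT
    else:
        # Unknown or fallback label: degrade to a generic assistant.
        return GENERAL_PROMPT
```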
The if-elif-else structure provides a clean way to map router decisions to specialist prompts. Each condition checks for an exact string match with the expected specialist names from your router prompt. This approach is straightforward and easy to debug when routing decisions don't match expectations.
The else clause provides a crucial fallback mechanism. If the router returns an unexpected response — perhaps due to a prompt engineering issue or an edge case you didn't anticipate — the system defaults to a generic helpful assistant prompt rather than crashing. This graceful degradation keeps your system functional even when routing doesn't work perfectly.
After selecting the correct specialist prompt, you send the original user request to the chosen specialist. The specialist uses their domain expertise to generate the final response. Use system_instruction to set specialist behavior and keep the user request clean.
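The specialist call could be sketched as follows. The model name and the helper name `ask_specialist` are assumptions, and `client` is expected to be a Google Gen AI SDK client such as the one created with `genai.Client()`.

```python
SPECIALIST_MODEL = "gemini-2.5-flash"  # assumed: any more capable Gemini model


def ask_specialist(client, specialist_prompt: str, user_request: str):
    """Send the original user request to the chosen specialist."""
    return client.models.generate_content(
        model=SPECIALIST_MODEL,
        # Pass the user's own words, not the router's JSON reply.
        contents=user_request,
        # The specialist role lives in the system instruction,
        # so the user request stays unmodified.
        config={"system_instruction": specialist_prompt},
    )
```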
Notice that you send the original user_request to the specialist, not the router's response. The router's job is purely classification; the specialist needs to see the actual user question to provide a helpful answer. This separation keeps the workflow clean and ensures the specialist has all the context needed to respond effectively.
Finally, extract the specialist's response and display it. This is the answer that will be returned to the user.
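Extraction can be as small as reading the response's `text` attribute. The sketch below adds a guard for an empty reply; the helper name `extract_answer` and the fallback message are assumptions.

```python
NO_ANSWER_MESSAGE = "Sorry, no response was generated."  # hypothetical fallback text


def extract_answer(response) -> str:
    """Return the specialist's text, or a fallback message if there is none."""
    text = getattr(response, "text", None)
    if not text:
        # Covers a missing attribute, None, and an empty string.
        return NO_ANSWER_MESSAGE
    return text.strip()
```

With a real response object in hand, `print(extract_answer(response))` would display the answer returned to the user.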
When you run this code with the robot story request, the output should be the opening of a short story about robots. This demonstrates that the writing specialist correctly understood the creative writing request and produced an engaging story opening rather than trying to solve it as a math problem or write code.
You've successfully learned to implement intelligent task routing that uses Gemini to classify requests and direct them to specialized prompts optimized for specific domains. Your routing workflow follows a clean two-step pattern: analyze the request with a fast router model to get a classification decision, then send the original request to a capable specialist model for the final response.
The routing patterns you've learned here form the foundation for much more sophisticated AI workflows. As you continue through this course, you'll see how these routing concepts extend to dynamic workflows and complex agent behaviors that can handle real-world business problems with multiple types of expertise working together.
