Welcome to another lesson of this course on MCP server and agent integration in TypeScript. So far, you learned how to build an MCP server and connect it to an agent, giving your agent the ability to use external tools. Now, we will take your skills further by focusing on how to make your agent more efficient and responsive, especially when handling a sequence of user queries.
In this lesson, you will learn how to use tool caching to reduce latency and improve performance when running an agent across multiple queries. These techniques are essential for building agents that feel fast and natural in real-world applications. We'll start by examining a real-world scenario where an agent handles multiple user requests, then identify the performance bottlenecks, and finally implement caching to solve these issues.
To understand the impact of tool caching, let's start with a practical example of an agent handling multiple user queries. In real-world applications, users rarely ask just one question - they typically engage in conversations with follow-up requests, clarifications, and related tasks.
Here's how you might set up an agent to manage a shopping list across several interactions:
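A sketch of such a setup is below. It assumes the `Agent`, `run`, and `MCPServerStdio` exports of the `@openai/agents` SDK; the server entry point path, agent instructions, and the specific queries are illustrative placeholders, not a definitive implementation:

```typescript
import { Agent, run, MCPServerStdio } from "@openai/agents";

async function main() {
  // Connect to a local MCP server over stdio. The command is
  // illustrative; point it at your own server entry point.
  const mcpServer = new MCPServerStdio({
    name: "Shopping List Server",
    fullCommand: "npx tsx ./server/shoppingListServer.ts",
  });
  await mcpServer.connect();

  const agent = new Agent({
    name: "Shopping Assistant",
    instructions:
      "Help the user manage their shopping list using the available tools.",
    mcpServers: [mcpServer],
  });

  // A typical multi-query conversation: view, change, update.
  const queries = [
    "What's currently on my shopping list?",
    "Add milk and eggs to the list.",
    "Change the eggs to a dozen.",
  ];

  for (const query of queries) {
    const result = await run(agent, query);
    console.log(result.finalOutput);
  }

  await mcpServer.close();
}

main();
```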
This example demonstrates a typical user interaction pattern: viewing current state, making changes, and updating items. Each query requires the agent to understand what tools are available before it can decide how to respond. Let's examine what happens behind the scenes when this code runs.
When you run the above code without tool caching, here's what you'll see in your MCP server logs. Pay close attention to how many times the agent requests the tool list:
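The exact format depends on how your server logs incoming requests, and the tool names here are assumed for the shopping-list example, but the shape of the transcript will resemble this:

```
[server] initialize
[server] notifications/initialized
[server] tools/list            <- before query 1
[server] tools/call get_list
[server] tools/list            <- before query 2
[server] tools/call add_item
[server] tools/list            <- before query 3
[server] tools/call update_item
```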
Notice the pattern: the agent sends a `tools/list` request before processing each new user query, even though the available tools haven't changed between queries. This happens because the agent doesn't remember what tools were available from previous interactions - it starts fresh each time.
The repetitive `tools/list` requests represent wasted time and resources. In a local development environment, this might add only a few milliseconds per request, but in production environments with remote servers or complex tool sets, these delays can significantly impact user experience.
The server logs reveal a fundamental inefficiency in how agents typically interact with MCP servers. Every time an agent starts processing a new query, it needs to know what tools are available to help answer the user's question. By default, the agent asks the MCP server for the complete list of tools each time, treating each query as if it's the first interaction.
This repeated fetching creates several problems:
- Increased latency: Each `tools/list` request adds delay before the agent can start working on the actual user query. The agent must wait for the server response before it can even begin planning its approach.
- Server load: Unnecessary requests consume server resources, which is especially problematic when multiple agents or users are active simultaneously. The server spends time generating the same response repeatedly.
- Network overhead: If the MCP server is remote, each `tools/list` request creates additional network traffic. This is particularly costly in cloud environments, where network latency and bandwidth usage directly impact performance and costs.
- Poor user experience: Users notice these delays, especially in multi-turn conversations where they expect quick responses to follow-up questions. The cumulative effect of multiple tool list requests can make the agent feel sluggish and unresponsive.
The core issue is that tool metadata rarely changes during a conversation, yet the agent treats it as if it might change between every query. Tool caching solves this problem by recognizing that tool lists are typically stable and can be safely reused across multiple interactions.
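The idea behind the cache can be sketched independently of any SDK: remember the first `tools/list` response and reuse it for every later lookup. Everything below is illustrative - the `ToolInfo` shape, the tool names, and the `fetchToolList` stand-in for a real network request are not part of the actual MCP client:

```typescript
interface ToolInfo {
  name: string;
  description: string;
}

class CachingToolClient {
  private cachedTools: ToolInfo[] | null = null;
  public listRequests = 0; // counts simulated tools/list round trips

  // Stands in for a real tools/list request to an MCP server.
  private async fetchToolList(): Promise<ToolInfo[]> {
    this.listRequests++;
    return [
      { name: "add_item", description: "Add an item to the shopping list" },
      { name: "remove_item", description: "Remove an item from the list" },
    ];
  }

  // Only the first call hits the "server"; later calls reuse the cache.
  async listTools(): Promise<ToolInfo[]> {
    if (this.cachedTools === null) {
      this.cachedTools = await this.fetchToolList();
    }
    return this.cachedTools;
  }
}

async function demo() {
  const client = new CachingToolClient();
  await client.listTools(); // first query: real request
  await client.listTools(); // second query: served from cache
  await client.listTools(); // third query: served from cache
  console.log(client.listRequests); // prints 1
}

demo();
```

Because the cache lives on the client instance, three queries cost a single simulated round trip - the same trade the SDK makes for you.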
Fortunately, the OpenAI Agents SDK provides a simple solution to eliminate these redundant requests. To enable tool caching, you only need to add a single parameter when creating your MCP server connection:
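With the TypeScript Agents SDK this is one extra property on the server options; the server name and command shown here are illustrative:

```typescript
const mcpServer = new MCPServerStdio({
  name: "Shopping List Server",
  fullCommand: "npx tsx ./server/shoppingListServer.ts",
  cacheToolsList: true, // fetch tools/list once, reuse it for later queries
});
```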
That's it! With this single parameter change, your agent fetches the tool list once, the first time it needs the tools, and reuses that cached information for all subsequent queries during the lifetime of the MCP client instance. The agent maintains this cache in memory, so no additional storage or configuration is required.
The `cacheToolsList` parameter tells the agent runtime to store the tool list response after the first successful `tools/list` request. When the agent needs to know what tools are available for future queries, it uses the cached data instead of making another network request to the server.
With tool caching enabled, the same multi-query conversation produces dramatically cleaner and more efficient server logs:
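Again, the precise format depends on your server's logging and the tool names are assumed, but the transcript now looks roughly like this:

```
[server] initialize
[server] notifications/initialized
[server] tools/list            <- fetched once, then cached
[server] tools/call get_list
[server] tools/call add_item
[server] tools/call update_item
```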
The difference is striking: `tools/list` appears only once, during the initial setup phase at the beginning of the session. All subsequent interactions are purely functional tool calls that accomplish the user's actual requests. This represents a significant reduction in unnecessary network traffic and server processing.
Even if you close and reconnect the same client instance, you'll see the `initialize` and `notifications/initialized` requests for the new connection, but no additional `tools/list` request, since the cache persists across reconnections.
This optimization becomes even more valuable as conversations grow longer. Without caching, a 10-query conversation generates 10 separate `tools/list` requests. With caching enabled, the same conversation generates only one `tools/list` request, regardless of its length.
The performance improvement is immediately noticeable to users, especially in scenarios where the MCP server is remote or the tool list is large and complex.
Understanding when to enable tool caching is crucial for building efficient agents. Tool caching provides the most benefit in specific scenarios where the conditions align with its strengths.
Tool caching is ideal when:
- Stable tool sets: Your tools don't change during runtime. Most applications have a fixed set of tools that remain constant throughout user sessions. For example, a shopping list application typically always offers the same core functions: add items, remove items, mark as purchased, view list.
- Multi-query conversations: Users ask multiple questions in sequence. This is the most common pattern in real-world applications, where users engage in back-and-forth conversations rather than single, isolated queries.
- Performance matters: You need fast response times and want to minimize unnecessary delays. In customer-facing applications, every millisecond of latency affects user satisfaction.
- Remote servers: The MCP server is not running locally, so network latency makes each `tools/list` request more expensive. Cloud deployments, microservices architectures, and distributed systems all benefit significantly from reduced network calls.
Tool caching provides the most dramatic improvements in scenarios where all these factors combine - stable tools, conversational interactions, performance requirements, and network considerations.
While tool caching offers significant benefits, there are important scenarios where it can cause problems rather than solve them. Understanding these limitations helps you make informed decisions about when to disable caching.
Avoid tool caching when:
- Dynamic tools: Tools are frequently added, removed, or modified at runtime based on changing conditions. For example, if your application dynamically loads plugins or modules that provide new tools, caching would prevent the agent from discovering these new capabilities.
- User-specific tools: Different users have access to different tool sets based on permissions, roles, or subscription levels. An admin user might have access to management tools that regular users cannot see. Caching could cause the agent to offer tools that the current user cannot actually use.
- Context-dependent tools: Available tools change based on application state, user location, time of day, or other dynamic factors. For instance, a travel agent might offer different tools depending on whether the user is planning domestic or international travel.
- Single-query sessions: Each interaction is independent, with no follow-up queries expected. In these cases, maintaining a cache provides no benefit, since there are no subsequent queries to optimize.
In these situations, caching can cause the agent to miss new tools, offer unavailable functionality, or provide outdated information about tool capabilities. The agent might continue using an outdated tool list, leading to errors or confusion when it tries to call tools that no longer exist or have changed their interface.
When you enable tool caching, it's important to understand how long the cached information remains valid and what happens when your application restarts or reconnects to the MCP server.
The tool cache is tied to the MCP client instance itself. This means:
- Instance-based caching: The cache is created the first time the agent fetches `tools/list` and persists for the lifetime of the MCP client instance, even across reconnections.
- No persistent storage: The cache doesn't survive application restarts. Each time you start your application and create a new MCP client instance, it fetches the tool list fresh and caches it for that instance.
- Connection-independent: The cache survives connection drops and reconnections. If you call `mcpServer.close()` and then `mcpServer.connect()` again on the same instance, the cached tool list remains available.
This design ensures that your application always starts with fresh tool information while providing performance benefits during the lifetime of the client instance, even across temporary connection issues.
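You can observe this instance-bound behavior directly by forcing a reconnect between queries. The fragment below assumes `mcpServer` and `agent` are the server and agent you already wired together:

```typescript
// Simulate a temporary disconnect on the same client instance.
await mcpServer.close();
await mcpServer.connect(); // the server logs initialize again...

// ...but the cached tool list is reused: no new tools/list request.
const result = await run(agent, "Is there anything left to buy?");
console.log(result.finalOutput);
```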
Tool caching represents a simple but powerful optimization that can dramatically improve your agent's performance in multi-query scenarios. By eliminating redundant `tools/list` requests, you reduce latency, decrease server load, and provide a smoother user experience.
The implementation is straightforward - just add `cacheToolsList: true` to your server configuration - but the impact on performance can be substantial, especially in production environments with remote servers or complex tool sets.
Key takeaways for effective tool caching:
- Enable caching by default: Use `cacheToolsList: true` in your server configuration when your tool set is stable and you expect multi-query conversations.
- Monitor your use case: Tool caching works best when tools remain consistent throughout user sessions. If your tools change frequently, consider the trade-offs carefully.
- Understand cache lifetime: Remember that the cache lives only as long as the MCP client instance and is refreshed when your application restarts and creates a new instance.
Remember that tool caching only affects the metadata about tools (names, descriptions, schemas) - the actual tool logic always runs fresh on the server. This means you can safely cache tool lists even when the underlying tool implementations are updated, as long as their interfaces remain consistent.
You're now ready to build more efficient agents that provide faster, smoother user experiences through intelligent tool caching!
