Introduction

Welcome back to the Load Balancing and Performance Tuning course! Having mastered basic and advanced load balancing strategies in the first two lessons, we're now ready to tackle another critical aspect of performance optimization: caching. While load balancing distributes work across multiple servers, caching eliminates much of that work by storing and reusing responses from previous requests. In this lesson, we'll implement a comprehensive caching solution in NGINX that includes cache storage configuration, intelligent cache key design, stale content handling, and cache management capabilities. By the end of this lesson, we'll have transformed our load balancer into a high-performance caching proxy that dramatically reduces backend load and improves response times.

The Value of Proxy Caching

Before diving into configuration, let's understand why caching matters. Every request that reaches your backend servers consumes resources: CPU cycles for processing, database queries for data retrieval, and network bandwidth for communication. When the same content is requested repeatedly, processing identical requests over and over wastes these resources and introduces unnecessary latency.

Proxy caching addresses this by storing responses from backend servers and serving them directly for subsequent identical requests. Instead of forwarding every request to the backend, NGINX checks its cache first. If a valid cached response exists, NGINX returns it immediately without involving the backend at all. This approach offers several benefits:

  • Reduced backend load, allowing servers to handle more unique requests.
  • Faster response times, since serving a stored response directly from the proxy is much quicker than backend processing.
  • Improved resilience, as cached content can be served even when backends are temporarily unavailable.

The key is configuring caching intelligently so that you cache the right content for the right duration.

Defining Cache Storage and Zones

To enable caching, we first need to tell NGINX where to store cached content and how to manage that storage. This is done with the proxy_cache_path directive in the http block:

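In rough form, that looks like this:

    proxy_cache_path /tmp/nginx_cache levels=1:2 keys_zone=api_cache:10m
                     max_size=100m inactive=60m use_temp_path=off;
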
This directive establishes a cache zone with specific characteristics:

  • /tmp/nginx_cache: The filesystem path where cached files are stored.
  • levels=1:2: Creates a two-level directory hierarchy to prevent too many files in a single directory.
  • keys_zone=api_cache:10m: Defines a shared memory zone named api_cache with 10 MB for storing cache keys and metadata.
  • max_size=100m: Limits the total cache size to 100 MB; the least recently used entries are removed when this limit is exceeded.
  • inactive=60m: Removes cached items that haven't been accessed for 60 minutes.
  • use_temp_path=off: Writes cache files directly to the cache directory, improving performance.

The keys_zone parameter is particularly important: it defines the name we'll reference later and allocates memory for tracking cache entries efficiently.

Setting Up the Backend Upstream

Just as in previous lessons, we need an upstream group to distribute requests across backend servers:

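A minimal sketch is shown below; the group name backend and the hostnames and ports are placeholders to be replaced with your real servers:

    upstream backend {
        # Placeholder backend addresses; substitute your actual servers.
        server backend1.example.com:8080;
        server backend2.example.com:8080;
        server backend3.example.com:8080;
    }
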
This familiar configuration creates a round-robin distribution across three backend servers. The difference now is that many requests won't even reach these servers; they'll be served directly from the cache instead.

Enabling Cache for Specific Paths

With the cache zone defined, we can now enable caching for specific locations. The /api/cached/ endpoint demonstrates a comprehensive caching setup:

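Here's a sketch of that location block. The upstream name and the one-minute validity for 404 responses are illustrative; the five-minute validity for 200 responses matches the example discussed below:

    location /api/cached/ {
        proxy_cache api_cache;
        proxy_cache_key "$scheme$proxy_host$request_uri$is_args$args";
        proxy_cache_valid 200 5m;
        proxy_cache_valid 404 1m;    # illustrative shorter lifetime for error responses
        proxy_pass http://backend;   # the upstream group defined earlier
    }
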
The proxy_cache directive activates the api_cache zone we defined earlier. The proxy_cache_key determines what makes requests unique: different schemes, hosts, URIs, or query parameters will be cached separately. Setting distinct cache validity periods for different response codes makes sense because successful responses (200) can typically be cached longer than error responses (404).

Important consideration: If cache validity is set to 5 minutes and the backend data changes during that time, clients will still receive the cached (potentially outdated) response until the cache expires. There are several ways to address this trade-off between performance and freshness:

  • Lower the cache validity time, so updates appear sooner at the cost of more frequent backend requests.
  • Use cache purging to manually remove outdated entries when data changes on the backend.
  • Implement cache revalidation, so NGINX uses conditional requests (with ETags or Last-Modified headers) to check with the backend when refreshing expired content (see the sketch below).

The right approach depends on your specific requirements: some content can tolerate longer staleness, while other data needs to be more current.
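
If you go the revalidation route, NGINX supports it with a single directive that attaches conditional headers when refreshing expired entries:

    proxy_cache_revalidate on;   # send If-Modified-Since / If-None-Match when refreshing expired items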

Designing Effective Cache Keys

The cache key deserves special attention because it determines when requests are considered identical. Our configuration uses:

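With all of the variables concatenated, it looks like this:

    proxy_cache_key "$scheme$proxy_host$request_uri$is_args$args";
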
This key includes several components:

  • $scheme: Whether the request used HTTP or HTTPS.
  • $proxy_host: The target hostname being proxied to.
  • $request_uri: The full path and query string of the request.
  • $is_args: A question mark if query arguments exist, empty otherwise.
  • $args: The actual query string parameters.

This design ensures that GET /api/cached/users and GET /api/cached/users?page=2 are cached as separate entries, which is exactly what we want since they return different data. The cache key fundamentally defines what "identical request" means.

Adding Cache Status Visibility

To understand whether requests are hitting the cache or missing it, we add a custom header that exposes this information:

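A typical form, using X-Cache-Status as an example header name, is:

    add_header X-Cache-Status $upstream_cache_status always;
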
The $upstream_cache_status variable contains values like HIT (served from cache), MISS (fetched from backend), EXPIRED (cache entry was stale), or BYPASS (caching was intentionally bypassed). The always parameter ensures this header appears even in error responses, which is useful for debugging.

Serving Stale Content During Issues

One of caching's most powerful features is the ability to serve slightly outdated content when backend servers encounter problems. This is configured with proxy_cache_use_stale:

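Covering the cases discussed below, it looks like this:

    proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504;
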
This directive tells NGINX when serving stale cached content is acceptable:

  • error: When an error occurs connecting to the backend.
  • timeout: When the backend doesn't respond in time.
  • updating: While a cache entry is being refreshed in the background.
  • http_500/502/503/504: When the backend returns these specific error codes.

Instead of returning errors to users, NGINX serves cached content that's past its validity period, maintaining availability even during backend instability. The cache status header will show STALE in these cases.

Updating Cache in the Background

To further improve performance, we can enable background cache updates:

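The directive itself is a single line, placed alongside the other cache settings:

    proxy_cache_background_update on;   # works together with "proxy_cache_use_stale updating" above
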
When this is enabled and a cached entry expires, NGINX immediately serves the stale version to the client while simultaneously updating the cache in the background. This means users never wait for cache refreshes; they always get an instant response. The next request will then receive the freshly updated content. This creates a smooth experience in which cache updates happen transparently without introducing latency.

Preventing Thundering Herd with Cache Locking

When a popular cached item expires and multiple requests arrive simultaneously for the same content, they might all rush to the backend at once, a phenomenon called the thundering herd problem. Cache locking prevents this:

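Enabling it is a single directive (related directives such as proxy_cache_lock_timeout control how long the other requests wait):

    proxy_cache_lock on;   # only one request populates a given cache entry at a time
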
With cache locking enabled, when multiple requests arrive for an expired cache entry, only the first one is forwarded to the backend. The other requests wait for that first one to complete and then all receive the newly cached response. This serializes backend requests and prevents overwhelming the backend with duplicate work.

Bypassing Cache When Needed

Some requests should never be cached, perhaps because they contain sensitive data or because they modify state. We can configure a location that explicitly bypasses caching:

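A sketch of such a location is shown below; the /api/uncached/ path is just an example name:

    location /api/uncached/ {
        proxy_cache api_cache;      # zone still referenced so $upstream_cache_status reports BYPASS
        proxy_no_cache 1;           # never store the response
        proxy_cache_bypass 1;       # never answer from the cache
        add_header X-Cache-Status $upstream_cache_status always;
        proxy_pass http://backend;
    }
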
Setting both proxy_no_cache and proxy_cache_bypass to 1 ensures responses aren't cached and the cache isn't checked for this location. Every request goes directly to the backend. The custom header confirms this behavior by showing BYPASS as the cache status.

Conclusion and Next Steps

Excellent progress! We've implemented a sophisticated caching system in NGINX that includes intelligent storage management, flexible cache key design, graceful handling of stale content, background updates, cache locking, and selective bypassing. These features work together to create a robust caching layer that dramatically improves performance while maintaining reliability even during backend issues. You now understand how to configure caching for different scenarios and how to monitor cache behavior through status headers. The knowledge you've gained in these three lessons forms a solid foundation for building high-performance, scalable web applications. Now it's time to apply everything you've learned through hands-on practice, which will cement these caching concepts and prepare you for real-world implementations!