Introduction

Welcome back to Securing Your NGINX Server! In the previous lesson, we explored how to protect sensitive areas of your application using HTTP Basic Authentication. Now, in this second lesson, we're shifting our focus to another crucial security mechanism: rate limiting.

Rate limiting helps protect your server from abuse by controlling how many requests a client can make within a specific time window. By the end of this lesson, you'll understand how to configure NGINX to automatically limit request rates per client IP address and respond appropriately when limits are exceeded.

Why Rate Limiting Matters

While authentication controls who can access your resources, rate limiting controls how often they can access them. This distinction is important because even legitimate users or automated systems can overwhelm your server with too many requests.

Common scenarios where rate limiting is essential include:

  • API endpoints that perform expensive database queries or computations
  • Login pages vulnerable to brute-force attacks
  • Public APIs where you want to enforce fair usage policies
  • Shared environments where a single client could monopolize server resources

Without rate limiting, a misbehaving client or malicious actor could degrade performance for all users or even crash your server entirely.

How Rate Limiting Works in NGINX

NGINX implements rate limiting through a two-step process. First, we define a shared memory zone that tracks the request state for each client. This zone keeps count of how many requests each client has made recently. Second, we apply this rate limit to specific locations in our configuration.

The beauty of this approach lies in its efficiency: NGINX uses shared memory to track request counts across all worker processes, ensuring consistent enforcement regardless of which worker handles a particular request. This state persists in memory, allowing NGINX to decide instantly whether to accept or reject each incoming request.

Defining a Rate Limit Zone

Let's begin by establishing the foundation for rate limiting. We define a shared memory zone in the http context using the limit_req_zone directive:
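
    # defined in the http context: track each client in a 10 MB zone
    # named api_limit, at a maximum rate of 10 requests per second
    limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;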

This single line creates a rate-limiting zone named api_limit. The zone allocates 10 megabytes of shared memory and sets a maximum rate of 10 requests per second per client. Note that this directive appears in the http block, making it available to all server configurations within.

The rate=10r/s parameter defines the maximum allowed rate (10 requests per second) for each client tracked in this zone. However, this is just the definition and tracking setup. The limit is only enforced when you use limit_req in a location. Think of rate=10r/s as setting the speed limit sign, but nothing happens until you put a traffic cop (limit_req) on the road to actually enforce it!

Understanding the Zone Key

The $binary_remote_addr variable serves as our tracking key, representing each client's IP address in binary format:
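
    #              tracking key: the client's IP address in binary form
    limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;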

Using the binary representation instead of the string form ($remote_addr) provides two advantages:

  • It consumes less memory: a binary IPv4 address requires only 4 bytes compared to up to 15 characters for the string representation.
  • Binary comparisons execute faster than string comparisons, improving lookup efficiency when NGINX checks request rates.

The 10 megabytes we've allocated can track approximately 160,000 unique IP addresses simultaneously (each tracked state occupies roughly 64 bytes, and 10 MB ÷ 64 bytes ≈ 160,000 entries), which is sufficient for most applications.

Setting Up the Server Structure

Now we'll create our server configuration with both a public root location and a protected API endpoint. This time, we'll use try_files to route API requests to a named location, which will serve the actual content. Here is a minimal sketch (the listen port is illustrative):
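
    server {
        listen 8080;  # illustrative port; adjust to your environment

        location / {
            return 200 "OK";
        }

        location /api/ {
            # rate limiting will be applied here in the next step
            try_files "" @api;
        }

        location @api {
            return 200 "api";
        }
    }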

Here's what each part does:

  • The root location / remains unrestricted, returning "OK" to all requests.
  • The /api/ location is where we will apply our rate-limiting rules. It uses try_files "" @api; to immediately route requests to the named location @api.
  • The named location @api is responsible for generating the response "api".

This structure is important for rate limiting. A return directive executes during the rewrite phase, which runs before rate limiting is checked, so returning a response directly from /api/ would bypass the limit. With try_files, limit_req (which runs in the preaccess phase) is evaluated in the /api/ location before the request is passed to @api for content.

Applying the Rate Limit

To enforce our rate limit on the /api/ path, we use the limit_req directive within that location block:
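
    location /api/ {
        # enforce the limit defined by the api_limit zone
        limit_req zone=api_limit burst=20 nodelay;
        # then hand the request to the named location for content
        try_files "" @api;
    }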

The zone=api_limit parameter references the shared memory zone we defined earlier. This connection activates rate limiting for this location, instructing NGINX to check each incoming request against our 10 requests per second limit.

Because we are not using return directly in this location, the normal request-processing flow continues: rate limiting (limit_req, in the preaccess phase) is evaluated first, and only then does try_files pass the request to @api to generate the response.

Understanding Burst and Nodelay

The burst and nodelay parameters fine-tune how NGINX handles traffic spikes:
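
    limit_req zone=api_limit burst=20 nodelay;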

Real-world traffic rarely arrives at perfectly even intervals. The burst=20 parameter allows up to 20 requests to exceed the base rate temporarily, accommodating legitimate traffic spikes without rejecting them immediately.

Why Use Burst Instead of a Higher Base Rate?

You might wonder: if we want to allow up to 30 requests in a second, why not simply set rate=30r/s instead of rate=10r/s with burst=20? The key difference lies in what happens over time.

With rate=30r/s and no burst, every client can send up to 30 requests per second continuously—all the time. This allows a high, steady flow that could strain your server if many clients maintain this rate simultaneously.

In contrast, rate=10r/s with burst=20 allows clients to send up to 30 requests in a single second occasionally—not constantly. If a client keeps sending 30 requests every second, only 10 per second will be allowed after the burst capacity is exhausted. The burst acts as a buffer for legitimate traffic spikes while enforcing a lower average rate over time.

This distinction makes burst particularly valuable for protecting resource-intensive endpoints. You can accommodate legitimate spikes (like a user rapidly clicking through multiple pages) while preventing sustained high-volume traffic that could overwhelm your server.

Processing Burst Requests

The nodelay parameter determines how these burst requests are processed. With nodelay present, NGINX handles burst requests immediately rather than queuing them. This means:

  • Requests 1 through 10 are processed instantly (within the base rate)
  • Requests 11 through 30 are also processed immediately (using the burst allowance)
  • Request 31 and beyond are rejected with an error response

Without nodelay, excess requests within the burst would be queued and released at the base rate (one every 100 milliseconds at 10r/s), which could introduce unwanted latency.

Adding Diagnostic Headers

We can enhance our configuration by including a custom header that identifies which rate limit zone handled the request:
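
    location /api/ {
        limit_req zone=api_limit burst=20 nodelay;
        # label every response from this path with the zone name;
        # "always" keeps the header even on 503 rejections
        add_header X-RateLimit-Zone api_limit always;
        try_files "" @api;
    }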

The X-RateLimit-Zone header appears in every response from this path, making it easier to debug and monitor rate-limiting behavior. The always parameter ensures this header is included regardless of the response status code, even when requests are rejected.

Observing Normal Behavior

When clients respect the rate limit, they receive successful responses. Let's examine what happens when we make requests within the allowed rate. The example below assumes the illustrative localhost:8080 address from our sketch, with output trimmed to the relevant headers:
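
    $ curl -i http://localhost:8080/api/
    HTTP/1.1 200 OK
    X-RateLimit-Zone: api_limit

    api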

Each successful request returns a 200 status code with our custom header identifying the api_limit zone. The response body contains "api" as specified in the @api location. As long as clients stay within 10 requests per second (plus the burst allowance), this is what they'll see.

Handling Rate Limit Violations

When a client exceeds both the base rate and burst capacity, NGINX automatically protects your server by rejecting the request. Again assuming the illustrative localhost:8080 address, with output trimmed:
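
    $ curl -i http://localhost:8080/api/
    HTTP/1.1 503 Service Temporarily Unavailable
    X-RateLimit-Zone: api_limit

    (body: NGINX's default 503 HTML error page)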

The 503 status code signals that the service is temporarily unavailable due to rate limiting. Notice that our custom X-RateLimit-Zone header still appears, helping identify which rate limit was triggered. This automatic rejection happens without any additional code or configuration; NGINX handles it entirely.

The Complete Configuration

Here's our full configuration showing how all components work together (the events block and listen port in the sketch are minimal placeholders):
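
    # minimal sketch of a complete nginx.conf; the events block and
    # listen port are placeholders for whatever your deployment uses
    events {}

    http {
        limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;

        server {
            listen 8080;

            location / {
                return 200 "OK";
            }

            location /api/ {
                limit_req zone=api_limit burst=20 nodelay;
                add_header X-RateLimit-Zone api_limit always;
                try_files "" @api;
            }

            location @api {
                return 200 "api";
            }
        }
    }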

In this configuration:

  • The root path / remains open for general traffic.
  • The /api/ endpoint is protected by rate limiting via limit_req in that location.
  • try_files "" @api; forwards traffic from /api/ to the named location @api, which generates the "api" response.
  • Because we avoid an immediate return in the /api/ location, NGINX evaluates limit_req (in the preaccess phase) before content is served.

This selective protection pattern allows you to secure high-value or resource-intensive endpoints without impacting the rest of your application.

Conclusion and Next Steps

You've successfully learned how to implement rate limiting in NGINX to protect your API endpoints from excessive traffic, using try_files to ensure rate limiting is applied before content is served. You now understand how the limit_req_zone directive establishes tracking zones, how the limit_req directive applies limits to specific locations, and how the burst and nodelay parameters control traffic spike behavior.

Rate limiting complements the authentication techniques you learned previously, adding another layer of defense to your security strategy. Together, these mechanisms help ensure your server remains responsive and available to legitimate users while automatically handling abuse.

In the upcoming exercises, you'll implement rate-limiting configurations yourself, experimenting with different rates, burst values, and protected endpoints to solidify your understanding of this powerful security feature.
