Introduction to Throttling and Delay Throttle Middleware

Welcome to the second lesson of our course on securing your ASP.NET Core Web API. In the previous lesson, we explored rate limiting and its role in controlling request frequency over time. Now, we will delve into throttling, a related but distinct technique that manages server load by controlling the number of concurrent requests your server processes at any given moment.

Understanding throttling is crucial for maintaining API performance and reliability. Unlike rate limiting, which focuses on request frequency over time, throttling ensures your server is not overwhelmed by too many simultaneous requests, preventing resource exhaustion, slow response times, or system failure. By the end of this lesson, you will have built a production-ready throttling solution that includes intelligent queuing, timeout handling, comprehensive logging, and performance monitoring capabilities.

Rate Limiting vs Throttling

In API security and performance optimization, two key techniques are often discussed: rate limiting and throttling. While they work together to protect your API, they address different aspects of request management and serve complementary purposes.

Rate limiting controls the number of requests a client can make within a specific time window (e.g., 100 requests per minute). It restricts total request frequency over time and is typically implemented on a per-client basis using identifiers like API keys, IP addresses, or user accounts. Think of rate limiting as saying: "You can make 100 requests per minute, but not more."

Throttling, on the other hand, manages the concurrency of requests being processed simultaneously by your server. Rather than focusing on which client is making requests or how many they've made recently, throttling is concerned with the server's overall capacity to handle load at any given moment. Think of throttling as saying: "Our server can handle 5 requests at once, so the 6th request must wait."

When your server receives more concurrent requests than it can efficiently handle, throttling mechanisms can employ several strategies. They can queue excess requests and process them sequentially when capacity becomes available. They can delay processing until the server load decreases, preventing resource exhaustion. Or they can reject requests with appropriate status codes when the system is completely overloaded.

The key distinction is temporal: rate limiting looks at request patterns over time (historical behavior), while throttling looks at the current moment's load (real-time capacity). A client might be well within their rate limit but still experience throttling if the server is currently handling many requests from other clients.

Basic Throttle Middleware

In ASP.NET Core, middleware components form a pipeline that processes HTTP requests and responses sequentially. We will create a throttle middleware that uses SemaphoreSlim to manage concurrent requests efficiently.

The SemaphoreSlim class is a lightweight, thread-safe synchronization primitive designed specifically for async/await patterns. Unlike older synchronization mechanisms, SemaphoreSlim works seamlessly with async code and doesn't block threads, making it ideal for ASP.NET Core middleware.

Let's start with the complete basic implementation:
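A minimal sketch of what this middleware might look like; the class name ThrottleMiddleware and the MAX_CONCURRENT constant are illustrative choices, not framework requirements:

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;

public class ThrottleMiddleware
{
    private const int MAX_CONCURRENT = 5;

    // Static so every instance shares one semaphore: the concurrency limit
    // applies globally across all requests, not per middleware instance.
    private static readonly SemaphoreSlim _semaphore =
        new(MAX_CONCURRENT, MAX_CONCURRENT);

    private readonly RequestDelegate _next;

    public ThrottleMiddleware(RequestDelegate next)
    {
        _next = next;
    }

    public async Task InvokeAsync(HttpContext context)
    {
        // Wait asynchronously for a free slot; no thread is blocked while waiting.
        await _semaphore.WaitAsync();
        try
        {
            await _next(context);
        }
        finally
        {
            // Always release the slot, even if downstream middleware throws.
            _semaphore.Release();
        }
    }
}
```

The middleware is registered in the pipeline with `app.UseMiddleware<ThrottleMiddleware>();` in Program.cs, before the endpoints it should protect.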

Here's how this works. The _next delegate represents the next middleware in the pipeline. The _semaphore is initialized with a maximum count of 5, limiting concurrent request processing to five at a time. We use a static field to share this semaphore across all instances, ensuring the limit applies globally to all requests.

When a request arrives, WaitAsync() attempts to enter the semaphore. If slots are available (fewer than 5 requests being processed), the request proceeds immediately. If all slots are occupied, the request waits asynchronously until a slot becomes available, creating a queue without blocking threads.
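The waiting behavior is easy to demonstrate with SemaphoreSlim alone. This sketch simulates six "requests" against a limit of five; the sixth task cannot proceed until one of the first five releases its slot:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class SemaphoreDemo
{
    static async Task Main()
    {
        var semaphore = new SemaphoreSlim(5, 5);

        // Occupy all five slots.
        var holders = Enumerable.Range(1, 5)
            .Select(_ => semaphore.WaitAsync())
            .ToArray();
        await Task.WhenAll(holders);

        // The sixth waiter queues asynchronously instead of blocking a thread.
        var sixth = semaphore.WaitAsync();
        Console.WriteLine($"Sixth request waiting: {!sixth.IsCompleted}");

        // Releasing one slot lets the queued request proceed.
        semaphore.Release();
        await sixth;
        Console.WriteLine("Sixth request entered after a slot freed up");
    }
}
```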

Adding Structured Logging

While our basic middleware functions correctly, it operates as a black box. Adding structured logging using ASP.NET Core's built-in logging infrastructure allows us to observe how requests flow through the throttling system.

Let's enhance the constructor and add logging throughout the request lifecycle:
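One way this might look, with the middleware taking an ILogger through constructor injection (the exact message wording is illustrative):

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Logging;

public class ThrottleMiddleware
{
    private const int MAX_CONCURRENT = 5;
    private static readonly SemaphoreSlim _semaphore =
        new(MAX_CONCURRENT, MAX_CONCURRENT);

    private readonly RequestDelegate _next;
    private readonly ILogger<ThrottleMiddleware> _logger;

    public ThrottleMiddleware(RequestDelegate next, ILogger<ThrottleMiddleware> logger)
    {
        _next = next;
        _logger = logger;
    }

    public async Task InvokeAsync(HttpContext context)
    {
        // CurrentCount is the number of FREE slots; subtracting from the
        // maximum gives the number of requests currently being processed.
        var currentRequests = MAX_CONCURRENT - _semaphore.CurrentCount;
        _logger.LogInformation(
            "Request {Path} arrived; {CurrentRequests}/{MaxConcurrent} slots in use",
            context.Request.Path, currentRequests, MAX_CONCURRENT);

        await _semaphore.WaitAsync();
        try
        {
            _logger.LogInformation("Request {Path} started processing",
                context.Request.Path);
            await _next(context);
        }
        finally
        {
            _semaphore.Release();
            _logger.LogInformation("Request {Path} completed", context.Request.Path);
        }
    }
}
```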

ASP.NET Core's dependency injection system automatically provides the logger. We use structured logging with named parameters (e.g., {CurrentRequests}), which allows log aggregation systems to parse and index these values.

By accessing _semaphore.CurrentCount, we calculate how many requests are currently being processed. We log at three key points: when a request enters (showing queue status), when it starts processing (after potentially waiting), and when it completes.

These logging statements provide critical operational visibility. You can observe patterns like how often requests wait, whether the concurrency limit needs adjustment, and if certain times create consistent queuing.

Implementing Wait Timeout

One significant limitation of our current implementation is that requests could wait indefinitely if the server remains at maximum capacity. We need a mechanism to reject requests that have waited too long.

We'll implement this using a CancellationTokenSource with a timeout:
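A sketch of the timeout logic. Rejecting timed-out requests with 503 Service Unavailable is an illustrative choice here; the lesson only requires that waiting be bounded:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;

public class ThrottleMiddleware
{
    private const int MAX_CONCURRENT = 5;
    private const int MAX_WAIT_TIME_MS = 1500; // 1.5-second wait ceiling

    private static readonly SemaphoreSlim _semaphore =
        new(MAX_CONCURRENT, MAX_CONCURRENT);
    private readonly RequestDelegate _next;

    public ThrottleMiddleware(RequestDelegate next) => _next = next;

    public async Task InvokeAsync(HttpContext context)
    {
        // The token source cancels itself automatically after the timeout.
        using var cts = new CancellationTokenSource(MAX_WAIT_TIME_MS);
        try
        {
            await _semaphore.WaitAsync(cts.Token);
        }
        catch (OperationCanceledException)
        {
            // The request waited longer than MAX_WAIT_TIME_MS; reject it.
            context.Response.StatusCode = StatusCodes.Status503ServiceUnavailable;
            await context.Response.WriteAsync("Server is busy. Please retry shortly.");
            return;
        }

        try
        {
            await _next(context);
        }
        finally
        {
            _semaphore.Release();
        }
    }
}
```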

The MAX_WAIT_TIME_MS constant defines how long a request can wait (1.5 seconds in this example). The CancellationTokenSource automatically triggers cancellation after this timeout. We pass the cancellation token to WaitAsync, which throws an OperationCanceledException if the timeout expires.

Tracking Wait Time

For monitoring and debugging, it's helpful to track how long requests actually wait before processing. We'll capture this via a custom HTTP response header.

Let's add wait time tracking to our implementation:
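A sketch combining the timeout from the previous step with wait-time measurement via Stopwatch; the header value format ("123ms") is an illustrative choice:

```csharp
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Logging;

public class ThrottleMiddleware
{
    private const int MAX_CONCURRENT = 5;
    private const int MAX_WAIT_TIME_MS = 1500;

    private static readonly SemaphoreSlim _semaphore =
        new(MAX_CONCURRENT, MAX_CONCURRENT);
    private readonly RequestDelegate _next;
    private readonly ILogger<ThrottleMiddleware> _logger;

    public ThrottleMiddleware(RequestDelegate next, ILogger<ThrottleMiddleware> logger)
    {
        _next = next;
        _logger = logger;
    }

    public async Task InvokeAsync(HttpContext context)
    {
        // Start timing before we try to enter the semaphore.
        var stopwatch = Stopwatch.StartNew();
        using var cts = new CancellationTokenSource(MAX_WAIT_TIME_MS);
        try
        {
            await _semaphore.WaitAsync(cts.Token);
        }
        catch (System.OperationCanceledException)
        {
            context.Response.StatusCode = StatusCodes.Status503ServiceUnavailable;
            return;
        }

        stopwatch.Stop();
        var waitMs = stopwatch.ElapsedMilliseconds;

        try
        {
            // Set the header before the response body starts streaming.
            context.Response.Headers["X-Throttle-Wait-Time"] = $"{waitMs}ms";
            _logger.LogInformation(
                "Request {Path} waited {WaitTimeMs} ms for a slot",
                context.Request.Path, waitMs);
            await _next(context);
        }
        finally
        {
            _semaphore.Release();
        }
    }
}
```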

We capture startTime before attempting to enter the semaphore, then calculate waitTime after successfully entering. The X-Throttle-Wait-Time header exposes this metric to clients, and we include it in our logging.

This header provides transparency to clients about service performance and enables monitoring systems to track throttling metrics over time. Development teams can use this data to optimize the MAX_CONCURRENT setting based on real usage patterns.

Testing Throttling Behavior

To verify our throttling implementation, we need to generate concurrent requests and analyze results. Here's a C# console application that does this:
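A sketch of such a test client, assuming the API is listening at http://localhost:5000 with a hypothetical /api/values endpoint; adjust the URL and request count to match your setup:

```csharp
using System;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

class ThrottleTest
{
    static async Task Main()
    {
        using var client = new HttpClient();

        // Fire 20 requests concurrently so some must queue behind the
        // 5-slot limit and others may hit the wait timeout.
        var tasks = Enumerable.Range(1, 20).Select(async i =>
        {
            var response = await client.GetAsync("http://localhost:5000/api/values");

            // Report the status code and the wait-time header, if present.
            response.Headers.TryGetValues("X-Throttle-Wait-Time", out var waits);
            Console.WriteLine(
                $"Request {i,2}: {(int)response.StatusCode} " +
                $"(waited {waits?.FirstOrDefault() ?? "n/a"})");
        });

        await Task.WhenAll(tasks);
    }
}
```

With the middleware in place, you should see the first few requests report near-zero wait times, later ones report progressively longer waits, and, under enough load, some return 503 once the 1.5-second wait ceiling is exceeded.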

Summary

In this lesson, we built a production-ready delay throttle middleware for ASP.NET Core. We started with SemaphoreSlim to control concurrent request processing, ensuring our API never handles more simultaneous requests than it can efficiently process. We enhanced this with structured logging, a maximum waiting threshold using cancellation tokens, and wait time tracking via custom response headers. The testing framework demonstrated how to validate throttling behavior under load.

As you move to the practice exercises, experiment with different MAX_CONCURRENT limits to find the optimal balance for your application. Consider combining this delay throttle with rate limiting techniques from the previous lesson for comprehensive API protection. Remember that throttling configuration is an ongoing process of monitoring, analyzing, and adjusting based on actual usage patterns.
