Introduction to Throttling and Token Bucket

Welcome to the second lesson of our course on Throttling API Requests. In the previous lesson, we discussed the theoretical importance of rate limiting. Now, we will transition into the practical implementation of the token bucket algorithm, a robust and industry-standard method for implementing throttling to protect your resources.

Throttling is not just about preventing malicious attacks; it is essential for maintaining the performance, reliability, and fairness of your API. Without it, a single noisy neighbor—whether a malicious bot or a buggy client script—could monopolize your server's CPU and memory, degrading the experience for all other users. By the end of this lesson, you will understand how to build a mechanism that ensures your server remains responsive, even under heavy load.

Understanding the Token Bucket Algorithm

The token bucket algorithm is a widely used rate limiting strategy that balances strict traffic control with the flexibility to handle sudden bursts of activity.

Imagine an arcade game machine that requires a physical token to play.

  1. The Bucket: You have a bucket representing your API's capacity.
  2. Refill: A separate refill process drops a new token into the bucket at a fixed interval (for example, one token per second).
  3. Consumption: When a user wants to make a request, they must take a token from the bucket.
  4. Empty Bucket: If the bucket is empty, the user cannot play (the request is delayed or rejected).

This mechanism provides specific advantages over simple fixed-window counters:

  • Burst Tolerance: If the bucket is full, a client can make several requests in rapid succession (a "burst") until the bucket empties.
  • Smooth Rate: Once the burst capacity is exhausted, the client is limited to the steady refill rate of the tokens.
  • Efficiency: It requires very little memory to store the current token count and the timestamp of the last refill.

Note: While the token bucket is primarily known as a rate-limiting algorithm, it is also the bridge where rate limiting meets throttling: in system design, the token bucket acts as the "controller" that decides when throttling should begin.

However, implementing this in a high-concurrency environment like ASP.NET Core requires careful handling of shared state and threading.

Managing State with the Token Bucket Class

We will begin by creating the core TokenBucket class. This class is responsible for two things: holding the current number of tokens and replenishing them over time.

Because ASP.NET Core handles requests in parallel, this class must be thread-safe. We break the implementation down into the state definitions and the initialization logic.

The parameters here define the policy: capacity controls the burst size, while refillInterval and refillAmount control the sustained rate.

Next, we implement the replenishment logic. We use PeriodicTimer, a modern .NET feature that provides a non-blocking, async-compatible way to handle recurring events, which is far superior to legacy threading timers for this use case.
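Putting the state, initialization, and replenishment logic together, a minimal thread-safe sketch might look like the following. It assumes .NET 6 or later (for PeriodicTimer); the member names TryConsume and StartRefillingAsync are illustrative choices, not a canonical implementation.

```csharp
using System;
using System.Threading.Tasks;

// Minimal thread-safe token bucket sketch (assumes .NET 6+ for PeriodicTimer).
public class TokenBucket : IDisposable
{
    private readonly object _lock = new();    // guards _tokens across threads
    private readonly int _capacity;           // maximum burst size
    private readonly TimeSpan _refillInterval;
    private readonly int _refillAmount;
    private int _tokens;
    private PeriodicTimer? _timer;

    public TokenBucket(int capacity, TimeSpan refillInterval, int refillAmount)
    {
        _capacity = capacity;
        _refillInterval = refillInterval;
        _refillAmount = refillAmount;
        _tokens = capacity;                   // start full to allow an initial burst
    }

    // Atomically take a token if one is available.
    public bool TryConsume()
    {
        lock (_lock)
        {
            if (_tokens == 0) return false;
            _tokens--;
            return true;
        }
    }

    // Background replenishment: add tokens at the configured rate,
    // never exceeding capacity. Runs until the timer is disposed.
    public async Task StartRefillingAsync()
    {
        _timer = new PeriodicTimer(_refillInterval);
        while (await _timer.WaitForNextTickAsync())
        {
            lock (_lock)
            {
                _tokens = Math.Min(_capacity, _tokens + _refillAmount);
            }
        }
    }

    // Disposing the timer makes WaitForNextTickAsync return false,
    // ending the refill loop cleanly on shutdown.
    public void Dispose() => _timer?.Dispose();
}
```

Note that the refill loop caps the count at capacity, so tokens that would "overflow" a full bucket are simply discarded.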

Intercepting Requests with Middleware

With our logic encapsulated in the TokenBucket class, we need to inject this behavior into the ASP.NET Core request pipeline. We do this using Middleware.

The middleware acts as a gatekeeper. Before the request reaches your controller, the middleware checks the bucket. If a token is available, the request passes through. If not, the middleware initiates a "backoff" strategy to wait for a token.
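A sketch of such a gatekeeper, assuming the TokenBucket class described earlier; the class name ThrottlingMiddleware is a hypothetical choice. Here the empty-bucket path rejects immediately with 429, with the backoff strategy covered in the Exponential Backoff section below.

```csharp
using Microsoft.AspNetCore.Http;
using System.Threading.Tasks;

// Gatekeeper middleware sketch. The TokenBucket instance is shared
// (a singleton), so every incoming request draws from one token pool.
public class ThrottlingMiddleware
{
    private readonly RequestDelegate _next;
    private readonly TokenBucket _bucket;

    public ThrottlingMiddleware(RequestDelegate next, TokenBucket bucket)
    {
        _next = next;
        _bucket = bucket;
    }

    public async Task InvokeAsync(HttpContext context)
    {
        if (_bucket.TryConsume())
        {
            await _next(context);   // token acquired: forward to the controller
            return;
        }

        // No token available: a backoff-and-retry step (see the
        // Exponential Backoff section) would go here before giving up.
        context.Response.StatusCode = StatusCodes.Status429TooManyRequests;
        await context.Response.WriteAsync("Too Many Requests");
    }
}
```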

To activate this, you would register the bucket as a singleton (since its state must be shared across all requests) and add the middleware to the pipeline in your Program.cs file.
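The wiring might look like the following sketch of Program.cs, assuming a web project with implicit usings; the capacity of 5 and one-token-per-second refill are example values.

```csharp
// Program.cs (sketch): share one bucket across all requests, start its
// refill loop, and place the middleware ahead of the endpoints.
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddSingleton(new TokenBucket(
    capacity: 5,
    refillInterval: TimeSpan.FromSeconds(1),
    refillAmount: 1));

var app = builder.Build();

// Begin background replenishment (fire-and-forget for this sketch).
_ = app.Services.GetRequiredService<TokenBucket>().StartRefillingAsync();

app.UseMiddleware<ThrottlingMiddleware>();
app.MapControllers();
app.Run();
```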

Implementing Exponential Backoff

A sophisticated throttling system differs from a simple firewall by attempting to salvage the request. Instead of immediately failing with a 429 Too Many Requests error, we can make the client wait briefly to see if a token becomes available.

We use an Exponential Backoff strategy. We wait a short time, check for a token, and if we fail, we wait a longer time. This reduces pressure on the system during high load.
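One way to sketch this is as a standalone helper: in the middleware, tryConsume would be the bucket's TryConsume and cancellation would be context.RequestAborted. The helper name and the three-attempt default are illustrative assumptions.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Exponential backoff sketch: wait, check for a token, and double the
// wait after every failed attempt until the attempts run out.
public static class Backoff
{
    public static async Task<bool> RetryWithBackoffAsync(
        Func<bool> tryConsume,
        CancellationToken cancellation,
        int maxAttempts = 3)
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            // 100 ms, 200 ms, 400 ms, ... doubling each failed attempt.
            int delayMs = (int)(Math.Pow(2, attempt) * 100);
            await Task.Delay(delayMs, cancellation);
            if (tryConsume())
                return true;
        }
        return false;
    }
}
```

Passing the request's cancellation token into Task.Delay is what lets the wait be abandoned the moment the client disconnects.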

The formula Math.Pow(2, attempt) * 100 is the key here: the wait doubles after each failed attempt (100 ms, 200 ms, 400 ms, and so on). Spreading retries out this way mitigates the "thundering herd" problem, where many waiting requests would otherwise retry at the exact same moment; adding random jitter to each delay reduces this risk further.

Handling Real-World Constraints

Implementing this in a production environment introduces complexity regarding resource management. The "Hard Part" is ensuring that we don't leak memory or process requests for clients that have already disconnected.

  1. Client Disconnection: Passing context.RequestAborted into Task.Delay means that if a user gets tired of waiting and closes their browser, the CancellationToken fires and the pending delay is cancelled with an OperationCanceledException. This abandons the wait immediately and frees server resources for other requests.

  2. Lifecycle Management: The TokenBucket contains a background timer. If the application shuts down, that timer must be disposed of cleanly.

  3. Distributed Systems: The implementation shown here is "in-memory." If you run your API on 5 different servers, each server has its own bucket. In a microservices architecture, you would typically replace the internal _tokens integer with a call to a centralized Redis cache using Lua scripts to ensure atomic operations across the cluster.

By addressing these constraints, you ensure that your throttling mechanism scales reliably without introducing memory leaks or inconsistent states.

Validating Throttling with a C# Client

To truly understand how your throttle performs, it is best to write a small C# console application that simulates a burst of traffic. This allows you to see the 429 errors and the successful requests side-by-side.

Here is a test harness using HttpClient. It launches 20 concurrent requests against an API configured with a capacity of 5 tokens.
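A minimal sketch of such a harness follows; the URL is a placeholder for wherever your API is listening locally, and the timing output is optional but makes the backoff delays visible.

```csharp
using System;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

// Burst simulator sketch: fire 20 concurrent requests at the throttled API
// and print each status code with its elapsed time.
using var client = new HttpClient();

var tasks = Enumerable.Range(1, 20).Select(async i =>
{
    var sw = System.Diagnostics.Stopwatch.StartNew();
    var response = await client.GetAsync("http://localhost:5000/api/values");
    sw.Stop();
    Console.WriteLine(
        $"Request {i:D2}: {(int)response.StatusCode} {response.StatusCode} ({sw.ElapsedMilliseconds} ms)");
}).ToList();

await Task.WhenAll(tasks);
```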

This test harness allows you to simulate high-concurrency scenarios that are difficult to replicate manually, ensuring your rate limits hold up under pressure.

Understanding the Output

When you run this test against your throttled API, you will see output similar to the following. Note the timing differences:
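An illustrative run (exact timings and ordering will vary with your configuration and machine):

```
Request 01: 200 OK (12 ms)
Request 02: 200 OK (13 ms)
Request 03: 200 OK (14 ms)
Request 04: 200 OK (14 ms)
Request 05: 200 OK (15 ms)
Request 06: 200 OK (118 ms)
Request 07: 200 OK (322 ms)
Request 08: 200 OK (726 ms)
Request 09: 429 TooManyRequests (705 ms)
...
Request 20: 429 TooManyRequests (741 ms)
```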

  1. Immediate Success (01-05): The first 5 requests consume the initial capacity immediately.
  2. Delayed Success (06-08): These requests found an empty bucket, entered the RetryWithBackoffAsync loop, waited, and eventually found a new token that the PeriodicTimer added.
  3. Failure (09-20): These requests waited through all retry attempts. The incoming request rate was simply too high for the refill rate to catch up, resulting in a 429 Too Many Requests.

This sequence confirms that the middleware is correctly enforcing the policy: allowing bursts, smoothing traffic via backoff, and ultimately capping the load.

Summary and Next Steps

In this lesson, we have moved from theory to a concrete implementation of the Token Bucket algorithm in C#. We explored how to manage state safely with locks, how to use PeriodicTimer for background replenishment, and how to implement middleware that intelligently handles backpressure using exponential backoff.

We also highlighted the importance of testing your throttling logic with concurrent requests to ensure it behaves as expected under load. This implementation provides a solid foundation, but remember that in a large-scale distributed system, this logic often moves from application memory to a shared store like Redis.

In the upcoming practice exercises, you will take this code and modify the refillInterval and capacity parameters to observe how different configurations drastically change the API's behavior.
