Resource Requests and Limits

Introduction: Why Resource Management Matters

Imagine you're running a production Kubernetes cluster with dozens of applications. One day, a developer deploys a new service that has a memory leak. Within minutes, this single container consumes all available memory on its node, and the operating system starts killing processes to free memory. Suddenly, your database, web servers, and monitoring tools crash together. This scenario happens more often than you'd think, and it's exactly why resource requests and limits exist.

In this lesson, you'll learn how to configure these settings so that the Kubernetes scheduler places your pods on appropriate nodes and prevents any single container from monopolizing resources and destabilizing the entire system.

What Are Requests and Limits?

When you deploy a pod to Kubernetes, you can specify two types of resource constraints: requests and limits. These work together but serve different purposes. A resource request tells the Kubernetes scheduler the minimum amount of CPU and memory your container needs to function properly. Think of this as making a reservation — when you book a hotel room for two people, the hotel needs to guarantee space for at least two people. The scheduler uses requests to decide which node has enough available resources to run your pod. If no node can satisfy the request, your pod stays in a Pending state until resources become available.
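The scheduling decision described above can be sketched in a few lines of Python. This is a simplified model rather than the real scheduler: the node names and free capacities are hypothetical, and the function only compares a pod's requests with each node's unreserved CPU and memory.

```python
from typing import Optional

# Hypothetical unreserved capacity per node: (CPU in millicores, memory in MiB).
NODES = {
    "node-a": (50, 4096),
    "node-b": (800, 64),
    "node-c": (400, 512),
}


def pick_node(cpu_request_m: int, memory_request_mi: int) -> Optional[str]:
    """Return the first node whose free capacity covers both requests."""
    for name, (free_cpu_m, free_memory_mi) in NODES.items():
        if free_cpu_m >= cpu_request_m and free_memory_mi >= memory_request_mi:
            return name
    return None  # no node fits, so the pod would stay Pending


print(pick_node(100, 128))   # node-c
print(pick_node(2000, 128))  # None
```

Note that only requests enter the comparison. Limits and actual usage play no part in this model of placement.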

A resource limit, on the other hand, sets the maximum amount of CPU and memory your container is allowed to consume. Continuing the hotel analogy, the limit is like the maximum occupancy of the room: even if you reserved space for two people, the fire code might allow up to four. Kubernetes enforces limits at runtime. If your container tries to use more memory than its limit, it is killed (the pod reports the reason OOMKilled) and then restarted according to the pod's restart policy. If it tries to use more CPU than its limit, it is throttled (slowed down) but not killed. Limits protect your nodes from being overwhelmed by a single misbehaving container.
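The difference between the two enforcement behaviors can be captured in a toy function. This is only a model of the outcome, not of how the kernel or the container runtime implements it, and the function name `enforce` is made up for illustration.

```python
def enforce(resource: str, demand: int, limit: int) -> str:
    """Model the outcome when a container's demand is compared with its limit."""
    if demand <= limit:
        return "ok"
    if resource == "memory":
        return "killed"      # memory over the limit: the container is terminated
    if resource == "cpu":
        return "throttled"   # CPU over the limit: slowed down, not killed
    raise ValueError(f"unknown resource: {resource}")


print(enforce("memory", 300, 256))  # killed
print(enforce("cpu", 700, 500))     # throttled
print(enforce("cpu", 200, 500))     # ok
```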

Understanding Resource Units

Understanding how to measure these resources is crucial. CPU is measured in cores, where one core represents one physical CPU core or one virtual core on a cloud instance. Kubernetes allows you to specify fractional cores using millicores (m). For example, 100m means 0.1 cores or 10% of one CPU core. If you request 500m, you're asking for half a core. You can also use decimal notation: 0.5 is equivalent to 500m. Most applications don't need an entire core, so millicores give you fine-grained control.
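To make the CPU notation concrete, here is a small Python sketch that converts the two forms mentioned above into millicores. The helper name `cpu_to_millicores` is hypothetical and it handles only the `m` suffix and plain decimals.

```python
def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "500m" or "0.5" to millicores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return int(quantity[:-1])
    # A bare number is a count of cores; one core is 1000 millicores.
    return round(float(quantity) * 1000)


for q in ("100m", "500m", "0.5", "1"):
    print(q, "->", cpu_to_millicores(q), "millicores")
```

Running it shows that 0.5 and 500m resolve to the same value, 500 millicores.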

Memory is measured in bytes, but writing large numbers like 134217728 bytes is impractical. Instead, Kubernetes supports suffixes that make these values readable. The suffix Mi stands for mebibytes (1 Mi = 1,048,576 bytes), and Gi stands for gibibytes (1 Gi = 1,024 Mi). You might also see M and G for decimal megabytes and gigabytes, but the binary versions (Mi, Gi) are more common in Kubernetes. For example, 128Mi means 128 mebibytes of memory, which is roughly 134 megabytes. When choosing memory values, think about what your application actually needs: a simple web server might need only 128Mi, while a database might need considerably more.
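The arithmetic behind these suffixes can be checked with a short Python sketch. It covers only the four suffixes named above, and the helper name `memory_to_bytes` is an assumption made for illustration.

```python
# Multipliers for the suffixes discussed above.
SUFFIXES = {
    "Mi": 1024 ** 2,   # mebibyte
    "Gi": 1024 ** 3,   # gibibyte
    "M": 1000 ** 2,    # decimal megabyte
    "G": 1000 ** 3,    # decimal gigabyte
}


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as "128Mi" to a number of bytes."""
    quantity = quantity.strip()
    for suffix, multiplier in SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * multiplier
    return int(quantity)  # no suffix: the value is already in bytes


print(memory_to_bytes("128Mi"))  # 134217728
print(memory_to_bytes("128M"))   # 128000000
print(memory_to_bytes("1Gi"))    # 1073741824
```

The first two lines of output show why the suffix matters: 128Mi is about 6 million bytes larger than 128M.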

Configuring Requests and Limits Together

The relationship between requests and limits is important: limits must be greater than or equal to requests. If you set a request of 100m CPU and a limit of 50m, Kubernetes will reject your configuration because it doesn't make logical sense. Typically, you'll set limits somewhat higher than requests to allow for occasional spikes in usage. For example, you might request 100m CPU and limit it to 500m, giving your container room to handle traffic bursts without being immediately throttled.
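The rule that a limit may not be smaller than its request is easy to express as a check. The sketch below is a local illustration of that rule for CPU values, not the validation code Kubernetes itself runs, and both function names are hypothetical.

```python
def to_millicores(cpu: str) -> int:
    """Convert "500m" or "0.5" to millicores."""
    return int(cpu[:-1]) if cpu.endswith("m") else round(float(cpu) * 1000)


def limit_covers_request(request: str, limit: str) -> bool:
    """Return True when the limit is at least as large as the request."""
    return to_millicores(limit) >= to_millicores(request)


print(limit_covers_request("100m", "500m"))  # True
print(limit_covers_request("100m", "50m"))   # False
print(limit_covers_request("0.5", "500m"))   # True
```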

What if you only specify one of these values? If you set a request but no limit, Kubernetes schedules your pod based on the request, but the container can then consume as much of the node's spare CPU and memory as it wants. This is risky, but it is sometimes used for trusted applications that need burst capacity. If you set a limit but no request, Kubernetes automatically sets the request equal to the limit. This means the scheduler will reserve the full limit amount on the node, which is wasteful if your container typically uses much less. For most applications, you should specify both requests and limits to have precise control over scheduling and resource usage.
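The defaulting behavior described above can be modeled with a small function. This is a sketch of the rule as stated in this section only, and the function name `effective_settings` is made up for illustration.

```python
from typing import Optional


def effective_settings(request: Optional[str], limit: Optional[str]) -> dict:
    """Apply the rule: a limit with no request makes the request equal the limit."""
    if request is None and limit is not None:
        request = limit
    return {"request": request, "limit": limit}


print(effective_settings(None, "500m"))    # request defaults to the limit
print(effective_settings("100m", None))    # no limit: the container is uncapped
print(effective_settings("100m", "500m"))  # both given: nothing changes
```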

Declaring Resource Requests and Limits in Deployments

Let's build a deployment that includes resource requests and limits. This configuration will ensure our nginx web server gets scheduled appropriately and can't consume excessive resources. We'll start with the basic deployment structure you learned in the previous lesson, then add the resource specifications. Here's the beginning of our deployment:

This should look familiar — we're creating a deployment called web-app-resources that will run two replicas of our application. The selector tells the deployment which pods it manages. Now comes the important part: defining the pod template with resource specifications. Let's add the template section:

So far, this is standard: we're defining a container named nginx using the nginx:1.25 image. Now we'll add the resources section, which is where we declare our requests and limits:

The resources field goes inside the container definition because each container in a pod can have different resource requirements. Under resources, we define two subsections: requests and limits. The requests section tells the scheduler: "This container needs at least 100 millicores of CPU and 128 mebibytes of memory to run." When the scheduler looks for a node to place this pod, it will only consider nodes that have at least these amounts available.
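For reference, the pieces described in this section can be assembled into one manifest. The sketch below builds it as a Python dictionary and prints it as JSON. The name, replica count, label, image, and requests come from the text; the limits (500m CPU, 256Mi memory) are assumptions chosen only for illustration, since the section does not state them.

```python
import json

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web-app-resources"},
    "spec": {
        "replicas": 2,
        # The selector must match the labels in the pod template below.
        "selector": {"matchLabels": {"app": "web-app"}},
        "template": {
            "metadata": {"labels": {"app": "web-app"}},
            "spec": {
                "containers": [
                    {
                        "name": "nginx",
                        "image": "nginx:1.25",
                        "resources": {
                            "requests": {"cpu": "100m", "memory": "128Mi"},
                            # Assumed limits, not values taken from the lesson.
                            "limits": {"cpu": "500m", "memory": "256Mi"},
                        },
                    }
                ]
            },
        },
    },
}

print(json.dumps(deployment, indent=2))
```

Because kubectl accepts JSON manifests as well as YAML, the printed output could be saved to a file and applied in the same way as a YAML manifest.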

Applying and Verifying Your Configuration

Let's deploy this resource-managed application to your cluster. Apply the deployment using kubectl:

You should see output confirming the deployment was created:

This command sends your deployment configuration to the Kubernetes API server, which then creates the deployment object. The deployment controller immediately starts working to create the two replica pods you specified. Now, let's verify that the pods were successfully created and are running:

The -l app=web-app flag filters the results to show only pods with the label app=web-app. You should see output like this:

Both pods show STATUS: Running, which means they were successfully scheduled on nodes and started. The READY column shows 1/1, indicating that one out of one container in each pod is ready. If you see STATUS: Pending instead, it might indicate that no node has enough resources to satisfy your requests. In that case, you'll need to either add more nodes to your cluster or reduce your resource requests.

Now let's inspect one of these pods in detail to confirm that our resource requests and limits were actually applied. Copy one of the pod names from the output above (you'll use your actual pod name, not the example shown here) and run:

Checking Resource Usage with kubectl top

Knowing what resources you've requested and limited is important, but understanding what your containers actually use is equally crucial. Kubernetes provides the top command to show real-time resource consumption. Let's start by looking at resource usage across all nodes in your cluster:

This command queries the metrics server (which comes pre-installed in CodeSignal's environment) and shows current resource consumption for each node. You'll see output like:

The output shows that your node is currently using 150m (0.15 cores) of CPU, which represents 7% of the node's total CPU capacity. The memory usage is 1200Mi out of the node's total capacity, representing 30% utilization. These numbers include all pods running on the node, including system pods. If you're running a multi-node cluster, you'll see one row per node, helping you identify if any nodes are overloaded.

Now let's zoom in to see the resource usage of individual pods:

This shows the current CPU and memory consumption for each pod:

Each nginx pod is using approximately 1m CPU (one millicore) and around 10Mi memory. These numbers are a snapshot of what the containers are consuming at this moment, which helps you understand whether your pods are using more or less than you anticipated.

Right-Sizing Resources Based on Usage

Look at the usage numbers carefully and compare them to what we requested: we asked for 100m CPU and 128Mi memory per pod, but our pods are using only 1m CPU and about 10Mi memory. Our pods are using only 1% of the CPU we requested and less than 10% of the memory we requested. This reveals an important insight: we over-provisioned our resources. The scheduler reserved space for 100m CPU per pod, but the pods are barely using any CPU.
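The percentages quoted above follow from a simple division, shown here as a short Python check using the numbers from this section.

```python
def percent_of_request(usage: float, request: float) -> float:
    """Return usage as a percentage of the requested amount."""
    return 100 * usage / request


cpu_pct = percent_of_request(1, 100)    # 1m used of 100m requested
mem_pct = percent_of_request(10, 128)   # 10Mi used of 128Mi requested

print(f"CPU: {cpu_pct:.1f}% of request")     # CPU: 1.0% of request
print(f"Memory: {mem_pct:.1f}% of request")  # Memory: 7.8% of request
```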

Understanding the difference between these three concepts is critical: requests are what the scheduler reserves on the node, limits are the maximum the container can use, and actual usage is what the container is currently consuming. In an ideal world, your actual usage should stay comfortably between your requests and limits. If usage is consistently much lower than requests (like our example), you're wasting cluster capacity. If usage frequently hits the limits, your application might be getting throttled or killed, indicating you should increase the limits.
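One way to turn this comparison into a routine check is a small classifier like the sketch below. The two thresholds are assumptions picked for illustration, not Kubernetes defaults, and the 500m limit in the example calls is likewise hypothetical.

```python
LOW_FRACTION = 0.5    # assumed: below half the request counts as over-provisioned
HIGH_FRACTION = 0.9   # assumed: at or above 90% of the limit counts as near the limit


def assess(usage: float, request: float, limit: float) -> str:
    """Classify observed usage against a request and a limit (same units)."""
    if usage >= HIGH_FRACTION * limit:
        return "near limit"
    if usage < LOW_FRACTION * request:
        return "over-provisioned"
    return "ok"


print(assess(1, 100, 500))    # over-provisioned
print(assess(150, 100, 500))  # ok
print(assess(480, 100, 500))  # near limit
```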

Let's say you monitored these pods over several days and noticed the CPU usage never exceeded 10m and memory never exceeded 50Mi. You might adjust your resource specifications to be more accurate:
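As a rough sketch of how observed peaks might be turned into new values, the Python function below multiplies a peak by headroom factors. The factors 1.5 and 3.0 are assumptions for illustration, not recommendations from this lesson, so treat the output as a starting point rather than the adjusted specification itself.

```python
import math
from typing import Tuple


def suggest(peak: float, request_headroom: float = 1.5,
            limit_headroom: float = 3.0) -> Tuple[int, int]:
    """Return a (request, limit) pair derived from an observed peak."""
    request = math.ceil(peak * request_headroom)
    limit = math.ceil(peak * limit_headroom)
    return request, limit


cpu_request, cpu_limit = suggest(10)   # peak CPU of 10m
mem_request, mem_limit = suggest(50)   # peak memory of 50Mi

print(f"cpu: request {cpu_request}m, limit {cpu_limit}m")        # 15m, 30m
print(f"memory: request {mem_request}Mi, limit {mem_limit}Mi")   # 75Mi, 150Mi
```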

This tuning process of deploy, monitor, adjust is how you find the right balance between resource efficiency and reliability: start with initial estimates, watch actual usage over time, and then revise the requests and limits based on what you observe.

Summary: Building Reliable Applications

You've learned how to configure resource requests and limits to build reliable Kubernetes applications. Requests ensure that the scheduler places your pods on nodes with sufficient capacity, while limits prevent any single container from consuming excessive resources and destabilizing the node. You specified CPU in millicores and memory in mebibytes, applied a deployment with these settings, and verified that Kubernetes honored your specifications.

By using kubectl top, you monitored actual resource usage and learned how to use this data to tune your requests and limits. Combined with the namespace isolation from the previous lesson, you now have the foundational tools to deploy applications safely and reliably. In the practice exercises ahead, you'll experiment with different resource values and observe how Kubernetes responds when containers exceed their limits.
