In the previous lesson, you created a Deployment that manages a single nginx Pod. But what happens when that one Pod can't handle all the incoming traffic? Or what if that Pod crashes and your application goes down for a few seconds while Kubernetes creates a replacement? Running just one Pod creates a single point of failure and limits your application's capacity.
The solution is horizontal scaling — running multiple identical Pods that can share the workload and provide redundancy. If one Pod fails, the others keep serving traffic. If traffic increases, multiple Pods can handle requests in parallel. In this lesson, you'll scale your Deployment from one Pod to three Pods by changing a single number in your YAML file, and you'll watch Kubernetes automatically create and manage those additional Pods for you.
The replicas field in a Deployment specification tells Kubernetes exactly how many identical Pods you want running at all times. When you set replicas: 3, you're declaring your desired state — you want three Pods, no more and no less. Kubernetes takes this declaration seriously and works continuously to make reality match your specification. This is the core of Kubernetes' declarative model: you declare what you want, and Kubernetes figures out how to make it happen.
Here's how the replica system works in practice. A Deployment manages its Pods through a ReplicaSet, and the ReplicaSet controller runs a reconciliation loop that compares the actual state against your desired state. It asks, "How many Pods matching my selector currently exist?" If the answer is less than the replica count, it creates new Pods. If the answer is more than the replica count (which happens when you scale down), it deletes the excess Pods. If the answer matches exactly, it does nothing until something changes. This continuous reconciliation is what keeps your application running reliably.
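The loop described above can be modelled without a cluster. The following is a toy shell sketch, not how Kubernetes is implemented: the variables `desired` and `actual` and the function `reconcile` are made up for illustration, and plain counters stand in for Pods.

```shell
# Toy model of a reconciliation loop: counters only, no cluster involved.
desired=3   # what the manifest asks for (replicas: 3)
actual=1    # Pods that currently exist

# One pass of the loop: move actual one step toward desired, or do nothing.
reconcile() {
  if [ "$actual" -lt "$desired" ]; then
    actual=$((actual + 1))
    echo "too few Pods: created one (now $actual/$desired)"
  elif [ "$actual" -gt "$desired" ]; then
    actual=$((actual - 1))
    echo "too many Pods: deleted one (now $actual/$desired)"
  else
    echo "in sync ($actual/$desired): nothing to do"
  fi
}

# Keep reconciling until the actual count matches the desired count.
while [ "$actual" -ne "$desired" ]; do
  reconcile
done

# One more pass shows the steady state: nothing left to fix.
reconcile
```

Starting from one Pod with three desired, the sketch reports two creations and then a final pass that finds nothing to do. Setting `actual` higher than `desired` exercises the deletion branch instead.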
Let's look at a concrete example. Imagine you have replicas: 3 in your Deployment, and Kubernetes has successfully created three Pods. Now suppose one of those Pods disappears, for example because it is deleted or the node it runs on fails. Within seconds, the ReplicaSet controller behind your Deployment notices that only two Pods exist but three are desired, and it creates a replacement Pod to bring the count back to three. (If a container merely crashes, the kubelet normally restarts it inside the same Pod, so no replacement Pod is needed.) You don't have to do anything; Kubernetes maintains your desired state automatically. This is why Deployments are so powerful: they turn Pod management from a manual chore into an automated process.
The replicas field works the same way whether you're scaling up or down. If you change replicas: 3 to replicas: 5, Kubernetes creates two more Pods. If you instead change replicas: 3 to replicas: 1, Kubernetes deletes two Pods. The Deployment doesn't care about the direction of change; it only cares about making the actual state match the desired state. This makes scaling your application as simple as editing a number and reapplying your configuration.
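The arithmetic behind those two cases is just a signed difference. As a small illustration (the helper name `plan` and its two arguments are hypothetical, and nothing here talks to a cluster), a shell function can print what a change in the replica count would require:

```shell
# Toy arithmetic only: report what moving from CURRENT to TARGET replicas needs.
# Usage: plan CURRENT TARGET
plan() {
  current=$1
  target=$2
  delta=$((target - current))
  if [ "$delta" -gt 0 ]; then
    echo "create $delta"
  elif [ "$delta" -lt 0 ]; then
    echo "delete $((0 - delta))"
  else
    echo "no change"
  fi
}

plan 3 5   # scaling up from three to five
plan 3 1   # scaling down from three to one
plan 3 3   # already at the desired count
```

The same comparison covers both directions, which mirrors why the Deployment does not need separate logic for scaling up and scaling down.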
Let's take the Deployment you created in the previous lesson and scale it from one Pod to three Pods. Remember, your original deployment.yaml file looked like this:
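As a rough sketch, a single-replica manifest along these lines could be written to disk with a heredoc. The labels app=demo and tier=web and the nginx image come from this lesson's text; the Deployment name `demo-deployment`, the container name, the untagged image, and port 80 are assumptions, so your own file from the previous lesson may differ in those details.

```shell
# Write a minimal single-replica Deployment manifest (sketch with assumed values).
cat > deployment.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-deployment      # hypothetical name
spec:
  replicas: 1                # the number this lesson changes
  selector:
    matchLabels:
      app: demo
      tier: web
  template:
    metadata:
      labels:
        app: demo
        tier: web
    spec:
      containers:
        - name: nginx        # hypothetical container name
          image: nginx       # image tag not specified here
          ports:
            - containerPort: 80   # assumed port
EOF
```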
To scale this Deployment, you only need to change one line. Find the replicas: 1 line in the spec section and change it to replicas: 3. Save this modified version to a new file called deployment-scaled.yaml:
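A sketch of what the scaled file might contain, written to disk with a heredoc. Only replicas: 3, the app=demo and tier=web labels, and the nginx image are taken from this lesson; the Deployment name, container name, and port are placeholders, so keep whatever your own deployment.yaml already uses for them.

```shell
# Write the scaled manifest (sketch with assumed values); only replicas differs.
cat > deployment-scaled.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-deployment      # hypothetical name
spec:
  replicas: 3                # changed from 1 to 3
  selector:
    matchLabels:
      app: demo
      tier: web
  template:
    metadata:
      labels:
        app: demo
        tier: web
    spec:
      containers:
        - name: nginx        # hypothetical container name
          image: nginx       # image tag not specified here
          ports:
            - containerPort: 80   # assumed port
EOF
```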
Notice that everything else stays exactly the same — the selector, the template, the container configuration. You're not changing what the Pods look like or how the Deployment finds them. You're only changing how many of them should exist. This is the beauty of the declarative approach: scaling is just a matter of updating your desired state.
When you apply this change, here's what will happen behind the scenes. Kubernetes will compare the new specification with the current state of the Deployment. It will notice that the replica count changed from 1 to 3. The Deployment controller will then instruct the ReplicaSet (remember that intermediate controller from the previous lesson?) to create two additional Pods using the same template. Those new Pods will have the same labels (app=demo and tier=web), the same nginx container configuration, and the same port settings. They'll just have different automatically generated names to keep them unique.
Now, let's apply the scaled configuration to your cluster. You'll use the same kubectl apply command you used to create the Deployment initially, this time pointing it at the new file: kubectl apply -f deployment-scaled.yaml
You'll see output confirming that the Deployment was updated, in the form deployment.apps/<name> configured, where <name> is your Deployment's name.
Notice the word "configured" instead of "created." This tells you that kubectl apply recognized the Deployment already exists and updated it with your new specification. This is why we use kubectl apply instead of kubectl create — apply is smart enough to create resources that don't exist and update resources that do exist. You can use the same command whether you're creating something new or modifying something existing.
The moment you run this command, Kubernetes springs into action. The Deployment controller receives the updated specification and compares it to the current state. It sees that one Pod is running but three are desired, so almost immediately its ReplicaSet begins creating two new Pods from the template. This all happens automatically; you don't need to tell Kubernetes how to create the Pods or where to place them. You just declare what you want, and Kubernetes handles the rest.
Let's verify that the Deployment registered your change by checking its status, for example with kubectl get deployments.
You'll see output showing the updated replica count:
The READY column now shows 3/3, meaning all three desired Pods are ready. The UP-TO-DATE column shows that all three Pods are running the latest template version. The AVAILABLE column confirms that all three Pods are ready to serve traffic. If you run this command immediately after applying the change, you might see something like 1/3 or 2/3 in the READY column because Kubernetes is still creating the new Pods. Within a few seconds, though, all three should be ready.
Now, let's watch Kubernetes create and manage these Pods in real time. The kubectl get pods command has a special -w flag that stands for "watch." This flag keeps the command running and shows you updates as they happen: kubectl get pods -w. Press Ctrl+C when you want to stop watching.
If you run this command right after applying the scaling change, you'll see something like this:
Let's break down what you're seeing here. The first Pod (x7k2m) is the original Pod that was already running; notice its age is 5 minutes. The next two Pods (9h3kl and p8m4n) are the new ones Kubernetes just created. They start with a STATUS of ContainerCreating, which means Kubernetes is pulling the nginx image and starting the container. The READY column shows 0/1 because the container isn't ready yet. After a few seconds, the status changes to Running and READY becomes 1/1. This whole process typically takes just a few seconds, depending on whether the container image needs to be downloaded.
Scaling applications in Kubernetes is as simple as changing the replicas field in your Deployment specification and reapplying the configuration. Kubernetes handles all the complexity of creating new Pods, distributing them across nodes, assigning IP addresses, and monitoring their health. You declare what you want (three Pods), and Kubernetes makes it happen automatically.
The Deployment controller continuously watches the actual state and adjusts it to match your desired state, creating new Pods when needed and removing excess Pods when you scale down. In the upcoming practice exercises, you'll scale Deployments yourself, experiment with different replica counts, and see how Kubernetes maintains your desired state even when Pods fail or get deleted.
