Introduction: The Case for Camera Abstraction

In the previous lesson, you made a significant architectural improvement to your ray tracer by introducing abstraction. You created the hittable interface that defines what all renderable objects share, implemented the sphere class as a concrete example, and built the hittable_list container to manage collections of objects. This transformation made your code dramatically more maintainable and extensible. Adding new shape types became straightforward, and your rendering logic became cleaner and more focused.

However, if you look closely at your current main.cc file, you'll notice that while the scene management is now beautifully abstracted, the camera logic remains scattered throughout the main function. You have variables for viewport dimensions, focal length, camera origin, and coordinate system vectors all declared inline. The ray generation logic is embedded directly in the rendering loop, where you compute the ray direction for each pixel using these scattered variables. This works fine for a simple static camera, but it creates several problems as your ray tracer grows in complexity.

First, the camera parameters are not organized into a cohesive unit. If you want to change the camera's field of view or move it to a different position, you need to hunt through the main function to find and modify the relevant variables. Second, the ray generation logic is tangled with the rendering loop, making it harder to understand what each part of the code is responsible for. Third, if you want to add features like camera movement, different projection types, or multiple cameras rendering the same scene from different angles, you would need to duplicate or significantly complicate the code in main().

The solution follows the same principle you applied to scene objects in the previous lesson: abstraction and encapsulation. In this lesson, you'll create a dedicated camera class that owns all camera-related data and provides a clean interface for ray generation. The camera will become a self-contained ray generator that knows how to convert pixel coordinates into rays shooting into your scene. This abstraction will make your code cleaner, more maintainable, and ready for future enhancements like camera animation or advanced projection models.

Analyzing Our Current Camera Setup

Before we build the camera class, let's carefully examine the camera code from the previous lesson to understand exactly what it does and why. This understanding will guide our design decisions as we encapsulate this logic into a proper class. Here's the relevant section from your current main() function:
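The listing itself is not reproduced in this text, so here is a reconstruction as a self-contained sketch based on the values discussed below. The `vec3` struct is a minimal stand-in for the vec3/point3 type from earlier lessons, and in the actual program these lines sit inside main():

```cpp
// Minimal stand-in for the vec3/point3 type from earlier lessons, just
// enough arithmetic to make this sketch self-contained.
struct vec3 {
    double x, y, z;
};
vec3 operator+(const vec3& a, const vec3& b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
vec3 operator-(const vec3& a, const vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
vec3 operator*(double t, const vec3& v) { return {t * v.x, t * v.y, t * v.z}; }

const double aspect_ratio = 16.0 / 9.0;

// Viewport: 2.0 units tall, widened to match the aspect ratio, positioned
// focal_length units in front of the camera.
const double viewport_height = 2.0;
const double viewport_width  = aspect_ratio * viewport_height;
const double focal_length    = 1.0;

const vec3 origin     = {0, 0, 0};
const vec3 horizontal = {viewport_width, 0, 0};
const vec3 vertical   = {0, viewport_height, 0};
// Bottom-left corner of the viewport in world space.
const vec3 lower_left = origin - 0.5 * horizontal - 0.5 * vertical
                      - vec3{0, 0, focal_length};

// In the render loop, the ray direction for normalized pixel coordinates
// (u, v) is then: lower_left + u*horizontal + v*vertical - origin.
```

Notice how many loose variables this requires: exactly the scattering the rest of this lesson sets out to fix.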

This code sets up a simple pinhole camera model, which is the foundation of ray tracing. Let's break down each component to understand its role in the camera system.

The viewport represents the rectangular window through which we view the scene. Think of it as a physical screen positioned in front of the camera. The viewport_height is set to 2.0 units, which is an arbitrary but convenient choice. The actual value doesn't matter much because we're working in a relative coordinate system, but 2.0 is nice because it makes the math clean. The viewport_width is computed by multiplying the height by the aspect ratio, ensuring that the viewport has the same proportions as the final image. If your image is 16:9, your viewport will also be 16:9, which prevents distortion in the rendered scene.

The focal_length represents the distance from the camera origin to the viewport plane. In a pinhole camera model, all rays originate from a single point (the camera origin) and pass through points on the viewport before continuing into the scene. The focal length of 1.0 unit means the viewport is positioned one unit in front of the camera. This value affects the field of view: a smaller focal length creates a wider field of view (like a wide-angle lens), while a larger focal length creates a narrower field of view (like a telephoto lens). The value of 1.0 gives a reasonable, natural-looking perspective.

The origin is the position of the camera in world space. Currently, it's at (0, 0, 0), meaning the camera sits at the world origin. This is the point from which all rays emanate. In a more advanced ray tracer, you would move this point around to position the camera at different locations in your scene.

Designing the Camera Class Interface

Now that we understand what our camera needs to do, we can design a clean interface for the camera class. Good interface design requires thinking carefully about what data the class should own, what operations it should provide, and what should be exposed publicly versus kept as private implementation details.

Let's start by considering what data the camera needs to own. Looking at our current inline camera code, we have several pieces of information: the aspect ratio, the image dimensions, the viewport dimensions, the focal length, the camera origin, and the coordinate system vectors (horizontal, vertical, and lower_left). All of these are essential to the camera's operation, so they should be member variables of the camera class.

However, not all of this data needs to be provided by the user. Some values can be computed from others. For example, if we know the aspect ratio and the image width, we can compute the image height. If we know the viewport height and aspect ratio, we can compute the viewport width. This leads to an important design principle: the camera's constructor should take only the essential parameters that the user wants to control, and the camera should compute everything else internally.

What are the essential parameters? The aspect ratio is fundamental because it determines the proportions of the image. The image width is also essential because it determines the resolution. From these two values we can compute the image height, and from the viewport height and the aspect ratio we can compute the viewport width. The viewport height and focal length can have reasonable default values that work well for most scenes. The camera origin and orientation could be parameters, but for now, we'll keep them as fixed defaults (origin at world origin, looking down the negative z-axis) since we haven't yet covered camera positioning and orientation.

This gives us a constructor signature: camera(double aspect_ratio, int image_width). This is clean and simple. The user specifies the two most important parameters, and the camera handles all the internal setup.

Now let's consider what operations the camera should provide. The primary operation is ray generation: given normalized pixel coordinates (u, v), generate the corresponding ray. This will be a method called get_ray(double u, double v) that returns a ray object. This method encapsulates all the logic we currently have inline in the rendering loop.

The camera should also provide access to the image dimensions, because the rendering loop needs to know how many pixels to iterate through. We'll provide two simple read-only accessor methods that return the stored image width and image height, respectively.

Building Camera Initialization

Now let's implement the camera constructor, which is responsible for setting up all the camera's internal state. The constructor takes the aspect ratio and image width as parameters and computes everything else needed for ray generation. Create a new file called src/camera.h and let's build it step by step.

We'll start with the header guards and includes:
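A sketch of what src/camera.h can look like at this stage. Two assumptions are worth flagging: the ray/vec3 includes from earlier lessons are omitted so the sketch compiles on its own, and the accessor names (`get_image_width`, `get_image_height`) are illustrative, since all the lesson requires is that they return the stored dimensions. The constructor body is discussed in detail in the next section:

```cpp
#ifndef CAMERA_H
#define CAMERA_H

#include <algorithm>  // std::max, used when clamping the image height

// In the full project this header would also include the ray/vec3 headers
// from earlier lessons (e.g. "ray.h"); they are omitted here so this
// skeleton stands alone.
class camera {
  public:
    // Default arguments reproduce the 16:9, 400-pixel-wide setup from
    // previous lessons, so `camera cam;` works out of the box.
    camera(double aspect_ratio = 16.0 / 9.0, int image_width = 400) {
        this->aspect_ratio = aspect_ratio;
        this->image_width  = image_width;
        image_height = std::max(1, static_cast<int>(image_width / aspect_ratio));
        // The viewport and ray-generation members are added later in the lesson.
    }

    // Read-only accessors; names are illustrative.
    int get_image_width()  const { return image_width; }
    int get_image_height() const { return image_height; }

  private:
    double aspect_ratio;
    int    image_width;
    int    image_height;
};

#endif
```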

Notice that we've provided default parameter values in the constructor declaration. The default aspect ratio of 16.0/9.0 and default image width of 400 match what we've been using in previous lessons. This means users can create a camera with camera() and get sensible defaults, or they can specify custom values with camera(16.0/9.0, 800) for a higher resolution image. Default parameters are a convenient C++ feature that makes your classes easier to use.

Now let's implement the constructor body. The first step is to store the parameters and compute the image height:
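Sketched in isolation it looks like this. The members are made public here only so the snippet is easy to inspect on its own; in the full class they stay private, and the viewport setup that follows in the real constructor is omitted:

```cpp
#include <algorithm>  // std::max

// Constructor logic only; the viewport vectors are set up later.
class camera {
  public:
    camera(double aspect_ratio, int image_width) {
        // The parameter names shadow the members, so this-> picks the member.
        this->aspect_ratio = aspect_ratio;
        this->image_width  = image_width;

        // Height follows from width and aspect ratio; never below 1 pixel.
        image_height = std::max(1, static_cast<int>(image_width / aspect_ratio));
    }

    double aspect_ratio;
    int    image_width;
    int    image_height;
};
```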

We store the aspect ratio and image width in member variables. The this-> prefix is necessary here because the parameter names match the member variable names, and we need to disambiguate which is which. The this->aspect_ratio refers to the member variable, while aspect_ratio alone would refer to the parameter.

The image height calculation divides the width by the aspect ratio and converts the result to an integer. The std::max(1, ...) ensures that the height is at least 1 pixel, even if the division produces a value less than 1. This prevents degenerate cases where you might accidentally create a zero-height image. For a 400-pixel-wide image with a 16:9 aspect ratio, this computes a height of 225 pixels.

The Ray Generator: Implementing get_ray()

The heart of the camera class is the get_ray(u, v) method: it turns normalized pixel coordinates into a ray. Add this method to the public section of your camera class, right after the accessor methods:
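In isolation, it can be sketched like this. The `vec3` and `ray` structs are minimal stand-ins for the types from earlier lessons, and the viewport members are set directly rather than by the constructor, purely so the sketch is self-contained:

```cpp
// Stand-ins for the vec3 and ray types from earlier lessons.
struct vec3 {
    double x, y, z;
};
vec3 operator+(const vec3& a, const vec3& b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
vec3 operator-(const vec3& a, const vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
vec3 operator*(double t, const vec3& v) { return {t * v.x, t * v.y, t * v.z}; }

struct ray {
    vec3 orig, dir;  // origin and direction
};

class camera {
  public:
    // Map normalized pixel coordinates (u, v) in [0, 1] to a ray from the
    // camera origin through the matching viewport point. Marked const:
    // ray generation never mutates camera state.
    ray get_ray(double u, double v) const {
        return ray{origin, lower_left + u * horizontal + v * vertical - origin};
    }

    // Precomputed by the constructor in the full class; set directly here.
    vec3 origin, horizontal, vertical, lower_left;
};
```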

How it works, in brief:

  • u and v are in [0, 1] and select a point on the viewport.
  • horizontal spans the viewport width, vertical spans the height, and lower_left is the bottom-left corner in world space.
  • The viewport point is lower_left + u*horizontal + v*vertical.
  • The ray starts at origin and points to that viewport point: direction = that point minus origin.

Because origin, horizontal, vertical, and lower_left are precomputed in the constructor, get_ray() is tiny and fast. It's marked const because ray generation doesn't mutate camera state. The method is resolution- and aspect-ratio-agnostic: the rendering loop maps pixel indices to (u, v), and the camera maps (u, v) to rays using its precomputed vectors.

Refactoring main.cc with the Camera Class

Now that we have a complete camera class, let's refactor the main program to use it. This refactoring will demonstrate how much cleaner and more maintainable your code becomes when camera logic is properly encapsulated. The changes are straightforward, but the impact on code clarity is significant.

First, let's update the includes at the top of src/main.cc. We need to add the camera header:
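The top of the file might then read as follows; the header names other than camera.h are assumed to match the files created in the previous lessons:

```cpp
#include "camera.h"
#include "hittable_list.h"
#include "sphere.h"

#include <iostream>
```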

The ray_color() function remains unchanged from the previous lesson. It takes a ray and a hittable world, tests for intersections, and returns the appropriate color. This function doesn't need to know anything about cameras, which is exactly what we want. The separation of concerns means each component has a clear, focused responsibility.

Now let's look at the refactored main() function. We'll build it up section by section to see how each part changes. First, the world setup:
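The setup itself is shown here with minimal stand-ins for the sphere and hittable_list classes from the previous lesson, so the sketch runs on its own; in the real program those types come from their own headers and hittable_list holds generic hittable pointers:

```cpp
#include <memory>
#include <vector>

// Stand-ins for the types built in the previous lesson; just enough state
// to demonstrate the world-setup pattern.
struct point3 { double x, y, z; };

struct sphere {
    sphere(point3 center, double radius) : center(center), radius(radius) {}
    point3 center;
    double radius;
};

struct hittable_list {
    void add(std::shared_ptr<sphere> object) { objects.push_back(object); }
    std::vector<std::shared_ptr<sphere>> objects;
};

// In the real program these lines open main().
hittable_list make_world() {
    hittable_list world;
    world.add(std::make_shared<sphere>(point3{0, 0, -1}, 0.5));       // small sphere
    world.add(std::make_shared<sphere>(point3{0, -100.5, -1}, 100));  // ground sphere
    return world;
}
```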

The world setup is identical to the previous lesson. We create a hittable_list and add two spheres: a small sphere at (0, 0, -1) with radius 0.5, and a large ground sphere at (0, -100.5, -1) with radius 100. This code hasn't changed because we haven't modified how scenes are managed; we've only changed how the camera works.

Now comes the camera setup, which is dramatically simpler than before:

Compare this to the previous lesson's inline camera code. Instead of declaring separate variables for viewport dimensions, focal length, origin, and coordinate system vectors, we simply create a camera object with the desired aspect ratio and image width. The camera constructor handles all the internal setup automatically. We then retrieve the image dimensions using the camera's accessor methods. These dimensions are stored in const variables because they won't change during rendering.

Summary: A Cleaner, More Modular Architecture

In this lesson, you refactored your ray tracer to encapsulate camera logic in a dedicated camera class. The camera now owns all parameters and math needed for ray generation, providing a simple interface: just specify the aspect ratio and image width, and use get_ray(u, v) to generate rays for each pixel. This abstraction makes your main rendering loop much cleaner and easier to maintain, as all camera-related details are hidden inside the class.

By following the same principles of abstraction and encapsulation you used for scene objects, your code is now more modular and extensible. Future enhancements—like camera movement, different projections, or depth of field—can be added by modifying only the camera class, leaving the rest of your code untouched. This separation of concerns results in a more robust and flexible architecture, setting a strong foundation for further development of your ray tracer.
