Welcome to the first lesson of "The MLP Architecture: Activations & Initialization"! I'm excited to continue our neural network journey with you. In our previous course, "Neural Network Fundamentals: Neurons and Layers", we built the foundations of neural networks by implementing individual neurons, adding activation functions, and combining neurons into a single `DenseLayer` capable of forward propagation.
Today, we're taking a significant step forward by learning how to stack multiple layers together to create a multi-layer perceptron (MLP). MLPs are the fundamental architecture behind many neural network applications and represent the point where our implementations truly become "deep learning."
By the end of this lesson, you'll have created a fully functional MLP capable of processing data through multiple layers, bringing us much closer to solving real-world problems. Let's dive in!
Before we dive into multi-layer perceptrons, let's quickly refresh the core components we built in our previous course. Our foundation consists of two key elements:
- The sigmoid activation function, which transforms linear inputs into non-linear outputs between `0` and `1`.
- The `DenseLayer`, which creates a fully connected layer of neurons.
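As a quick refresher, the sigmoid function itself is a one-liner in R:

```r
# Sigmoid activation: squashes any real number into the open interval (0, 1)
sigmoid <- function(x) {
  1 / (1 + exp(-x))
}

sigmoid(0)    # 0.5
sigmoid(10)   # close to 1
sigmoid(-10)  # close to 0
```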
A note on implementation: In our previous course, we implemented DenseLayer as a function that returned a list of functions. While that approach worked well, in this course we'll transition to using R6 classes for our neural network components. R6 is R's object-oriented programming system that provides cleaner syntax for creating objects with methods and mutable state — features that become increasingly valuable as our networks grow more complex. This approach will make our code more organized and easier to extend as we build more sophisticated architectures.
Here's our DenseLayer implemented as an R6 class:
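A minimal sketch consistent with how the class is used later in this lesson (fields `n_inputs` and `n_neurons`, plus a `forward` method); the weight-initialization details here are assumptions, not the course's exact code:

```r
library(R6)

sigmoid <- function(x) 1 / (1 + exp(-x))

DenseLayer <- R6Class("DenseLayer",
  public = list(
    n_inputs = NULL,
    n_neurons = NULL,
    weights = NULL,
    biases = NULL,

    initialize = function(n_inputs, n_neurons) {
      self$n_inputs  <- n_inputs
      self$n_neurons <- n_neurons
      # Small random weights and zero biases (this initialization scheme is an assumption)
      self$weights <- matrix(rnorm(n_inputs * n_neurons, sd = 0.1),
                             nrow = n_inputs, ncol = n_neurons)
      self$biases  <- rep(0, n_neurons)
    },

    forward = function(inputs) {
      # Linear step: (batch x n_inputs) %*% (n_inputs x n_neurons), plus per-neuron bias
      z <- sweep(inputs %*% self$weights, 2, self$biases, "+")
      sigmoid(z)  # non-linear activation
    }
  )
)
```

With this class, `DenseLayer$new(4, 5)` creates a layer whose `forward` method maps any `batch x 4` matrix to a `batch x 5` matrix of activations.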
Before we start coding, let's understand what a multi-layer perceptron is and why it's so powerful.
A multi-layer perceptron is a neural network architecture consisting of multiple dense layers stacked sequentially. It typically has:
- An input layer that receives the raw data.
- One or more hidden layers that perform intermediate computations.
- An output layer that produces the final result.
The power of MLPs comes from this layered structure. Each layer can learn increasingly complex representations of the data:
- The first layer might detect simple patterns.
- Middle layers combine these into more complex features.
- The final layers use these features to make sophisticated decisions.
Information flows through an MLP in one direction: forward from input to output. This is why MLPs are also called feedforward neural networks.
Think of each layer as performing a specific transformation on the data, with the output of one layer becoming the input to the next. This hierarchical structure allows MLPs to learn complex mappings between inputs and outputs that would be impossible with just a single layer.
Now that we understand the concept, let's start implementing our MLP R6 class. First, we'll create the basic class structure that will create our MLP objects.
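A sketch of that basic structure (assuming the `R6` package is installed):

```r
library(R6)

MLP <- R6Class("MLP",
  public = list(
    layers = NULL,

    initialize = function() {
      # Start with an empty list; layers are appended one at a time later
      self$layers <- list()
    }
  )
)

mlp <- MLP$new()
length(mlp$layers)  # 0
```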
This simple initialization creates an R6 class with a layers field that will store our layers as a list. The key idea here is that our MLP will be a container for multiple DenseLayer objects arranged in sequence.
Notice how we're deliberately keeping the initialization straightforward. The MLP doesn't need to know in advance how many layers it will contain or their dimensions — this flexibility lets us dynamically build networks of different architectures as needed. This design approach mirrors professional deep learning frameworks, which also allow for flexible network construction.
Next, we need a way to add layers to our MLP. Let's implement the add_layer method.
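One plausible implementation with both safety checks, shown as a complete class definition so the snippet stands alone; the exact error messages are assumptions:

```r
library(R6)

MLP <- R6Class("MLP",
  public = list(
    layers = NULL,

    initialize = function() {
      self$layers <- list()
    },

    add_layer = function(layer) {
      # Safety check 1: only DenseLayer objects may be added
      if (!inherits(layer, "DenseLayer")) {
        stop("Layer must be a DenseLayer object")
      }
      # Safety check 2: the previous layer's output size must match
      # the new layer's input size
      n <- length(self$layers)
      if (n > 0) {
        prev <- self$layers[[n]]
        if (prev$n_neurons != layer$n_inputs) {
          stop(sprintf(
            "Dimension mismatch: previous layer outputs %d values, but new layer expects %d inputs",
            prev$n_neurons, layer$n_inputs
          ))
        }
      }
      self$layers <- append(self$layers, list(layer))
    }
  )
)
```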
This method takes a layer object (which will be an instance created by our DenseLayer R6 class) and appends it to our layers list using R's append() function. We've added two important safety checks:
- We verify that only `DenseLayer` objects are added, to maintain type consistency.
- We validate dimensional compatibility between consecutive layers: if we're adding a layer after existing layers, we check that the previous layer's output dimension (`n_neurons`) matches the new layer's input dimension (`n_inputs`). If they don't match, we provide a clear error message indicating the mismatch.
This dimensional validation is crucial because it catches configuration errors immediately when building the network, rather than waiting until we try to run data through it. The error message explicitly states what went wrong, making it easy to debug network architecture issues.
Now for the most crucial part: implementing forward propagation through all the layers in our MLP. This is where we'll see how the output of one layer becomes the input to the next.
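The method itself only takes a few lines; here is a sketch, shown inside a trimmed-down class so the snippet stands alone (the `add_layer` safety checks are omitted here for brevity):

```r
library(R6)

MLP <- R6Class("MLP",
  public = list(
    layers = NULL,
    initialize = function() { self$layers <- list() },
    add_layer = function(layer) { self$layers <- append(self$layers, list(layer)) },

    forward = function(X) {
      # The output of each layer becomes the input to the next
      current_input <- X
      for (layer in self$layers) {
        current_input <- layer$forward(current_input)
      }
      current_input
    }
  )
)
```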
Let's break down what happens here:
- We initialize `current_input` with the original input data.
- We iterate through each layer in our network.
- For each layer, we:
  - Call the layer's `forward` method with the current input.
  - Update `current_input` with the output from that layer.
- After processing through all layers, we return the final output.
This sequential processing is the essence of how information flows through an MLP. Each layer transforms the data, gradually shaping it into the desired output. The variable current_input serves as the "baton" in this relay race, carrying information from one layer to the next.
The elegance of this approach is that the MLP doesn't need to know the internal details of each layer — it simply calls the method, trusting each layer to do its job correctly. This is a powerful software design principle that allows us to build complex systems from simpler components.
Now that we have our MLP class defined, let's see how to create a complete multi-layer perceptron with multiple dense layers.
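Here is one way this construction might look; the sample values are placeholders, and minimal versions of the two classes are repeated so the example runs on its own (their internals, as before, are assumptions):

```r
library(R6)

sigmoid <- function(x) 1 / (1 + exp(-x))

# Minimal stand-ins for the classes built earlier in this lesson
DenseLayer <- R6Class("DenseLayer",
  public = list(
    n_inputs = NULL, n_neurons = NULL, weights = NULL, biases = NULL,
    initialize = function(n_inputs, n_neurons) {
      self$n_inputs  <- n_inputs
      self$n_neurons <- n_neurons
      # Small random weights; the initialization scheme is an assumption
      self$weights <- matrix(rnorm(n_inputs * n_neurons, sd = 0.1), nrow = n_inputs)
      self$biases  <- rep(0, n_neurons)
    },
    forward = function(inputs) {
      sigmoid(sweep(inputs %*% self$weights, 2, self$biases, "+"))
    }
  )
)

MLP <- R6Class("MLP",
  public = list(
    layers = NULL,
    initialize = function() { self$layers <- list() },
    add_layer = function(layer) { self$layers <- append(self$layers, list(layer)) },
    forward = function(X) {
      current_input <- X
      for (layer in self$layers) current_input <- layer$forward(current_input)
      current_input
    }
  )
)

# A single sample with 4 features, stored as a 1 x 4 matrix (values are placeholders)
X_sample <- matrix(c(0.5, -0.2, 0.1, 0.8), nrow = 1)

# Build a 4 -> 5 -> 3 -> 1 network
mlp <- MLP$new()
mlp$add_layer(DenseLayer$new(n_inputs = 4, n_neurons = 5))
mlp$add_layer(DenseLayer$new(n_inputs = 5, n_neurons = 3))
mlp$add_layer(DenseLayer$new(n_inputs = 3, n_neurons = 1))
```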
In this code, we:
- Create a sample input `X_sample` with `4` features (a single sample for now).
- Instantiate our `MLP` using `MLP$new()`.
- Add three layers using `DenseLayer$new()`:
  - The first layer takes `4` inputs (matching our input data) and produces `5` outputs.
  - The second layer takes those `5` inputs and produces `3` outputs.
  - The final layer takes `3` inputs and produces a single output.
Now let's run our input data through the MLP and examine the output.
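One way this step might look; the batch values are placeholders, and the classes plus the three-layer network from the previous section are rebuilt here so the snippet runs on its own:

```r
library(R6)

sigmoid <- function(x) 1 / (1 + exp(-x))

# Minimal stand-ins for the classes built earlier in this lesson
DenseLayer <- R6Class("DenseLayer",
  public = list(
    n_inputs = NULL, n_neurons = NULL, weights = NULL, biases = NULL,
    initialize = function(n_inputs, n_neurons) {
      self$n_inputs  <- n_inputs
      self$n_neurons <- n_neurons
      self$weights <- matrix(rnorm(n_inputs * n_neurons, sd = 0.1), nrow = n_inputs)
      self$biases  <- rep(0, n_neurons)
    },
    forward = function(inputs) sigmoid(sweep(inputs %*% self$weights, 2, self$biases, "+"))
  )
)

MLP <- R6Class("MLP",
  public = list(
    layers = NULL,
    initialize = function() { self$layers <- list() },
    add_layer = function(layer) { self$layers <- append(self$layers, list(layer)) },
    forward = function(X) {
      current_input <- X
      for (layer in self$layers) current_input <- layer$forward(current_input)
      current_input
    }
  )
)

mlp <- MLP$new()
mlp$add_layer(DenseLayer$new(4, 5))
mlp$add_layer(DenseLayer$new(5, 3))
mlp$add_layer(DenseLayer$new(3, 1))

# Forward pass with a single sample (a 1 x 4 matrix); values are placeholders
X_sample <- matrix(c(0.5, -0.2, 0.1, 0.8), nrow = 1)
output_single <- mlp$forward(X_sample)
print(output_single)   # a 1 x 1 matrix

# Forward pass with a batch of 2 samples, each with 4 features
X_batch <- matrix(c(0.5, -0.2,  0.1, 0.8,
                    0.3,  0.7, -0.5, 0.2),
                  nrow = 2, byrow = TRUE)
output_batch <- mlp$forward(X_batch)
print(output_batch)    # a 2 x 1 matrix: one output per sample
```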
In this code:
- We perform a forward pass with our single sample input and print the result.
- We create a batch of `2` samples, each with `4` features, using `byrow = TRUE` for proper row arrangement.
- We run a forward pass with the batch and print the result.
The printed output (exact values will vary because the weights are randomly initialized) illustrates several important observations:
- Our single sample input produced a single scalar output (in a 2D matrix to maintain batch structure).
- Our batch of `2` samples produced `2` outputs, one for each sample.
- The output values are different for each sample, showing that our network processes each sample individually.
- All outputs are in the range (`0`, `1`) because we're using the sigmoid activation in all layers.
Congratulations! You've successfully built a multi-layer perceptron from scratch using your previously created DenseLayer R6 class. This is a major milestone in your neural network journey. We've explored how MLPs stack multiple layers sequentially, with each layer transforming inputs and passing results to the next. You've learned to create networks of different architectures by varying the number and size of layers, and your implementation now efficiently handles both individual samples and batches of data.
In the practice exercises that follow, you'll build your own MLP and experiment with it. After that, we'll explore various activation functions beyond sigmoid and learn why they're crucial for neural network performance. We'll also add these different activations to our MLP framework, giving you more flexibility in designing networks suited to different types of problems. Your journey into deep learning is just beginning!
