Welcome to the first lesson of "The MLP Architecture: Activations & Initialization"! I'm excited to continue our neural network journey with you. In our previous course, Neural Network Fundamentals: Neurons and Layers, we built the foundations of neural networks by implementing individual neurons, adding activation functions, and combining neurons into a single `DenseLayer` capable of forward propagation.
Today, we're taking a significant step forward by learning how to stack multiple layers together to create a Multi-Layer Perceptron (MLP). MLPs are the fundamental architecture behind many neural network applications and represent the point where our implementations truly become "deep learning."
By the end of this lesson, you'll have created a fully functional MLP capable of processing data through multiple layers, bringing us much closer to solving real-world problems. Let's dive in!
Before we dive into Multi-Layer Perceptrons, let's quickly refresh the core components we built in our previous course. Our foundation consists of two key elements:
- The sigmoid activation function, which transforms linear inputs into non-linear outputs between 0 and 1;
- The `DenseLayer` class, which represents a fully connected layer of neurons.
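As a quick refresher, here is a minimal sketch of what those two components might look like in JavaScript with Math.js. The exact details (the `sigmoid` helper, the `nInputs` and `nNeurons` parameter names, zero biases) are assumptions; your implementation from the previous course may differ slightly:

```javascript
const math = require('mathjs');

// Sigmoid activation: squashes each value into the range (0, 1).
// Applied element-wise so it works on whole Math.js matrices.
function sigmoid(x) {
  return math.map(x, (value) => 1 / (1 + math.exp(-value)));
}

// A fully connected (dense) layer of neurons.
class DenseLayer {
  constructor(nInputs, nNeurons, activation = sigmoid) {
    // Small random weights (scaled by 0.1) and zero biases.
    this.weights = math.map(math.zeros(nInputs, nNeurons), () => math.random() * 0.1);
    this.biases = math.zeros(1, nNeurons);
    this.nInputs = nInputs;
    this.nNeurons = nNeurons;
    this.activation = activation;
  }

  // Forward pass: weighted sum of inputs plus bias, then activation.
  // (Recent Math.js versions broadcast the 1-row bias across a batch of samples.)
  forward(inputs) {
    const weightedSum = math.add(math.multiply(inputs, this.weights), this.biases);
    return this.activation(weightedSum);
  }
}
```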
Our `DenseLayer` performs three essential operations:
- Initializes weights and biases (note how we're currently using `math.random() * 0.1` for the weights — we'll explore why we do this, as well as better initialization strategies, later in this course).
- Stores the layer dimensions and activation function.
- Performs the forward pass by computing the weighted sum and applying activation.
This single layer is powerful, but the real magic happens when we combine multiple layers together — which is exactly what we'll do today by building our Multi-Layer Perceptron!
Before we start coding, let's understand what a Multi-Layer Perceptron is and why it's so powerful.
A Multi-Layer Perceptron is a neural network architecture consisting of multiple dense layers stacked sequentially. It typically has:
- An input layer that receives the raw data;
- One or more hidden layers that perform intermediate computations;
- An output layer that produces the final result.
The power of MLPs comes from this layered structure. Each layer can learn increasingly complex representations of the data:
- The first layer might detect simple patterns;
- Middle layers combine these into more complex features;
- The final layers use these features to make sophisticated decisions.
Information flows through an MLP in one direction: forward from input to output. This is why MLPs are also called feedforward neural networks.
Think of each layer as performing a specific transformation on the data, with the output of one layer becoming the input to the next. This hierarchical structure allows MLPs to learn complex mappings between inputs and outputs that would be impossible with just a single layer.
Now that we understand the concept, let's start implementing our MLP. First, we'll create the basic class structure that will house our layers:
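A minimal sketch of that structure might look like this (the class name `MLP` and the `layers` property match what the rest of the lesson assumes):

```javascript
// An MLP is simply an ordered collection of layers.
class MLP {
  constructor() {
    // Layers will be added one by one, in the order data flows through them.
    this.layers = [];
  }
}
```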
This simple initialization creates an empty array that will store our layers. The key idea here is that our MLP will be a container for multiple `DenseLayer` objects arranged in sequence.
Notice how we're deliberately keeping the initialization straightforward. The MLP doesn't need to know in advance how many layers it will contain or their dimensions — this flexibility lets us dynamically build networks of different architectures as needed. This design approach mirrors professional deep learning frameworks, which also allow for flexible network construction.
Next, we need a way to add layers to our MLP. Let's implement the `addLayer` method:
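Here is a sketch of how that method might look, written as a method to add inside the `MLP` class defined above:

```javascript
// Inside the MLP class: append a layer to the end of the network.
// Layers are applied later in the exact order they are added here.
addLayer(layer) {
  this.layers.push(layer);
}
```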
This method is elegantly simple — it takes a layer object (which will be an instance of our previously created `DenseLayer` class) and appends it to our `layers` array.
The beauty of this approach is its flexibility:
- We can add as many layers as we need.
- Each layer can have different numbers of neurons.
- We could potentially extend this to support different types of layers in the future.
When using this method, we'll need to ensure that the dimensions of consecutive layers match correctly — the number of outputs from one layer must equal the number of inputs to the next layer. This dimensional compatibility is essential for data to flow properly through the network.
Now for the most crucial part: implementing forward propagation through all the layers in our MLP. This is where we'll see how the output of one layer becomes the input to the next:
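A sketch of the `forward` method, again added inside the `MLP` class:

```javascript
// Inside the MLP class: pass the data through every layer in sequence.
forward(input) {
  let currentInput = input;
  for (const layer of this.layers) {
    // The output of this layer becomes the input to the next one.
    currentInput = layer.forward(currentInput);
  }
  return currentInput;
}
```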
Let's break down what happens here:
- We initialize `currentInput` with the original input data.
- We iterate through each layer in our network.
- For each layer, we:
  - Call the layer's `forward` method with the current input.
  - Update `currentInput` with the output from that layer.
- After processing through all layers, we return the final output.
This sequential processing is the essence of how information flows through an MLP. Each layer transforms the data, gradually shaping it into the desired output. The variable `currentInput` serves as the "baton" in this relay race, carrying information from one layer to the next.
The elegance of this approach is that the MLP doesn't need to know the internal details of each layer — it simply calls the `forward` method, trusting each layer to do its job correctly. This encapsulation is a powerful software design principle that allows us to build complex systems from simpler components.
Now that we have our MLP class defined, let's see how to create a complete multi-layer perceptron with multiple dense layers. We'll use `math.matrix` to ensure all data is handled as Math.js matrices, which is best practice for consistency and performance:
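Here is a sketch of how that construction might look. The sample values in `X_sample` are arbitrary, and the exact logging format is an assumption:

```javascript
// A single sample with 4 features, as a 1x4 Math.js matrix.
const X_sample = math.matrix([[0.5, -1.2, 3.3, 0.8]]);

// Build a 3-layer MLP: 4 -> 5 -> 3 -> 1.
const mlp = new MLP();
mlp.addLayer(new DenseLayer(4, 5)); // hidden layer: 4 inputs, 5 neurons
mlp.addLayer(new DenseLayer(5, 3)); // hidden layer: 5 inputs, 3 neurons
mlp.addLayer(new DenseLayer(3, 1)); // output layer: 3 inputs, 1 neuron

// Print a summary of the architecture.
console.log(`MLP with ${mlp.layers.length} layers:`);
mlp.layers.forEach((layer, i) => {
  console.log(`  Layer ${i + 1}: ${layer.nInputs} inputs -> ${layer.nNeurons} neuron(s)`);
});
```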
In this code, we:
- Create a sample input `X_sample` as a Math.js matrix with 4 features (a single sample for now).
- Instantiate our MLP.
- Add three layers:
  - The first layer takes 4 inputs (matching our input data) and produces 5 outputs.
  - The second layer takes those 5 inputs and produces 3 outputs.
  - The final layer takes 3 inputs and produces a single output.
- Print information about our constructed network.
Notice how we've chained the layers together, ensuring that the number of inputs to each layer matches the number of outputs from the previous layer. This forms a coherent network where data can flow smoothly from input to output.
The output shows:
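With the summary logging sketched above, the printout would look something like this (your exact format may differ):

```text
MLP with 3 layers:
  Layer 1: 4 inputs -> 5 neuron(s)
  Layer 2: 5 inputs -> 3 neuron(s)
  Layer 3: 3 inputs -> 1 neuron(s)
```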
This gives us a clear picture of our network's architecture — a 3-layer MLP with a decreasing number of neurons in each layer, funneling down to a single output neuron.
Now let's run our input data through the MLP and examine the output:
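A sketch of that forward pass, reusing the `mlp` and `X_sample` from above (the batch values are arbitrary):

```javascript
// Forward pass with the single sample.
const output = mlp.forward(X_sample);
console.log('Single sample output size:', output.size());
console.log('Single sample output:', output.valueOf());

// A batch of 2 samples, each with 4 features.
const X_batch = math.matrix([
  [0.5, -1.2, 3.3, 0.8],
  [1.0, 0.4, -0.7, 2.1],
]);

// Forward pass with the batch.
const batchOutput = mlp.forward(X_batch);
console.log('Batch output size:', batchOutput.size());
console.log('Batch output:', batchOutput.valueOf());
```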
In this code:
- We perform a forward pass with our single sample input and print the result, using `.size()` and `.valueOf()` to get the matrix dimensions and values.
- We create a batch of 2 samples, each with 4 features, as a Math.js matrix.
- We run a forward pass with the batch and print the result.
The output shows:
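Because the weights are initialized randomly, the exact numbers will differ on every run, but the output shapes are deterministic. With the logging above, it will look roughly like this (placeholders stand in for the actual values):

```text
Single sample output size: [ 1, 1 ]
Single sample output: [ [ <value in (0, 1)> ] ]
Batch output size: [ 2, 1 ]
Batch output: [ [ <value in (0, 1)> ], [ <value in (0, 1)> ] ]
```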
Several important observations:
- Our single sample input produced a single scalar output (wrapped in a 2D array to maintain batch structure).
- Our batch of 2 samples produced 2 outputs — one for each sample.
- The output values are different for each sample, showing that our network processes each sample individually.
- All outputs are in the range (0, 1) because we're using the sigmoid activation function in all layers.
This confirms that our MLP is working correctly! It can process both individual samples and batches of data, maintaining the correct output dimensions throughout the network.
Congratulations! You've successfully built a Multi-Layer Perceptron from scratch using your previously created `DenseLayer` class. This is a major milestone in your neural network journey. We've explored how MLPs stack multiple layers sequentially, with each layer transforming inputs and passing results to the next. You've learned to create networks of different architectures by varying the number and size of layers, and your implementation now efficiently handles both individual samples and batches of data.
In the practices that follow, you'll have the opportunity to build your own MLP and experiment with it. After that, we'll explore various activation functions beyond sigmoid and learn why they're crucial for neural network performance. We'll also integrate these different activations into our MLP framework, giving you more flexibility in designing networks suited to different types of problems. Your journey into deep learning is just beginning!
