Welcome to the first lesson of "The MLP Architecture: Activations & Initialization"! I'm excited to continue our neural network journey with you. In our previous course, Neural Network Fundamentals: Neurons and Layers, we built the foundations of neural networks by implementing individual neurons, adding activation functions, and combining neurons into a single `DenseLayer` capable of forward propagation.
Today, we're taking a significant step forward by learning how to stack multiple layers together to create a Multi-Layer Perceptron (MLP). MLPs are the fundamental architecture behind many neural network applications and represent the point where our implementations truly become "deep learning."
By the end of this lesson, you'll have created a fully functional MLP capable of processing data through multiple layers, bringing us much closer to solving real-world problems. Let's dive in!
Before we dive into Multi-Layer Perceptrons, let's quickly refresh the core components we built in our previous course. Our foundation consists of two key elements:
- The sigmoid activation function, which transforms linear inputs into non-linear outputs between 0 and 1;
- The `DenseLayer` class, which represents a fully connected layer of neurons.
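As a quick refresher, here is a minimal sketch of what those two components might look like in JavaScript with Math.js. The exact details (the `sigmoid` helper, the `nInputs` and `nNeurons` parameter names, zero biases) are assumptions; your implementation from the previous course may differ slightly:

```javascript
const math = require('mathjs');

// Sigmoid activation: squashes each value into the range (0, 1).
// Applied element-wise so it works on whole Math.js matrices.
function sigmoid(x) {
  return math.map(x, (value) => 1 / (1 + math.exp(-value)));
}

// A fully connected (dense) layer of neurons.
class DenseLayer {
  constructor(nInputs, nNeurons, activation = sigmoid) {
    // Small random weights (scaled by 0.1) and zero biases.
    this.weights = math.map(math.zeros(nInputs, nNeurons), () => math.random() * 0.1);
    this.biases = math.zeros(1, nNeurons);
    this.nInputs = nInputs;
    this.nNeurons = nNeurons;
    this.activation = activation;
  }

  // Forward pass: weighted sum of inputs plus bias, then activation.
  // (Recent Math.js versions broadcast the 1-row bias across a batch of samples.)
  forward(inputs) {
    const weightedSum = math.add(math.multiply(inputs, this.weights), this.biases);
    return this.activation(weightedSum);
  }
}
```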
Our `DenseLayer` performs three essential operations:
- Initializes weights and biases (note how we're currently using `math.random() * 0.1` for the weights — we'll explore why we do this, as well as better initialization strategies, later in this course).
- Stores the layer dimensions and activation function.
- Performs the forward pass by computing the weighted sum and applying activation.
This single layer is powerful, but the real magic happens when we combine multiple layers together — which is exactly what we'll do today by building our Multi-Layer Perceptron!
Before we start coding, let's understand what a Multi-Layer Perceptron is and why it's so powerful.
A Multi-Layer Perceptron is a neural network architecture consisting of multiple dense layers stacked sequentially. It typically has:
- An input layer that receives the raw data;
- One or more hidden layers that perform intermediate computations;
- An output layer that produces the final result.
The power of MLPs comes from this layered structure. Each layer can learn increasingly complex representations of the data:
- The first layer might detect simple patterns;
- Middle layers combine these into more complex features;
- The final layers use these features to make sophisticated decisions.
Information flows through an MLP in one direction: forward from input to output. This is why MLPs are also called feedforward neural networks.
Think of each layer as performing a specific transformation on the data, with the output of one layer becoming the input to the next. This hierarchical structure allows MLPs to learn complex mappings between inputs and outputs that would be impossible with just a single layer.
Now that we understand the concept, let's start implementing our MLP. First, we'll create the basic class structure that will house our layers:
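A minimal sketch of that structure might look like this (the class name `MLP` and the `layers` property match what the rest of the lesson assumes):

```javascript
// An MLP is simply an ordered collection of layers.
class MLP {
  constructor() {
    // Layers will be added one by one, in the order data flows through them.
    this.layers = [];
  }
}
```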
This simple initialization creates an empty array that will store our layers. The key idea here is that our MLP will be a container for multiple `DenseLayer` objects arranged in sequence.
Notice how we're deliberately keeping the initialization straightforward. The MLP doesn't need to know in advance how many layers it will contain or their dimensions — this flexibility lets us dynamically build networks of different architectures as needed. This design approach mirrors professional deep learning frameworks, which also allow for flexible network construction.
Next, we need a way to add layers to our MLP. Let's implement the `addLayer` method:
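Here is a sketch of how that method might look, written as a method to add inside the `MLP` class defined above:

```javascript
// Inside the MLP class: append a layer to the end of the network.
// Layers are applied later in the exact order they are added here.
addLayer(layer) {
  this.layers.push(layer);
}
```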
This method is elegantly simple — it takes a layer object (which will be an instance of our previously created `DenseLayer` class) and appends it to our `layers` array.
The beauty of this approach is its flexibility:
- We can add as many layers as we need.
- Each layer can have different numbers of neurons.
- We could potentially extend this to support different types of layers in the future.
When using this method, we'll need to ensure that the dimensions of consecutive layers match correctly — the number of outputs from one layer must equal the number of inputs to the next layer. This dimensional compatibility is essential for data to flow properly through the network.
Now for the most crucial part: implementing forward propagation through all the layers in our MLP. This is where we'll see how the output of one layer becomes the input to the next:
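A sketch of the `forward` method, again added inside the `MLP` class:

```javascript
// Inside the MLP class: pass the data through every layer in sequence.
forward(input) {
  let currentInput = input;
  for (const layer of this.layers) {
    // The output of this layer becomes the input to the next one.
    currentInput = layer.forward(currentInput);
  }
  return currentInput;
}
```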
Let's break down what happens here:
- We initialize `currentInput` with the original input data.
- We iterate through each layer in our network.
- For each layer, we:
  - Call the layer's `forward` method with the current input.
  - Update `currentInput` with the output from that layer.
- After processing through all layers, we return the final output.
This sequential processing is the essence of how information flows through an MLP. Each layer transforms the data, gradually shaping it into the desired output. The variable `currentInput` serves as the "baton" in this relay race, carrying information from one layer to the next.
The elegance of this approach is that the MLP doesn't need to know the internal details of each layer — it simply calls the `forward` method, trusting each layer to do its job correctly. This encapsulation is a powerful software design principle that allows us to build complex systems from simpler components.
Now that we have our MLP class defined, let's see how to create a complete multi-layer perceptron with multiple dense layers. We'll use `math.matrix` to ensure all data is handled as Math.js matrices, which is best practice for consistency and performance:
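Here is a sketch of how that construction might look. The sample values in `X_sample` are arbitrary, and the exact logging format is an assumption:

```javascript
// A single sample with 4 features, as a 1x4 Math.js matrix.
const X_sample = math.matrix([[0.5, -1.2, 3.3, 0.8]]);

// Build a 3-layer MLP: 4 -> 5 -> 3 -> 1.
const mlp = new MLP();
mlp.addLayer(new DenseLayer(4, 5)); // hidden layer: 4 inputs, 5 neurons
mlp.addLayer(new DenseLayer(5, 3)); // hidden layer: 5 inputs, 3 neurons
mlp.addLayer(new DenseLayer(3, 1)); // output layer: 3 inputs, 1 neuron

// Print a summary of the architecture.
console.log(`MLP with ${mlp.layers.length} layers:`);
mlp.layers.forEach((layer, i) => {
  console.log(`  Layer ${i + 1}: ${layer.nInputs} inputs -> ${layer.nNeurons} neuron(s)`);
});
```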
In this code, we:
- Create a sample input `X_sample` as a Math.js matrix with 4 features (a single sample for now).
- Instantiate our MLP.
- Add three layers:
  - The first layer takes 4 inputs (matching our input data) and produces 5 outputs.
  - The second layer takes those 5 inputs and produces 3 outputs.
  - The final layer takes 3 inputs and produces a single output.
- Print information about our constructed network.
Notice how we've chained the layers together, ensuring that the number of inputs to each layer matches the number of outputs from the previous layer. This forms a coherent network where data can flow smoothly from input to output.
The output shows:
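With the summary logging sketched above, the printout would look something like this (your exact format may differ):

```text
MLP with 3 layers:
  Layer 1: 4 inputs -> 5 neuron(s)
  Layer 2: 5 inputs -> 3 neuron(s)
  Layer 3: 3 inputs -> 1 neuron(s)
```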
This gives us a clear picture of our network's architecture — a 3-layer MLP with a decreasing number of neurons in each layer, funneling down to a single output neuron.
Now let's run our input data through the MLP and examine the output:
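A sketch of that forward pass, reusing the `mlp` and `X_sample` from above (the batch values are arbitrary):

```javascript
// Forward pass with the single sample.
const output = mlp.forward(X_sample);
console.log('Single sample output size:', output.size());
console.log('Single sample output:', output.valueOf());

// A batch of 2 samples, each with 4 features.
const X_batch = math.matrix([
  [0.5, -1.2, 3.3, 0.8],
  [1.0, 0.4, -0.7, 2.1],
]);

// Forward pass with the batch.
const batchOutput = mlp.forward(X_batch);
console.log('Batch output size:', batchOutput.size());
console.log('Batch output:', batchOutput.valueOf());
```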
In this code:
- We perform a forward pass with our single sample input and print the result, using `.size()` and `.valueOf()` to get the matrix dimensions and values.
- We create a batch of 2 samples, each with 4 features, as a Math.js matrix.
- We run a forward pass with the batch and print the result.
The output shows:
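Because the weights are initialized randomly, the exact numbers will differ on every run, but the output shapes are deterministic. With the logging above, it will look roughly like this (placeholders stand in for the actual values):

```text
Single sample output size: [ 1, 1 ]
Single sample output: [ [ <value in (0, 1)> ] ]
Batch output size: [ 2, 1 ]
Batch output: [ [ <value in (0, 1)> ], [ <value in (0, 1)> ] ]
```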
Several important observations:
- Our single sample input produced a single scalar output (wrapped in a 2D array to maintain batch structure).
- Our batch of 2 samples produced 2 outputs — one for each sample.
- The output values are different for each sample, showing that our network processes each sample individually.
- All outputs are in the range (0, 1) because we're using the sigmoid activation function in all layers.
This confirms that our MLP is working correctly! It can process both individual samples and batches of data, maintaining the correct output dimensions throughout the network.
Congratulations! You've successfully built a Multi-Layer Perceptron from scratch using your previously created `DenseLayer` class. This is a major milestone in your neural network journey. We've explored how MLPs stack multiple layers sequentially, with each layer transforming inputs and passing results to the next. You've learned to create networks of different architectures by varying the number and size of layers, and your implementation now efficiently handles both individual samples and batches of data.
In the practices that follow, you'll have the opportunity to build your own MLP and experiment with it. After that, we'll explore various activation functions beyond sigmoid and learn why they're crucial for neural network performance. We'll also integrate these different activations into our MLP framework, giving you more flexibility in designing networks suited to different types of problems. Your journey into deep learning is just beginning!
