Welcome to the first lesson of "The MLP Architecture: Activations & Initialization"! I'm excited to continue our neural network journey with you. In our previous course, "Neural Network Fundamentals: Neurons and Layers", we built the foundations of neural networks by implementing individual neurons, adding activation functions, and combining neurons into a single `DenseLayer` capable of forward propagation.
Today, we're taking a significant step forward by learning how to stack multiple layers together to create a multi-layer perceptron (MLP). MLPs are the fundamental architecture behind many neural network applications and represent the point where our implementations truly become "deep learning."
By the end of this lesson, you'll have created a fully functional MLP capable of processing data through multiple layers, bringing us much closer to solving real-world problems. Let's dive in!
Before we dive into multi-layer perceptrons, let's quickly refresh the core components we built in our previous course. Our foundation consists of two key elements:
- The sigmoid activation function, which transforms linear inputs into non-linear outputs between `0` and `1`.
- The `DenseLayer`, which creates a fully connected layer of neurons.
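As a quick refresher, the sigmoid function itself is a one-liner in R:

```r
# Sigmoid activation: squashes any real number into the open interval (0, 1)
sigmoid <- function(x) {
  1 / (1 + exp(-x))
}

sigmoid(0)    # 0.5
sigmoid(10)   # close to 1
sigmoid(-10)  # close to 0
```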
A note on implementation: In our previous course, we implemented DenseLayer as a function that returned a list of functions. While that approach worked well, in this course we'll transition to using R6 classes for our neural network components. R6 is R's object-oriented programming system that provides cleaner syntax for creating objects with methods and mutable state — features that become increasingly valuable as our networks grow more complex. This approach will make our code more organized and easier to extend as we build more sophisticated architectures.
Here's our DenseLayer implemented as an R6 class:
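A minimal sketch consistent with how the class is used later in this lesson (fields `n_inputs` and `n_neurons`, plus a `forward` method); the weight-initialization details here are assumptions, not the course's exact code:

```r
library(R6)

sigmoid <- function(x) 1 / (1 + exp(-x))

DenseLayer <- R6Class("DenseLayer",
  public = list(
    n_inputs = NULL,
    n_neurons = NULL,
    weights = NULL,
    biases = NULL,

    initialize = function(n_inputs, n_neurons) {
      self$n_inputs  <- n_inputs
      self$n_neurons <- n_neurons
      # Small random weights and zero biases (this initialization scheme is an assumption)
      self$weights <- matrix(rnorm(n_inputs * n_neurons, sd = 0.1),
                             nrow = n_inputs, ncol = n_neurons)
      self$biases  <- rep(0, n_neurons)
    },

    forward = function(inputs) {
      # Linear step: (batch x n_inputs) %*% (n_inputs x n_neurons), plus per-neuron bias
      z <- sweep(inputs %*% self$weights, 2, self$biases, "+")
      sigmoid(z)  # non-linear activation
    }
  )
)
```

With this class, `DenseLayer$new(4, 5)` creates a layer whose `forward` method maps any `batch x 4` matrix to a `batch x 5` matrix of activations.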
Before we start coding, let's understand what a multi-layer perceptron is and why it's so powerful.
A multi-layer perceptron is a neural network architecture consisting of multiple dense layers stacked sequentially. It typically has:
- An input layer that receives the raw data.
- One or more hidden layers that perform intermediate computations.
- An output layer that produces the final result.
The power of MLPs comes from this layered structure. Each layer can learn increasingly complex representations of the data:
- The first layer might detect simple patterns.
- Middle layers combine these into more complex features.
- The final layers use these features to make sophisticated decisions.
Information flows through an MLP in one direction: forward from input to output. This is why MLPs are also called feedforward neural networks.
Think of each layer as performing a specific transformation on the data, with the output of one layer becoming the input to the next. This hierarchical structure allows MLPs to learn complex mappings between inputs and outputs that would be impossible with just a single layer.
Now that we understand the concept, let's start implementing our MLP R6 class. First, we'll create the basic class structure that will create our MLP objects.
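A sketch of that basic structure (assuming the `R6` package is installed):

```r
library(R6)

MLP <- R6Class("MLP",
  public = list(
    layers = NULL,

    initialize = function() {
      # Start with an empty list; layers are appended one at a time later
      self$layers <- list()
    }
  )
)

mlp <- MLP$new()
length(mlp$layers)  # 0
```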
This simple initialization creates an R6 class with a layers field that will store our layers as a list. The key idea here is that our MLP will be a container for multiple DenseLayer objects arranged in sequence.
Notice how we're deliberately keeping the initialization straightforward. The MLP doesn't need to know in advance how many layers it will contain or their dimensions — this flexibility lets us dynamically build networks of different architectures as needed. This design approach mirrors professional deep learning frameworks, which also allow for flexible network construction.
Next, we need a way to add layers to our MLP. Let's implement the add_layer method.
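One plausible implementation with both safety checks, shown as a complete class definition so the snippet stands alone; the exact error messages are assumptions:

```r
library(R6)

MLP <- R6Class("MLP",
  public = list(
    layers = NULL,

    initialize = function() {
      self$layers <- list()
    },

    add_layer = function(layer) {
      # Safety check 1: only DenseLayer objects may be added
      if (!inherits(layer, "DenseLayer")) {
        stop("Layer must be a DenseLayer object")
      }
      # Safety check 2: the previous layer's output size must match
      # the new layer's input size
      n <- length(self$layers)
      if (n > 0) {
        prev <- self$layers[[n]]
        if (prev$n_neurons != layer$n_inputs) {
          stop(sprintf(
            "Dimension mismatch: previous layer outputs %d values, but new layer expects %d inputs",
            prev$n_neurons, layer$n_inputs
          ))
        }
      }
      self$layers <- append(self$layers, list(layer))
    }
  )
)
```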
This method takes a layer object (which will be an instance created by our DenseLayer R6 class) and appends it to our layers list using R's append() function. We've added two important safety checks:
- We verify that only `DenseLayer` objects are added, to maintain type consistency.
- We validate dimensional compatibility between consecutive layers: if we're adding a layer after existing layers, we check that the previous layer's output dimension (`n_neurons`) matches the new layer's input dimension (`n_inputs`). If they don't match, we provide a clear error message indicating the mismatch.
This dimensional validation is crucial because it catches configuration errors immediately when building the network, rather than waiting until we try to run data through it. The error message explicitly states what went wrong, making it easy to debug network architecture issues.
Now for the most crucial part: implementing forward propagation through all the layers in our MLP. This is where we'll see how the output of one layer becomes the input to the next.
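The method itself only takes a few lines; here is a sketch, shown inside a trimmed-down class so the snippet stands alone (the `add_layer` safety checks are omitted here for brevity):

```r
library(R6)

MLP <- R6Class("MLP",
  public = list(
    layers = NULL,
    initialize = function() { self$layers <- list() },
    add_layer = function(layer) { self$layers <- append(self$layers, list(layer)) },

    forward = function(X) {
      # The output of each layer becomes the input to the next
      current_input <- X
      for (layer in self$layers) {
        current_input <- layer$forward(current_input)
      }
      current_input
    }
  )
)
```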
Let's break down what happens here:
- We initialize `current_input` with the original input data.
- We iterate through each layer in our network.
- For each layer, we:
  - Call the layer's `forward` method with the current input.
  - Update `current_input` with the output from that layer.
- After processing through all layers, we return the final output.
This sequential processing is the essence of how information flows through an MLP. Each layer transforms the data, gradually shaping it into the desired output. The variable current_input serves as the "baton" in this relay race, carrying information from one layer to the next.
The elegance of this approach is that the MLP doesn't need to know the internal details of each layer — it simply calls the method, trusting each layer to do its job correctly. This is a powerful software design principle that allows us to build complex systems from simpler components.
Now that we have our MLP class defined, let's see how to create a complete multi-layer perceptron with multiple dense layers.
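Here is one way this construction might look; the sample values are placeholders, and minimal versions of the two classes are repeated so the example runs on its own (their internals, as before, are assumptions):

```r
library(R6)

sigmoid <- function(x) 1 / (1 + exp(-x))

# Minimal stand-ins for the classes built earlier in this lesson
DenseLayer <- R6Class("DenseLayer",
  public = list(
    n_inputs = NULL, n_neurons = NULL, weights = NULL, biases = NULL,
    initialize = function(n_inputs, n_neurons) {
      self$n_inputs  <- n_inputs
      self$n_neurons <- n_neurons
      # Small random weights; the initialization scheme is an assumption
      self$weights <- matrix(rnorm(n_inputs * n_neurons, sd = 0.1), nrow = n_inputs)
      self$biases  <- rep(0, n_neurons)
    },
    forward = function(inputs) {
      sigmoid(sweep(inputs %*% self$weights, 2, self$biases, "+"))
    }
  )
)

MLP <- R6Class("MLP",
  public = list(
    layers = NULL,
    initialize = function() { self$layers <- list() },
    add_layer = function(layer) { self$layers <- append(self$layers, list(layer)) },
    forward = function(X) {
      current_input <- X
      for (layer in self$layers) current_input <- layer$forward(current_input)
      current_input
    }
  )
)

# A single sample with 4 features, stored as a 1 x 4 matrix (values are placeholders)
X_sample <- matrix(c(0.5, -0.2, 0.1, 0.8), nrow = 1)

# Build a 4 -> 5 -> 3 -> 1 network
mlp <- MLP$new()
mlp$add_layer(DenseLayer$new(n_inputs = 4, n_neurons = 5))
mlp$add_layer(DenseLayer$new(n_inputs = 5, n_neurons = 3))
mlp$add_layer(DenseLayer$new(n_inputs = 3, n_neurons = 1))
```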
In this code, we:
- Create a sample input `X_sample` with `4` features (a single sample for now).
- Instantiate our `MLP` using `MLP$new()`.
- Add three layers using `DenseLayer$new()`:
  - The first layer takes `4` inputs (matching our input data) and produces `5` outputs.
  - The second layer takes those `5` inputs and produces `3` outputs.
  - The final layer takes `3` inputs and produces a single output.
Now let's run our input data through the MLP and examine the output.
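One way this step might look; the batch values are placeholders, and the classes plus the three-layer network from the previous section are rebuilt here so the snippet runs on its own:

```r
library(R6)

sigmoid <- function(x) 1 / (1 + exp(-x))

# Minimal stand-ins for the classes built earlier in this lesson
DenseLayer <- R6Class("DenseLayer",
  public = list(
    n_inputs = NULL, n_neurons = NULL, weights = NULL, biases = NULL,
    initialize = function(n_inputs, n_neurons) {
      self$n_inputs  <- n_inputs
      self$n_neurons <- n_neurons
      self$weights <- matrix(rnorm(n_inputs * n_neurons, sd = 0.1), nrow = n_inputs)
      self$biases  <- rep(0, n_neurons)
    },
    forward = function(inputs) sigmoid(sweep(inputs %*% self$weights, 2, self$biases, "+"))
  )
)

MLP <- R6Class("MLP",
  public = list(
    layers = NULL,
    initialize = function() { self$layers <- list() },
    add_layer = function(layer) { self$layers <- append(self$layers, list(layer)) },
    forward = function(X) {
      current_input <- X
      for (layer in self$layers) current_input <- layer$forward(current_input)
      current_input
    }
  )
)

mlp <- MLP$new()
mlp$add_layer(DenseLayer$new(4, 5))
mlp$add_layer(DenseLayer$new(5, 3))
mlp$add_layer(DenseLayer$new(3, 1))

# Forward pass with a single sample (a 1 x 4 matrix); values are placeholders
X_sample <- matrix(c(0.5, -0.2, 0.1, 0.8), nrow = 1)
output_single <- mlp$forward(X_sample)
print(output_single)   # a 1 x 1 matrix

# Forward pass with a batch of 2 samples, each with 4 features
X_batch <- matrix(c(0.5, -0.2,  0.1, 0.8,
                    0.3,  0.7, -0.5, 0.2),
                  nrow = 2, byrow = TRUE)
output_batch <- mlp$forward(X_batch)
print(output_batch)    # a 2 x 1 matrix: one output per sample
```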
In this code:
- We perform a forward pass with our single sample input and print the result.
- We create a batch of `2` samples, each with `4` features, using `byrow = TRUE` for proper row arrangement.
- We run a forward pass with the batch and print the result.
The printed output (exact values will vary because the weights are randomly initialized) illustrates several important observations:
- Our single sample input produced a single scalar output (in a 2D matrix to maintain batch structure).
- Our batch of `2` samples produced `2` outputs, one for each sample.
- The output values are different for each sample, showing that our network processes each sample individually.
- All outputs are in the range (`0`, `1`) because we're using the sigmoid activation in all layers.
Congratulations! You've successfully built a multi-layer perceptron from scratch using your previously created DenseLayer R6 class. This is a major milestone in your neural network journey. We've explored how MLPs stack multiple layers sequentially, with each layer transforming inputs and passing results to the next. You've learned to create networks of different architectures by varying the number and size of layers, and your implementation now efficiently handles both individual samples and batches of data.
In the practice exercises that follow, you'll build your own MLP and experiment with it. After that, we'll explore various activation functions beyond sigmoid and learn why they're crucial for neural network performance. We'll also add these different activations to our MLP framework, giving you more flexibility in designing networks suited to different types of problems. Your journey into deep learning is just beginning!
