Hello, and welcome to today's exciting lesson! We will delve into the world of neural networks, focusing on a technique called forward propagation, or the data flow from input to output in a neural network.
Neural networks are a class of machine learning models inspired by the human brain. They draw on the idea of neurons interconnected in a net-like structure that processes and learns from information, much as our brain learns from the data our senses feed it. One basic and essential step in how a neural network processes information is termed forward propagation.
As the name suggests, forward propagation involves moving forward through the network. Each node receives inputs from the nodes in the previous layer, multiplies them by its weights, adds a bias, and then "fires" the result through an activation function. That result is passed on as input to the nodes in the next layer. This process repeats layer after layer until we reach the output layer, which gives us the predicted output.
But what if the predicted output is far from the actual result? That's where backpropagation comes into play. In simple terms, backpropagation is the method used to update the weights of our neural network based on the error it made. The smaller the error, the better our model's predictions.
The entity that quantifies the error between predicted and actual outputs is the loss function. To minimize this loss and hence the prediction error, we use optimization algorithms like gradient descent. In this lesson, we focus on understanding forward propagation, setting a solid foundation for learning more intricate neural network operations such as backpropagation in future lessons.
Now, let's get hands-on with a practical implementation. We'll use the Iris dataset for our demonstration:
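Here is a minimal sketch of that loading step, assuming scikit-learn's `load_iris` (the exact code may differ from lesson to lesson):

```python
import numpy as np
from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()
X = iris.data[:, :2]  # keep only the first two features: sepal length and sepal width
y = iris.target

# Keep only the first two classes to turn this into a binary classification problem
mask = y < 2
X, y = X[mask], y[mask]
```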
We start by loading the Iris dataset, a multivariate dataset introduced by the British statistician and biologist Ronald Fisher in 1936. It's a go-to dataset for beginners because of its simplicity. It records the sepal length, sepal width, petal length, and petal width of 150 iris flowers from three different species. For this task, however, we only consider the first two features, sepal length and sepal width, and we convert the problem into a binary classification problem by keeping only the first two classes.
Next up, we should preprocess our data accordingly:
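A sketch of this preprocessing, assuming scikit-learn's `MinMaxScaler` and `train_test_split` (the `random_state` value is an arbitrary choice for reproducibility):

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Scale all features into the [0, 1] range
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# Split into 80% training data and 20% test data
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42
)
```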
In machine learning, scaling the inputs is common practice: it helps our model converge faster. So we scale our inputs into the [0, 1] range using MinMaxScaler(). After that, we split our dataset into a training set (80% of the data) and a test set (20% of the data). The model will learn from the training data and be evaluated on the unseen test data.
With our data ready, let's define our simple neural network architecture:
Our neural network consists of one input layer, one hidden layer, and one output layer. The input layer has one neuron per input feature (two in our case), the hidden layer has 5 neurons, and the output layer has just one neuron since we are dealing with binary classification.
Now, let's define the configuration of our model:
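A sketch of this model definition in Keras, with layer sizes following the description above (the exact original code may differ):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense

# A linear stack of layers: 2 inputs -> 5 hidden neurons -> 1 output neuron
model = Sequential([
    Input(shape=(2,)),               # two input features: sepal length and width
    Dense(5, activation='relu'),     # hidden layer
    Dense(1, activation='sigmoid')   # output layer for binary classification
])
```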
We begin by initializing a linear stack of layers using the Sequential() class. We add the input layer using Input() and specify the shape of our input data. Next, we add a hidden layer of 5 neurons using Dense() with a relu activation function. Finally, we add our output layer, which uses a sigmoid activation function since this is a binary classification task.
Let’s take a moment to understand the role of the activation function in a neural network.
An activation function takes the output of a neuron (the dot product of the inputs and weights, plus a bias) and produces a result that is then used as input for the next layer in the network. The activation function introduces non-linearity into the neuron's output. The sigmoid activation function, used in our model, compresses each neuron's output into a range between 0 and 1. It's especially useful for models where we have to predict a probability as the output.
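To make this concrete, here is a small illustration of the sigmoid function (the sample inputs are arbitrary):

```python
import numpy as np

def sigmoid(z):
    # Squash any real number into the (0, 1) range
    return 1 / (1 + np.exp(-z))

print(sigmoid(np.array([-2.0, 0.0, 2.0])))  # -> approximately [0.119 0.5 0.881]
```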
Forward propagation involves flowing forward through the neural network. It takes the input data, computes its dot product with the weights, adds the bias, applies the activation function, and passes the result as input to the next layer. We repeat this until we get the predicted result at the final output layer.
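To see what happens under the hood, here is an illustrative NumPy sketch of a single forward pass through our 2-5-1 network; the weights and biases here are hypothetical, randomly initialized values, not ones learned by the model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights and biases for a 2 -> 5 -> 1 network
W1, b1 = rng.normal(size=(2, 5)), np.zeros(5)
W2, b2 = rng.normal(size=(5, 1)), np.zeros(1)

def forward_pass(X):
    hidden = np.maximum(0, X @ W1 + b1)            # hidden layer: weights, bias, ReLU
    return 1 / (1 + np.exp(-(hidden @ W2 + b2)))   # output layer: weights, bias, sigmoid

print(forward_pass(np.array([[0.5, 0.3]])))  # one sample in, one probability out
```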
This entire forward propagation operation is handled behind the scenes when you call the fit function on your model, as shown below:
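A sketch of that step, assuming Keras's compile and fit APIs (the epoch count of 100 is an assumed value, not one taken from the original lesson):

```python
# Configure the learning process, then train; each epoch runs forward propagation
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history = model.fit(X_train, y_train, epochs=100, verbose=0)  # epochs=100 is an assumption
```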
By calling the compile() function, we configure the learning process. We use 'adam' as our optimizer (short for Adaptive Moment Estimation, an extension of stochastic gradient descent), and 'binary_crossentropy' as our loss function since this is a binary classification problem.
When we fit the model, forward propagation occurs in every epoch of training.
The computed output from forward propagation is compared with the actual output to determine the error, or cost. This discrepancy between the predicted and actual output is what the cost function captures. The goal in training our model is to find the set of weights and biases that minimizes the cost function:
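One way to inspect that value, assuming the fit call above returned its training history:

```python
# Inspect the training loss at the final epoch
final_loss = history.history['loss'][-1]
print(f"Final training loss: {final_loss:.4f}")
```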
This value represents the model's loss at the last epoch, indicating how well the model performed during training.
After understanding the key concepts behind the operations of a neural network and their implementation in our model, let's see how our model performs on unseen data:
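A sketch of this evaluation step, thresholding the predicted probabilities at 0.5 as described below:

```python
import numpy as np

# Predict probabilities on the test set, then threshold at 0.5
y_prob = model.predict(X_test)
y_pred = (y_prob > 0.5).astype(int).flatten()

# Fraction of test samples where the predicted class matches the actual class
accuracy = np.mean(y_pred == y_test)
print(f"Test accuracy: {accuracy:.2f}")
```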
Here, model.predict gives us the predicted output probabilities, and we convert them to binary class labels using a threshold of 0.5. We then calculate the accuracy of our model by comparing the predicted labels with the actual ones.
This demonstrates the accuracy of our model on the test data, showing the proportion of correct predictions made by the model.
Finally, let's visualize the decision boundary learned by our model. For that, we create a mesh grid covering the entire range of our data and predict the output for each point in the grid:
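A sketch of how that grid might be built (the grid resolution of 200 points per axis and the 0.1 padding are arbitrary choices):

```python
import numpy as np

# Build a mesh grid spanning the scaled feature ranges, with a small margin
x_min, x_max = X_scaled[:, 0].min() - 0.1, X_scaled[:, 0].max() + 0.1
y_min, y_max = X_scaled[:, 1].min() - 0.1, X_scaled[:, 1].max() + 0.1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                     np.linspace(y_min, y_max, 200))

# Predict the output probability for every point in the grid
grid_probs = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
```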
After predicting the output for each point in the mesh grid, we plot the decision boundary learned by our model:
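A sketch of that plot with matplotlib (the 'winter' colormap is an assumption, chosen to match the blue and green regions described below):

```python
import matplotlib.pyplot as plt

# Color each region by the predicted class, splitting at the 0.5 threshold
plt.contourf(xx, yy, grid_probs, levels=[0, 0.5, 1], cmap='winter', alpha=0.3)
# Overlay the actual data points, colored by their true class
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=y, cmap='winter', edgecolors='k')
plt.xlabel('Sepal length (scaled)')
plt.ylabel('Sepal width (scaled)')
plt.title('Decision boundary learned by the model')
plt.show()
```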
The code above plots the decision boundary learned by our model. The region where the predicted output is greater than 0.5 is treated as one class and the region where it is less than 0.5 as the other, and the two are shown in different colors: blue for the region below 0.5 and green for the region above it. The scattered points are the actual data points, with the color indicating the class they belong to.
That concludes our journey of understanding the basics of the operations within a neural network, focusing on forward propagation and the calculation of the cost function. We used TensorFlow to build a simple neural network, making data processing and forward propagation a smoother and more efficient process.
Remember, practical tasks build exceptional skills. In the next lesson, we will tackle more practical tasks designed to test and enhance your understanding of forward propagation, giving you an edge in your machine learning journey. Keep learning and keep improving!
