The Universal Approximation Theorem (UAT) is a key mathematical result underpinning the functionality of neural networks. In essence, the UAT states that a neural network with just one hidden layer - a layer between the input and output - containing a finite number of neurons (nodes where computation takes place) can approximate any continuous function on a bounded region of inputs to any desired degree of accuracy.
Imagine the hidden layer as a talented ensemble of artists. If you have a picture (a function) that you'd like them to recreate, they can do it with their collective skill set. Each artist (neuron) specializes in a different type of stroke or style, and together they combine their talents to reproduce the image. To replicate more complex pictures (functions), you might need more artists (neurons) or an artist capable of a broader range of styles (a non-linear activation function). The Universal Approximation Theorem guarantees that, with enough artists, the ensemble can recreate the picture to any desired level of accuracy.
Here, an artist's style is analogous to the activation function in a neural network, which is typically a non-linear function that transforms the input a neuron receives. The Universal Approximation Theorem does come with a small caveat: in its classical form, it requires the activation function to be non-constant, bounded, and monotonically increasing (the sigmoid and tanh functions both qualify).
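For readers who want a more precise statement, here is one standard formulation of the single-hidden-layer version (the notation is introduced here for convenience and is not part of the original discussion): for any continuous function f on a compact set K ⊂ R^n, any activation σ satisfying the conditions above, and any tolerance ε > 0, there exist a width N and parameters v_i, b_i ∈ R and w_i ∈ R^n such that the network

$$
F(x) = \sum_{i=1}^{N} v_i \, \sigma\!\left(w_i^{\top} x + b_i\right)
$$

satisfies |F(x) - f(x)| < ε for every x in K. In other words, a single hidden layer of N neurons can track f uniformly within ε over the whole region K.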
To implement the concept in code and understand it better, let's explore a simple example:
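The snippet below is a minimal sketch rather than a canonical implementation: the choice of PyTorch, the training interval, and the hyperparameters are illustrative assumptions. The essential pieces, matching the discussion that follows, are a single hidden layer of 10 tanh neurons trained to approximate f(x) = x·sin(x).

```python
import torch
import torch.nn as nn

# Target function we want the network to approximate
def f(x):
    return x * torch.sin(x)

# Training data: points sampled from the interval [-5, 5]
x_train = torch.linspace(-5, 5, 200).unsqueeze(1)
y_train = f(x_train)

# A single hidden layer with 10 neurons and a tanh activation
model = nn.Sequential(
    nn.Linear(1, 10),
    nn.Tanh(),
    nn.Linear(10, 1),
)

optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

# Fit the network to the target function
for epoch in range(5000):
    optimizer.zero_grad()
    loss = loss_fn(model(x_train), y_train)
    loss.backward()
    optimizer.step()

print(f"Final training loss: {loss.item():.4f}")
```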

Thus, with just 10 hidden neurons and the tanh activation function, you can see that the network does a decent job of approximating the target function f(x) = x·sin(x). Of course, more complex functions may require more hidden neurons or additional layers. However, according to the Universal Approximation Theorem, they can still be approximated by a neural network! Here is a visualization of the simulated network architecture.
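Beyond the architecture diagram, a quick way to judge the quality of the fit is to plot the network's predictions against the target function. The snippet below is again an illustrative sketch; it reuses the model and f defined in the training example above.

```python
import torch
import matplotlib.pyplot as plt

# Evaluate the trained network on a dense grid and compare to the target
x_test = torch.linspace(-5, 5, 400).unsqueeze(1)
with torch.no_grad():
    y_pred = model(x_test)

plt.plot(x_test.numpy(), f(x_test).numpy(), label="f(x) = x·sin(x)")
plt.plot(x_test.numpy(), y_pred.numpy(), "--", label="network approximation")
plt.legend()
plt.show()
```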
