
The Math Behind Neural Networks

 

Hüseyin Eren Arslan

Over the past decade, artificial intelligence has become a huge part of our lives. Even when we do not use it directly, it works behind the scenes to analyze data, extract patterns, and reach its own decisions. But what exactly does artificial intelligence do, and how does a machine learn from nothing more than a sample of data? At its core, it applies certain mathematical functions to the data to arrive at either a decision or a value. In this article I will focus on neural networks and the mathematical functions they are built from.

To understand neural networks, imagine a virtual brain, one that's really good at specific tasks. Let's break it down:


Neurons: Think of them as tiny decision-makers. They take in information, process it, and decide whether to pass a message along.


Layers: Neurons are organized into layers, like different sections of your brain. The input layer receives information, the hidden layers process it, and the output layer gives you the final result.


Weights and Bias: Neurons aren't all equal. Each connection between neurons has a weight, which determines how much one neuron's output matters to the next. The bias shifts a neuron's activation threshold, letting it fire more or less easily (see the single-neuron sketch after this list).


Training: Here's the cool part. The network learns, just like you do. It's shown examples, corrects its mistakes, and adjusts those weights and biases to get better over time.


Task Specific: Neural networks can specialize. Some are awesome at recognizing images, others at understanding language, and so on. It's like having different brain regions for different skills.
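
To make these ideas concrete, here is a minimal sketch of a single artificial neuron in Python. The inputs, weights, and bias are made-up illustrative values, not anything from a trained model:

# A single artificial neuron: a weighted sum of its inputs plus a bias,
# passed through a simple threshold "decision".
def neuron(inputs, weights, bias):
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0  # fire (1) or stay silent (0)

# Two inputs, with made-up weights and bias:
print(neuron([0.5, 0.8], weights=[0.9, -0.2], bias=0.1))  # prints 1

Real networks replace this hard threshold with the smoother activation functions described below.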
 

So, a neural network is essentially a smart system that learns to do specific tasks by mimicking how our brains process information, making it super handy for things like image recognition, language translation, and much more!

Neural networks leverage a variety of mathematical functions to model complex relationships and learn from data. There are three main types of functions:

1. Linear Function:
    ● Equation: f(x) = wx + b
    ● Basic building block. It performs a linear transformation on the input data, where w is the weight, x is the input, and b is the bias.
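
In practice a layer applies this transformation to a whole vector of inputs at once, with one row of weights per neuron. A minimal NumPy sketch, with made-up numbers:

import numpy as np

# Linear transformation f(x) = Wx + b for a layer with 3 inputs and 2 neurons.
x = np.array([1.0, 2.0, 3.0])        # input vector
W = np.array([[0.2, -0.5, 0.1],      # one row of weights per neuron
              [0.7,  0.3, -0.4]])
b = np.array([0.5, -1.0])            # one bias per neuron
print(W @ x + b)                     # prints [ 0.  -0.9]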

2. Activation Functions:
    ● Activation functions bring non-linearity and complexity to the neural network. Without them, stacked layers would collapse into a single linear transformation. (All three functions below are sketched in code after this list.)
    ● The Sigmoid Function:

              ○ Equation: f(x) = 1 / (1 + e^(-x))

              ○ Role: Squashes the input values between 0 and 1. Often used in the output layer of a binary classification model.

    ● The Hyperbolic Tangent (tanh):

              ○ Equation: f(x) = (e^x - e^(-x)) / (e^x + e^(-x))

              ○ Role: Similar to the sigmoid but squashes values between -1 and 1. Used in hidden layers to capture more complex patterns.

    ● Rectified Linear Unit (ReLU):
              ○ Equation: f(x) = max(0, x)
              ○ Role: The most popular activation function. It outputs the input directly if it is positive; otherwise, it outputs zero. It is efficient and allows models to learn quickly.
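
Here is a minimal sketch of all three activation functions, written directly from the equations above:

import numpy as np

def sigmoid(x):
    # Squashes any real number into (0, 1).
    return 1 / (1 + np.exp(-x))

def tanh(x):
    # Squashes any real number into (-1, 1).
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def relu(x):
    # Passes positive values through, zeroes out negatives.
    return np.maximum(0, x)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))  # approx [0.12 0.5  0.88]
print(tanh(x))     # approx [-0.96  0.    0.96]
print(relu(x))     # prints [0. 0. 2.]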


3. Loss Functions:
    ● Loss functions measure the difference between a model's predicted values and the actual values (ground truth) for a given set of input data.
    ● Mean Squared Error (MSE):

              ○ Equation: MSE = (1/n) Σ (y_i - ŷ_i)^2, where n is the number of samples, y_i is the true value, and ŷ_i is the model's prediction.

              ○ Role: Commonly used for regression problems.
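
A minimal sketch of MSE in code, using made-up targets and predictions:

import numpy as np

def mse(y_true, y_pred):
    # Average of the squared differences between targets and predictions.
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([3.0, -0.5, 2.0])  # made-up ground-truth values
y_pred = np.array([2.5,  0.0, 2.0])  # made-up model predictions
print(mse(y_true, y_pred))           # prints 0.1666...

The smaller this number, the closer the model's predictions are to the truth; training is the process of driving it down.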

 

These mathematical functions, combined and stacked in layers, are what make up a neural network.
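
To show how the pieces fit together, here is a minimal sketch of a tiny two-layer network doing one forward pass and measuring its loss. The architecture, the random weights, and the input and target are all made-up illustrative choices, not a trained model:

import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Made-up architecture: 3 inputs -> 4 hidden neurons (ReLU) -> 1 output (sigmoid).
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

def forward(x):
    h = relu(W1 @ x + b1)        # hidden layer: linear transformation + activation
    return sigmoid(W2 @ h + b2)  # output layer: squashed into (0, 1)

x = np.array([0.5, -1.0, 2.0])   # made-up input
y_true = np.array([1.0])         # made-up target
y_pred = forward(x)
loss = np.mean((y_true - y_pred) ** 2)  # MSE loss
print(y_pred, loss)

Training would then repeatedly nudge W1, b1, W2, and b2 in whatever direction reduces this loss, which is exactly the weight-and-bias adjustment described earlier.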

