Building Neural Networks from Scratch

Neural networks are powerful machine learning algorithms that have transformed countless industries. They can power everything from fraud detection and demand forecasting to personalized recommendations and autonomous systems, making them a great way to incorporate smarter decision-making into your applications.

In this guide, we’ll dive deep into the fundamentals of neural networks, from the first representations of artificial neurons to implementing your own linear regression model from scratch.

Get ready to unlock the full potential of neural networks and embark on an exciting journey in artificial intelligence.

1. Neural networks as a machine learning algorithm

Neural networks are a type of machine learning algorithm inspired by the structure and function of the human brain. They are made up of interconnected nodes, or “neurons,” that can learn to perform specific tasks by analyzing large amounts of data.

Neural networks have many applications, including image recognition, natural language processing, speech recognition, and predictive analytics. They excel at identifying patterns and making complex decisions, making them a valuable tool in many industries.

Neural networks are highly flexible and can adapt to a variety of problems. They can learn from data and improve their performance over time, making them powerful tools for tackling complex, real-world challenges.

2. Task, model, performance measurement, and experience

To build an effective neural network, you need to consider several key components that define its development and success. These include clearly identifying the task the network will perform, selecting an appropriate model, establishing performance measurement criteria, and leveraging experience through training data. 

Each element plays a crucial role in shaping how well the neural network learns and generalizes to new data. Here’s what each element does and why it’s important:

  • Task – The first step in building a neural network is to define the task you want it to perform. This could be anything from image classification to natural language processing. Defining the task helps you design the appropriate network architecture and choose the correct training data.
  • Model – Model selection is an important aspect of building a neural network. Different types of neural network models (like feedforward neural networks, convolutional neural networks, and recurrent neural networks) are suited for different tasks. Choosing the right model improves the network’s performance and ability to solve the desired problem.
  • Performance measurement – Once you’ve defined the task, you need to determine how to measure your neural network’s performance. Metrics can include accuracy, precision, recall, or F1-score, depending on the specific problem you’re trying to solve (a quick sketch follows this list).
  • Experience – The final step is to provide the neural network with training data, allowing it to learn and improve its performance over time. This experience phase is crucial for building a high-performing model that can generalize well to new, unseen data.
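
To make the performance measurement element more concrete, here’s a minimal sketch of how those classification metrics are computed from hypothetical prediction counts (true positives, false positives, false negatives, and true negatives; all the numbers below are made up):

# A minimal sketch of the classification metrics named above, computed
# from hypothetical prediction counts (all numbers are made up).
def metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    accuracy = (tp + tn) / (tp + fp + fn + tn)  # fraction of correct predictions
    precision = tp / (tp + fp)                  # how many flagged positives were real
    recall = tp / (tp + fn)                     # how many real positives were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

print(metrics(tp=80, fp=10, fn=20, tn=90))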

In this post, we’re looking at the first artificial neuron model, so we’ll define the task as a linear regression problem. 

In the linear regression task, we use the artificial neuron to represent a function. Instead of hand-coding a function like y = mx + b, we present data to the artificial neuron and let it discover the values of the coefficients m and b that best represent the problem. We will return to this in Section 4.

In the next section, we will describe artificial neuron modeling. Don’t be scared of the math — we won’t go too deep into it, promise!

3. The first representation of an artificial neuron (Perceptron)

The foundation of modern neural networks can be traced back to the earliest mathematical representation of an artificial neuron, proposed by Warren McCulloch and Walter Pitts in 1943 and later developed by Frank Rosenblatt into the Perceptron. This simple yet powerful model mimics how biological neurons process information: by taking weighted inputs, adding a bias, and passing the result through an activation function, the perceptron serves as the fundamental building block for more advanced neural network architectures.

This section explores the key components of the perceptron and how they contribute to its ability to make decisions. 

Equation 1 is the mathematical model of the artificial neuron:

y = φ(X · W + b)    (Equation 1)

Where:

  • y is the output,
  • φ (phi) is the activation function,
  • X is the vector containing all the inputs,
  • W is the vector containing all the weights,
  • b is the bias.

From linear algebra, we can describe the vector product as:

X · W = x1·w1 + x2·w2 + … + xn·wn    (Equation 2)

Where:

n is the total number of inputs and weights.

  • Inputs – The first representation of an artificial neuron, proposed by McCulloch and Pitts, takes a set of inputs (x1, x2, …, xn) and assigns a weight (w1, w2, …, wn) to each input.
  • Weighted sum – The neuron then computes the weighted sum of the inputs, which is the sum of the products of each input and its corresponding weight, as shown by Equation 2.
  • Bias – The bias, represented by the parameter b, is added to the weighted sum before the activation function is applied. This allows the neuron to shift its activation threshold, enabling more complex decision boundaries.
  • Activation function – Finally, the neuron applies an activation function, such as a step function or a sigmoid function, to the weighted sum to determine the neuron’s output. We will talk about activation functions later.

The biological neuron that McCulloch and Pitts used as inspiration to develop the Perceptron model.

Visual representation of Equation 1, which makes it easy to compare with a real neuron.
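
To make Equations 1 and 2 concrete, here’s a minimal sketch of this neuron in Python, assuming a simple step activation function; the weights and bias are hand-picked so the neuron behaves like a logical AND gate:

# A minimal sketch of Equations 1 and 2, assuming a step activation
# function; the weights and bias are hand-picked for the example.
def step(z: float) -> int:
    # Fire (output 1) only when the weighted sum plus bias is non-negative
    return 1 if z >= 0 else 0

def neuron(X: list[float], W: list[float], b: float) -> int:
    # Equation 2: the vector product of inputs and weights
    weighted_sum = sum(x * w for x, w in zip(X, W))
    # Equation 1: activation applied to the weighted sum plus the bias
    return step(weighted_sum + b)

# With these hand-picked values the neuron acts like a logical AND gate
print(neuron([1, 1], [0.6, 0.6], -1.0))  # 1
print(neuron([1, 0], [0.6, 0.6], -1.0))  # 0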

Next, we will describe how to use this model to solve a linear regression problem. Plus, we’ll also show you how to measure the model error and train it to reduce that error.

4. The regression problem

As described in Section 2, the first task we will assign to our artificial neuron is a regression problem. To simplify the representation, we can remove the activation function and keep only one input. 

Simplified representation of an artificial neuron

Before we dive deeper into linear regression, let’s answer a few questions:

What is linear regression?

Linear regression is a fundamental machine learning algorithm that can be used to model the relationship between a dependent variable and one or more independent variables. The goal is to find the best-fitting straight line that minimizes the distance between the data points and the line.

What are the applications of linear regression?

Linear regression has many applications, including predicting sales, forecasting stock prices, and analyzing the relationship between various factors in social and economic studies. It’s a powerful tool for understanding and quantifying the relationships between variables.

How do you implement linear regression?

To implement linear regression, we need to define the model equation, which takes the form y = mx + b, where y is the dependent variable, x is the independent variable, m is the slope, and b is the y-intercept. We then use optimization techniques to find the values of m and b that best fit the data.
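
For intuition about what “best fit” means, here’s a minimal sketch using the classic closed-form least-squares formulas to find m and b directly (the sample points are made up and lie exactly on y = 2x + 1); the artificial neuron in Section 4.3 will instead learn these values iteratively:

# A minimal sketch of simple linear regression using the closed-form
# least-squares formulas (the sample data is made up for illustration).
def fit_line(X: list[float], Y: list[float]) -> tuple[float, float]:
    x_mean = sum(X) / len(X)
    y_mean = sum(Y) / len(Y)
    # Slope: covariance of X and Y divided by the variance of X
    m = sum((x - x_mean) * (y - y_mean) for x, y in zip(X, Y)) \
        / sum((x - x_mean) ** 2 for x in X)
    # Intercept: the best-fit line passes through the point of means
    b = y_mean - m * x_mean
    return m, b

m, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(m, b)  # 2.0 1.0, since the points lie exactly on y = 2x + 1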

What are the limitations of linear regression?

While linear regression is a practical algorithm, it has limitations. It assumes a linear relationship between the variables, which may not always be the case. It can also be sensitive to outliers and may not perform well when dealing with complex, non-linear relationships.

Check out this post to learn more about using regression in AI development.

So, we’ve defined the task and model we will work on, but we need two more steps to make everything fit. Let’s take a look at how we measure the model results and how we train it. 

There are different methods and algorithms for measuring error and training artificial neurons, and they get more complex as the problems we tackle get more challenging. But to understand how everything connects, we will use the Mean Squared Error (MSE) and the backpropagation algorithm to solve our problem.

4.1 Measuring the error

To measure the error of our model, we use a loss or cost function that quantifies the difference between the predicted values and the actual values of the dependent variable. The goal is to minimize this function by adjusting the slope and y-intercept values. The most commonly used loss function in linear regression is the Mean Squared Error (MSE), which calculates the average squared difference between the predicted and actual values, as described in Equation 3. 

MSE = (1/n) · Σ (Ŷ − Y)²    (Equation 3)

Where: Ŷ is the predicted value, Y is the reference value from the training data set, and n is the number of samples.
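
To make Equation 3 concrete, here’s a minimal sketch that computes the MSE for a handful of made-up predictions:

def mse(Y_predicted: list[float], Y: list[float]) -> float:
    # Average squared difference between predictions and reference values
    return sum((y_hat - y) ** 2 for y_hat, y in zip(Y_predicted, Y)) / len(Y)

print(mse([2.5, 0.0, 2.0], [3.0, -0.5, 2.0]))  # 0.1666...
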
Ok, now what do we do with the value of the MSE?

We use it to adjust the model’s weight and bias, which makes it predict values closer to the reference value from the data set. For that, we use the backpropagation algorithm and a bunch of hyperparameters to help us train the model. 

4.2 Training the model

Artificial neurons are trained by adjusting the weights and biases associated with them through a process called backpropagation. During training, the input data is fed through the neural network, and the output is compared to the desired output. The difference between the two is used to calculate the loss, and then the weights and biases are adjusted to minimize the loss using optimization algorithms like gradient descent. This process is repeated iteratively until the network reaches satisfactory accuracy.

The gradient descent algorithm in machine learning is used to minimize the cost function and find the optimal set of weights by repeatedly stepping in the negative direction of the gradient, the direction of steepest descent.

In each iteration, the parameters are updated by subtracting the gradient multiplied by a learning rate, which is a hyperparameter that determines the step size. The process is repeated until the convergence criteria are met. 

The gradient descent update is shown in Equation 4:

w_new = w − η · ∇F(w)    (Equation 4)

where ∇F(w) is the gradient of the cost function, η (eta) is the learning rate, and w is the model weight.
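
To see Equation 4 in action, here’s a toy sketch that minimizes the made-up cost F(w) = (w − 3)², whose gradient is 2(w − 3):

# Toy gradient descent on F(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w = 0.0
learning_rate = 0.1
for _ in range(100):
    gradient = 2 * (w - 3)
    w -= learning_rate * gradient  # the Equation 4 update step
print(w)  # converges toward 3.0, the weight that minimizes the cost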

Figure 4 presents the process of converging the weight to a minimum value using gradient descent. It is important to note a few points:

  • Learning rate: The learning rate hyperparameter has no exact value; it must be chosen empirically, by testing different values and observing which gives the best results. A good hint is to use very small values, such as 0.0001 or 0.00001, because high values make the algorithm step past the local minimum, skipping the optimum weight value and rapidly causing the loss to explode (the sketch after the figure caption below illustrates this).
  • Epochs: The number of epochs is the number of times we want the backpropagation algorithm to run through our model to adjust the weights and bias. It is another hyperparameter that must be chosen by the people developing the model. The hint here, again, is to test: if training is short and the number of epochs is low, the model can’t learn very well and won’t generalize correctly (a phenomenon known as underfitting). If the number of epochs is too high, the model becomes overly specialized on the training data and loses the ability to generalize (a phenomenon known as overfitting).
Graphical representation of the gradient descent algorithm applied to a model to find the weight value that minimizes the cost function.
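
To illustrate the learning rate hint from the list above, here’s a toy sketch comparing step sizes on the made-up cost F(w) = w², whose gradient is 2w:

# Toy comparison of learning rates on F(w) = w^2 (gradient: 2 * w)
def descend(learning_rate: float, steps: int = 20) -> float:
    w = 1.0
    for _ in range(steps):
        w -= learning_rate * 2 * w
    return w

print(descend(0.0001))  # safe but slow: it needs many more steps to converge
print(descend(0.1))     # converges smoothly toward the minimum at w = 0
print(descend(1.1))     # each step overshoots the minimum and the value explodes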

Today, algorithms already exist that help choose the best hyperparameter values for Perceptrons. Here, however, we present an example developed from scratch, without the help of any framework or library. Understanding this process is essential to understanding how we train bigger neural networks: the algorithms get more sophisticated to reduce training time, but the process is the same.

4.3 Show me the code

Okay, after all this theory and math, let’s dive into the code to understand how to develop a simplified artificial neuron to solve a linear regression task. All the code presented here is available in this repository.

This Python code implements a simple neural network with a single neuron, capable of learning a linear relationship between inputs and outputs. It includes methods for forward propagation, loss computation, and training using gradient descent. Let’s break it down step by step.

import random

class NeuralNetwork:
    def __init__(self):
        # Randomly initialize weight and bias
        self.weight = random.random()
        self.bias = random.random()

    def forward(self, _input: float | int) -> float:
        # Calculate the weighted sum
        output = self.weight * _input + self.bias
        return output

    def predict(self, X: list[float]) -> list[float]:
        Y_predicted = []
        for x in X:
            prediction = self.forward(x)
            Y_predicted.append(prediction)
        return Y_predicted

    @staticmethod
    def compute_loss(predicted_output: float, target: float) -> float:
        # Squared error of a single prediction (the MSE building block)
        return (predicted_output - target) ** 2

    def compute_total_loss(self, targets: list[float], predicted_outputs: list[float]) -> float:
        # Sum of the squared errors over the whole data set
        total_loss = 0
        for target, predicted_output in zip(targets, predicted_outputs):
            total_loss += (predicted_output - target) ** 2

        return total_loss

    def compute_loss_derivative(self, predicted_output: float, target: float) -> float:
        # Derivative of the squared error with respect to the prediction
        return (predicted_output - target) * 2

    def train(self, training_sample: list[tuple[float, float]], learning_rate=0.01, epochs=1000):
        for epoch in range(1, epochs + 1):
            total_loss = 0

            for _input, target in training_sample:
                # Forward Pass
                predicted_output = self.forward(_input)

                # Calculate Loss
                loss = self.compute_loss(predicted_output, target)
                total_loss += loss

                # Backward Pass (Calculate gradients)
                loss_derivative_value = self.compute_loss_derivative(predicted_output, target)
                gradient = loss_derivative_value * _input
                bias_gradient = loss_derivative_value

                # Update Weights and Bias
                self.weight -= learning_rate * gradient
                self.bias -= learning_rate * bias_gradient

            # Print loss every 100 epochs
            if epoch % 100 == 0:
                print(f"Epoch: {epoch}, Loss: {total_loss:.4f}, Weight: {self.weight:.4f}, Bias: {self.bias:.4f}")

The NeuralNetwork class starts with a constructor (__init__), which initializes a weight and a bias. These values are randomly assigned using random.random(), ensuring the model starts with non-zero parameters.

import random

class NeuralNetwork:
    def __init__(self):
        # Randomly initialize weight and bias
        self.weight = random.random()
        self.bias = random.random()

The function forward performs a forward pass through the neuron. It calculates the weighted sum of the input and adds the bias.

def forward(self, _input: float | int) -> float:
    # Calculate the weighted sum
    output = self.weight * _input + self.bias
    return output

The predict method takes a list of inputs and returns the corresponding outputs. It simply applies the forward function to each input value.

def predict(self, X: list[float]) -> list[float]:
    Y_predicted = []
    for x in X:
        prediction = self.forward(x)
        Y_predicted.append(prediction)
    return Y_predicted

The compute_loss function calculates the squared error for a single prediction, the building block of the Mean Squared Error (MSE). Squaring the difference ensures that errors are always positive and penalizes larger deviations more heavily.

@staticmethod
def compute_loss(predicted_output: float, target: float) -> float:
    return (predicted_output - target) ** 2

The compute_total_loss function computes the total loss over a dataset by summing individual squared errors. This helps track how well the model performs over multiple data points.

def compute_total_loss(self, targets: list[float], predicted_outputs: list[float]) -> float:
    total_loss = 0
    for target, predicted_output in zip(targets, predicted_outputs):
        total_loss += (predicted_output - target) ** 2
    return total_loss

The compute_loss_derivative function calculates the derivative of the loss function with respect to the predicted output. Since we’re using the squared error, the derivative is:

dLoss/dŶ = 2 · (Ŷ − Y)

This derivative is essential for gradient descent to update the model parameters.

def compute_loss_derivative(self, predicted_output: float, target: float) -> float:
    return (predicted_output - target) * 2

Lastly, the train function is used to train the artificial neuron using gradient descent:

  1. Forward Pass: Computes predictions for each input.
  2. Loss Calculation: Evaluates how far predictions are from actual values.
  3. Backward Pass: Uses loss derivative to compute gradients.
  4. Parameter Update: Adjusts weight and bias using the learning rate.

The loss is printed every 100 epochs to track training progress.

def train(self, training_sample: list[tuple[float, float]], learning_rate=0.01, epochs=1000):
    for epoch in range(1, epochs + 1):
        total_loss = 0

        for _input, target in training_sample:
            # Forward Pass
            predicted_output = self.forward(_input)

            # Calculate Loss
            loss = self.compute_loss(predicted_output, target)
            total_loss += loss

            # Backward Pass (Calculate gradients)
            loss_derivative_value = self.compute_loss_derivative(predicted_output, target)
            gradient = loss_derivative_value * _input
            bias_gradient = loss_derivative_value

            # Update Weights and Bias
            self.weight -= learning_rate * gradient
            self.bias -= learning_rate * bias_gradient

        # Print loss every 100 epochs
        if epoch % 100 == 0:
            print(f"Epoch: {epoch}, Loss: {total_loss:.4f}, Weight: {self.weight:.4f}, Bias: {self.bias:.4f}")

Let’s look at an example to clarify how we use this class. The code presented below was extracted from the same repository, and the complete version can be found in the linear_regression.ipynb Jupyter Notebook.

To illustrate the linear regression process, the notebook generates synthetic data:

  • Input Values (X): A set of random values.
  • Output Values (Y): Generated using a linear relationship with X, typically in the form Y = mX + b + noise, where m is the slope, b is the intercept, and noise adds variability to simulate real-world data.

This synthetic data serves as a controlled environment to demonstrate the mechanics of linear regression. We added some random noise to create a dispersion between the values. Otherwise, we would have just an ascending straight line with a slope equal to 5.

def regression_function(samples=100) -> tuple[list[int], list[float]]:
    X = []
    Y = []
    for x in range(samples):
        # Underlying line y = 5 * x, plus uniform noise in [-100, 100]
        y = 5 * x + random.uniform(-100, 100)
        X.append(x)
        Y.append(y)

    return X, Y
A plot of the generated synthetic data that the artificial neuron will learn from.

Next, we instantiate an artificial neuron and send some data to check whether it can predict the output. Figure 6 shows the result, with the red line composed of the artificial neuron output or prediction values.

from perceptron.linear_regression import NeuralNetwork

nn = NeuralNetwork()
Y_predicted = nn.predict(X)
print_prediction_function(X, Y, Y_predicted)

As you can see, the red line doesn’t reflect the reality of the training data, which means we need to train the artificial neuron. The next step is to call the method train with epochs = 100000 and learning_rate = 0.00001. This applies the backpropagation algorithm 100,000 times and updates the weight and bias in very small steps, which helps the model converge to the optimum values that represent the data. The final values obtained after training were:

  • Epoch: 100000, 
  • Loss: 367628.9078, 
  • Weight: 5.4246, 
  • Bias: -18.1625

It is important to note that the Loss value printed is the sum of the squared errors over every sample in an epoch, not the mean, so its absolute value looks large. In the first epochs the error is huge, but as the neuron updates its weight and bias, the error decreases and stabilizes.

training_sample = [*zip(X, Y)]
learning_rate = 0.00001
epochs = 100000
nn.train(training_sample, learning_rate, epochs)

Figure 7 presents the final result after a trained artificial neuron predicts the values.

Wrapping up

Now, we can use this trained artificial neuron to predict new values. We could even expand this approach to a house price prediction task, where dozens or hundreds of input variables describe a house’s attributes and we need to use them to predict its final value, as sketched below.
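
As a hypothetical illustration (the attribute names, weights, and values below are all made up), the neuron’s forward pass simply grows into a weighted sum over many inputs:

# Hypothetical sketch of a multi-input forward pass for house prices.
# All attribute names, weights, and values are made up for illustration.
def forward(inputs: list[float], weights: list[float], bias: float) -> float:
    # The same weighted-sum-plus-bias as before, just over many inputs
    return sum(w * x for w, x in zip(weights, inputs)) + bias

house = [120.0, 3.0, 15.0]   # e.g. [area_m2, bedrooms, age_years]
weights = [2.1, 9.5, -0.8]   # learned during training in a real model
print(forward(house, weights, bias=40.0))  # a made-up predicted price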

It is important to note that we didn’t cover key aspects of the training process, such as allocating 80% of the data for training, using the remaining 20% for validation, or cleaning and standardizing the data. These techniques help during model development and can be tricky to apply. But you can count on the Cheesecake Labs team to help you with neural network model development. We have specialized engineers ready to dive into the data and create models that solve the problems your business is facing.

In a future blog post, we will take a look at the classification problem. Until then, check out some of our other posts and reports that explain our approach to machine learning and AI app development at Cheesecake Labs:

If you have an idea for a project that would benefit from a neural network, send us a message, and let’s chat! We’d love to help you bring your ideas to life.
