Linear Regression Model using Gradient Descent algorithm
What is Machine learning?
Machine learning is the study of computer algorithms that improve automatically through experience.
What is Supervised learning?
Supervised learning is a set of models that are trained on data labeled with the correct output. The model learns to map inputs to outputs by adjusting its parameters so that it minimizes the error in its predictions. A loss function is used to measure the error in the model’s predictions compared with the correct outputs.
What is a Linear Regression model?
A linear regression model is a type of supervised learning model that assumes the relationship between the dependent variable and the independent variables is linear. In its simplest form, with one independent variable, it can be represented by the following straight-line equation, where m is the slope and b is the intercept.
y = mx + b
So how do we find the optimal line?
For a given x and y there are many possible lines.
How do we optimize the values of the slope and intercept?
A linear regression model computes the errors between the predicted values and the actual values for each candidate line and chooses the line with the minimum error.
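As a rough sketch of this idea, the snippet below scores a few candidate lines by their sum of squared errors and keeps the one with the lowest score. It uses the sample data from the end of this article; the candidate (slope, intercept) pairs and the helper name `sum_squared_error` are made up for illustration.

```python
import numpy as np

# Sample data from the end of the article
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 9, 11, 13])


def sum_squared_error(m, b, x, y):
    """Total squared difference between actual and predicted values."""
    y_predicted = m * x + b
    return np.sum((y - y_predicted) ** 2)


# A few hand-picked candidate lines as (slope, intercept) pairs (assumed)
candidates = [(1.0, 0.0), (2.0, 0.0), (2.0, 3.0), (3.0, 1.0)]

# Score every candidate and keep the one with the smallest error
errors = [sum_squared_error(m, b, x, y) for m, b in candidates]
best_m, best_b = candidates[int(np.argmin(errors))]
print("best line: y = {}x + {}".format(best_m, best_b))
```

Trying a handful of lines only finds the best of the lines tried, which is why the rest of the article looks for a systematic way to search.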
Cost function (or Loss function)
A cost function is a mathematical measure of how well the model performs on the given dataset. The goal of the cost function is to measure the errors, and the goal of the machine learning model is to minimize the cost function.
The most common cost function is MSE (Mean Squared Error), also called LMS (Least Mean Square).
Here the linear function for the given vectors x and y is represented by the hypothesis below.
Or, more simply, as below.
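One way to write these formulas, consistent with the cost computed in the Python code at the end of this article, is shown here. The symbols h, J and n are notation assumed for this sketch, and the factor of 1/2 with a plain sum follows that code rather than a strict mean.

```latex
\begin{align}
h(x_i) &= m x_i + b \\
J(m, b) &= \frac{1}{2} \sum_{i=1}^{n} \bigl( y_i - h(x_i) \bigr)^2
         = \frac{1}{2} \sum_{i=1}^{n} \bigl( y_i - (m x_i + b) \bigr)^2
\end{align}
```

Dividing the sum by n instead gives the mean squared error. The constant factor changes the size of the cost but not the values of m and b at which it is smallest.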
Plot Cost function
Let’s take the following simple linear function to calculate the cost function.
Optimal value of intercept without Gradient Descent
Let’s fix the slope at 0.64 and calculate the intercept.
Step 1: Pick any random value for the intercept, let’s say 0, and calculate the total cost.
Step 2: Now pick the next intercept value, 0.25, and calculate the total cost again.
Step n: Pick the nth intercept value and calculate the total cost.
Once all the costs are calculated, plot the cost values against the intercept values.
Finding the intercept with the minimum cost is difficult with this method, because it is not clear how to choose the next intercept to try.
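The steps above can be sketched as a simple sweep over intercept values. The data points here are hypothetical, since the article’s own example points are not shown, and the grid of intercepts (0 to 2 in steps of 0.25) is an assumption that extends the 0 and 0.25 used in the steps.

```python
import numpy as np

# Hypothetical sample data, assumed for illustration
x = np.array([0.5, 2.3, 2.9])
y = np.array([1.4, 1.9, 3.2])

slope = 0.64  # fixed, as in the text


def total_cost(intercept):
    """Sum of squared residuals for the fixed slope and a given intercept."""
    residuals = y - (slope * x + intercept)
    return np.sum(residuals ** 2)


# Try intercepts 0, 0.25, 0.5, ..., 2.0 and record the cost of each
intercepts = np.arange(0.0, 2.25, 0.25)
costs = np.array([total_cost(b) for b in intercepts])

for b, c in zip(intercepts, costs):
    print("intercept {:.2f}, cost {:.3f}".format(b, c))

# Best intercept among the values tried
best = intercepts[np.argmin(costs)]
print("lowest cost on this grid at intercept", best)
```

The sweep can only report the best of the values it tried, so a more precise answer needs a finer grid and many more cost calculations.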
Using least squares
Least squares solves for the optimal value of the intercept directly by finding where the slope of the cost curve is zero.
Gradient descent, in contrast, identifies the optimal value by taking big steps when it is far from the optimal value and baby steps when it is close.
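For the intercept alone, the least-squares idea can be sketched in a few lines. With the slope fixed, setting the derivative of the sum of squared residuals with respect to the intercept to zero gives the intercept as the mean of y minus slope times x. The data points are hypothetical and assumed for illustration.

```python
import numpy as np

# Hypothetical sample data, assumed for illustration
x = np.array([0.5, 2.3, 2.9])
y = np.array([1.4, 1.9, 3.2])
slope = 0.64  # fixed, as in the text

# Setting d(cost)/d(intercept) = -2 * sum(y - (slope * x + b)) to zero
# and solving for b gives the mean of (y - slope * x).
best_intercept = np.mean(y - slope * x)

# Check: the slope of the cost curve at that intercept should be ~0
cost_slope = -2 * np.sum(y - (slope * x + best_intercept))
print("intercept:", best_intercept, "slope of cost curve:", cost_slope)
```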
Using Gradient Descent
Gradient descent finds the minimum by taking steps from an initial guess until it reaches the best value. This makes gradient descent very useful when it is not possible to solve for where the derivative is equal to zero.
The size of each step should be related to the slope, since the slope tells us whether to take a baby step or a big step, but we have to make sure that a big step is not too big.
Let’s take the partial derivatives.
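Written out, the partial derivatives take the following form. This is a sketch that assumes the cost is half the sum of squared residuals, as in the Python code at the end of this article; the symbol J and the index notation are assumptions.

```latex
\begin{align}
J(m, b) &= \frac{1}{2} \sum_{i=1}^{n} \bigl( y_i - (m x_i + b) \bigr)^2 \\
\frac{\partial J}{\partial m} &= -\sum_{i=1}^{n} x_i \bigl( y_i - (m x_i + b) \bigr) \\
\frac{\partial J}{\partial b} &= -\sum_{i=1}^{n} \bigl( y_i - (m x_i + b) \bigr)
\end{align}
```

These two expressions correspond to `m_derivative` and `b_derivative` in that code.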
Gradient descent determines the step size by multiplying the slope by a small number called the learning rate.
Gradient descent stops when the step size is very close to zero, or when it reaches the maximum number of steps we want to perform.
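The step-size rule and the two stopping conditions can be sketched for the intercept alone, with the slope held at 0.64. The data points, the learning rate, the threshold `min_step_size` and the cap `max_steps` are all assumed values chosen for illustration.

```python
import numpy as np

# Hypothetical sample data, assumed for illustration
x = np.array([0.5, 2.3, 2.9])
y = np.array([1.4, 1.9, 3.2])
slope = 0.64          # fixed, as earlier in the text

intercept = 0.0       # initial guess
learning_rate = 0.1   # assumed value
min_step_size = 1e-6  # assumed "very close to zero" threshold
max_steps = 1000      # assumed cap on the number of steps

for step in range(max_steps):
    # Slope of the sum-of-squared-residuals curve at the current intercept
    derivative = -2 * np.sum(y - (slope * x + intercept))
    # Step size = slope * learning rate: big far from the minimum, small near it
    step_size = learning_rate * derivative
    if abs(step_size) < min_step_size:
        break  # step is tiny, so we are close to the minimum
    intercept = intercept - step_size

print("intercept {:.4f}, stopped at step index {}".format(intercept, step))
```

Because the step shrinks along with the slope, the loop here ends on the step-size condition well before the cap on the number of steps is reached.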
Python Code for Gradient descent
import numpy as np


def gradient_descent(x, y, learning_rate=0.001, iterations=1000):
    # Start from an initial guess of zero for both slope and intercept
    m_curr = b_curr = 0.0

    for i in range(iterations):
        # Predicted values for the current slope and intercept
        y_predicted = m_curr * x + b_curr

        # Cost before this iteration's update: half the sum of squared errors
        cost = 0.5 * np.sum((y - y_predicted) ** 2)

        # Partial derivatives of the cost with respect to m and b
        m_derivative = -np.sum(x * (y - y_predicted))
        b_derivative = -np.sum(y - y_predicted)

        # Step size is learning rate times slope; move against the slope
        m_curr = m_curr - learning_rate * m_derivative
        b_curr = b_curr - learning_rate * b_derivative

        print("m {}, b {}, iteration {}, cost {}".format(m_curr, b_curr, i, cost))

    return m_curr, b_curr


x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 9, 11, 13])
gradient_descent(x, y)
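For comparison, the same data can be fit in closed form. The sketch below uses NumPy’s `np.polyfit`, which is not part of the article’s code. Since the sample points lie exactly on y = 2x + 3, the values printed by `gradient_descent` should move toward a slope of 2 and an intercept of 3, although with a small learning rate it may take more iterations to get close.

```python
import numpy as np

# Same sample data as in the gradient descent code
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 9, 11, 13])

# Degree-1 least-squares fit; coefficients come back highest power first
m_exact, b_exact = np.polyfit(x, y, 1)
print("least-squares slope {:.4f}, intercept {:.4f}".format(m_exact, b_exact))
```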