Math Behind Simple Linear Regression + Scikit Learn

Proving the simple linear regression formula step by step with Scikit Learn

Thirasha Praween
5 min read · Aug 25, 2021
Cover by Author

Linear regression is a commonly used type of predictive analysis and one of the simplest algorithms in machine learning. It models the relationship between variables by fitting a linear equation to observed data. For example, as a mobile phone's age increases, its price goes down. One variable is the explanatory variable (age), also called the independent variable. The other is the dependent variable (price).

Using the observed data, we can then predict the mobile phone's future price. Here is a table of the example data.

In this case, there is a negative relationship between the phone's age and its price: as the age increases, the price decreases.

Another example: as experience increases, so does salary. That is a positive relationship.
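The direction of a relationship can be checked numerically with the Pearson correlation coefficient. It is negative for the age/price example and positive for experience/salary. The sketch below uses NumPy's `np.corrcoef`. All the numbers are hypothetical placeholders, not the article's table.

```python
import numpy as np

# Hypothetical data (NOT the article's table): older phones sell for less
age = np.array([1, 2, 3, 4, 5, 6])
price = np.array([260, 240, 215, 195, 170, 155])

# Hypothetical data: more years of experience, higher salary (in thousands)
experience = np.array([1, 2, 3, 4, 5, 6])
salary = np.array([30, 34, 39, 45, 48, 55])

# np.corrcoef returns a 2x2 correlation matrix; entry [0, 1] is the x-y correlation
r_phone = np.corrcoef(age, price)[0, 1]
r_salary = np.corrcoef(experience, salary)[0, 1]

print(f"age vs price:         r = {r_phone:.3f}")   # negative relationship
print(f"experience vs salary: r = {r_salary:.3f}")  # positive relationship
```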

We're trying to predict the mobile phone's future price from its age, like this.

Photo by author

The question is: what is the price after 7 years? Let's put a point there to see.

Photo by author

It's a little lower than 150 USD. Now let's look at the math behind simple linear regression. The formula is y = mx + b, which you probably remember from school.

  • y - What we are going to predict. In this case, mobile phone price (dependent variable)
  • m - Slope (the coefficient of x)
  • x - Input as 7 years (independent variable)
  • b - Intercept

The values of m and b are given by the following formulas.

Formula image by author
Formula image by author
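For reference, the standard ordinary least squares estimates can be written in summation form. They give the slope and intercept of y = mx + b over n data points (x_i, y_i).

```latex
m = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2},
\qquad
b = \frac{\sum y_i - m\sum x_i}{n} = \bar{y} - m\,\bar{x}
```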

Let's find the linear regression equation for the mobile phone price data set.

Now we can substitute the data values into those formulas to get m and b.

Find m - Slope

Photo by author

Find b - Intercept

Photo by author
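The hand calculation of m and b can be sketched in plain Python using the summation form of the least-squares formulas. The function name `least_squares_fit` and the data values are hypothetical. Replace the data with the article's table to reproduce the numbers above.

```python
def least_squares_fit(xs, ys):
    """Return (m, b) for y = m*x + b using the closed-form least-squares sums."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    m = (n * sum_xy - sum_x * sum_y) / denom  # slope
    b = (sum_y - m * sum_x) / n               # intercept
    return m, b


# Hypothetical placeholder data (NOT the article's table)
ages = [1, 2, 3, 4, 5, 6]
prices = [260, 240, 215, 195, 170, 155]
m, b = least_squares_fit(ages, prices)
print(f"m = {m:.4f}, b = {b:.4f}")
```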

Now predict the mobile phone price after 7 years using y = mx + b. Here y is the price after 7 years (the value we want to predict) and x is 7.

Photo by author
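The same arithmetic in Python uses the slope and intercept values the article reports at the end (m ≈ -20.6915, b ≈ 278.2447). The helper name `predict_price` is illustrative only.

```python
# Slope and intercept as reported later in the article (full precision)
m = -20.691489361702125
b = 278.2446808510638


def predict_price(age_years):
    """Evaluate y = m*x + b for a phone age given in years."""
    return m * age_years + b


price_after_7 = predict_price(7)
print(round(price_after_7, 2))  # 133.4
```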

The predicted mobile phone price after 7 years is 133.40 USD. Now let's do the same thing with scikit-learn's linear regression model in Python.

Linear Regression Model (Scikit Learn)

First, save the data set to a CSV file. Create a new file named mobiledata.csv and add the data like this.

Photo by author
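You can also create the file from Python with pandas. This sketch assumes the columns are named `age` and `price`, the names the later code reads through `data.age` and `data.price`. The values are hypothetical placeholders, so substitute the article's table. Note that running it overwrites any existing mobiledata.csv.

```python
import pandas as pd

# Hypothetical placeholder values; substitute the real table from the article
rows = {
    "age": [1, 2, 3, 4, 5, 6],                # phone age in years
    "price": [260, 240, 215, 195, 170, 155],  # price in USD
}

# index=False stops pandas from writing an extra unnamed index column
pd.DataFrame(rows).to_csv("mobiledata.csv", index=False)

# Read the file back to confirm the header and the number of rows
check = pd.read_csv("mobiledata.csv")
print(check.columns.tolist())  # ['age', 'price']
print(len(check))              # 6
```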

Let's code it! I'm using Jupyter Notebook, but you can use any Python IDE you prefer. Next, install the libraries we need. (In Jupyter Notebook, prefix each command with an exclamation mark so it runs as a shell command.)

!pip install scikit-learn
!pip install numpy
!pip install pandas
!pip install matplotlib

Import those libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

Read the mobiledata.csv file using pandas

# The rest of the code refers to this DataFrame as `data`
data = pd.read_csv('mobiledata.csv')

Create a scatter chart of the data points

plt.scatter(data.age, data.price, color='red')
plt.xlabel('Mobile phone Age')
plt.ylabel('Price')

You can see the chart like this.

Chart image by author

Take the age values as x and the price values as y, as NumPy arrays. (`.values` already returns a NumPy array; wrapping it in np.array just makes that explicit.)

x = np.array(data.age.values)
y = np.array(data.price.values)

Create a LinearRegression object and train the model with its fit method. Note that fit expects x to be a two-dimensional array of shape (n_samples, n_features), so we reshape x into a single column.

model = LinearRegression()
model.fit(x.reshape((-1,1)), y)
# x.reshape((-1,1)) turns the 1-D array into a 2-D column (one row per sample, one feature)
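To see what the reshape does, compare the shapes. scikit-learn estimators expect x with shape (n_samples, n_features), and passing a 1-D array to fit raises a ValueError. The values below are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([1, 2, 3, 4, 5, 6])              # hypothetical ages
y = np.array([260, 240, 215, 195, 170, 155])  # hypothetical prices

X = x.reshape((-1, 1))  # -1 lets NumPy infer the number of rows
print(x.shape, X.shape)  # (6,) (6, 1)

try:
    LinearRegression().fit(x, y)  # 1-D input is rejected by scikit-learn
    raised = False
except ValueError:
    raised = True
print("1-D input raised ValueError:", raised)
```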

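We can also plot the best-fit line for this data set and get the values of m (slope) and b (intercept). np.polyfit with degree 1 fits a straight line by least squares and returns the slope first, then the intercept.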

plt.scatter(data.age, data.price, color='red')
plt.xlabel('Mobile phone Age')
plt.ylabel('Price')
m,b = np.polyfit(x,y,1)
plt.plot(x,m*x+b)
Chart image by author
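np.polyfit and LinearRegression both solve the same least-squares problem, so their slope and intercept should agree. The fitted model exposes these values as `coef_` and `intercept_`. A quick check on hypothetical placeholder data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical placeholder data (NOT the article's table)
x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([260, 240, 215, 195, 170, 155])

# np.polyfit(..., 1) returns [slope, intercept] (highest power first)
m_poly, b_poly = np.polyfit(x, y, 1)

# The fitted LinearRegression stores the same quantities as attributes
model = LinearRegression().fit(x.reshape((-1, 1)), y)
m_sk, b_sk = model.coef_[0], model.intercept_

print(np.isclose(m_poly, m_sk), np.isclose(b_poly, b_sk))  # True True
```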

Finally, use the trained model to predict the mobile phone price after 7 years and check whether it matches the value we calculated by hand. As with fit, predict expects a two-dimensional array, so we put 7 in a NumPy array and reshape it into a single column.

year_seven = np.array([7]).reshape((-1,1))
# Predict the price
model.predict(year_seven)

You'll see that the model's prediction matches the value we calculated earlier with the formula (133.40 USD after rounding).

# array([133.40425532])
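predict also accepts several inputs at once, as long as they are arranged as a column. The sketch below predicts ages 7, 8 and 9 in one call. It refits a model on hypothetical placeholder data so the block runs on its own. With your own notebook, reuse the `model` trained above instead.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical placeholder data so this block is self-contained
x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([260, 240, 215, 195, 170, 155])
model = LinearRegression().fit(x.reshape((-1, 1)), y)

# One row per age to predict; the result is a 1-D array of prices
future_ages = np.array([7, 8, 9]).reshape((-1, 1))
predictions = model.predict(future_ages)
print(predictions)

# Each prediction equals coef_ * age + intercept_, i.e. y = mx + b
manual = model.coef_[0] * future_ages.ravel() + model.intercept_
print(np.allclose(predictions, manual))  # True
```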

You can also check the values of m and b by evaluating the variables in the notebook.

m
# -20.691489361702125
b
# 278.2446808510638

Happy Coding🎉
