1. What is regression
Regression is usually the first algorithm that people new to machine learning work with. It lets us make predictions from data by learning the relationship between a set of independent variables and the dependent variables they influence.
- Applied Cases
e.g., house price estimation. Many factors can affect the price of a house: the number of rooms, the floor area, the locality, the availability of amenities, the parking space, and so on. Regression analysis can help us find the mathematical relationship between these factors and the house price.
2. Linear regression
Linear regression is one of the most widely known modeling techniques.
Linear regression assumes a linear relationship between the input variable (X) and the output variable (Y).
The basic idea of linear regression is to build a model, using training data, that can predict the output given the input.
A simple linear regression example
- The case of house price prediction
A - the area of the house is the independent variable
Y - the price of the house is the dependent variable
𝑌̂ - predicted price
W - weight
b - bias
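Putting these symbols together, the simple linear model for this case can be written as below. This is a sketch inferred from the definitions above, with the weight written as W to match the code in the following steps.

```latex
\hat{Y} = W A + b
```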
- 2-1) Import the necessary modules
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
- 2-2) Generate random data with a linear relationship
# Generate random data: price depends linearly on area, plus integer noise
np.random.seed(0)
area = 2.5 * np.random.randn(100) + 25
price = 25 * area + 5 + np.random.randint(20,50, size = len(area))
data = np.array([area, price])
data = pd.DataFrame(data = data.T, columns=['area','price'])
plt.scatter(data['area'], data['price'])
plt.show()
Output: a scatter plot of price against area.
- 2-3) Calculate the two regression coefficients (weight and bias)
W = sum(price*(area-np.mean(area))) / sum((area-np.mean(area))**2)
b = np.mean(price) - W*np.mean(area)
print("The regression coefficients are", W,b)
Output:
The regression coefficients are 24.815544052284988 43.4989785533412
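As a sanity check, the closed-form coefficients can be compared with an independent least-squares fit. The sketch below is an illustration rather than part of the original walkthrough: it regenerates the data with the same seed and compares the result with `np.polyfit`, whose degree-1 fit returns the slope followed by the intercept.

```python
import numpy as np

# Recreate the synthetic data with the same seed and call order as in step 2-2.
np.random.seed(0)
area = 2.5 * np.random.randn(100) + 25
price = 25 * area + 5 + np.random.randint(20, 50, size=len(area))

# Closed-form least-squares estimates, as in step 2-3.
W = np.sum(price * (area - np.mean(area))) / np.sum((area - np.mean(area)) ** 2)
b = np.mean(price) - W * np.mean(area)

# Independent estimate: a degree-1 polynomial fit returns [slope, intercept].
slope, intercept = np.polyfit(area, price, 1)

print("closed form:", W, b)
print("np.polyfit :", slope, intercept)
```

The two pairs should agree up to floating-point error. The bias is well above the 5 used to generate the data because the added integer noise (drawn from 20 to 49) is not zero-mean, so its average is absorbed into b.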
- 2-4) Predicting the new prices using the obtained weight and bias
# Predicting the new prices using the obtained weight and bias
y_pred = W * area + b
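The same expression can also score an area that was not in the training data. The sketch below is a minimal illustration: `predict_price` is a hypothetical helper, the coefficients are copied from the output printed in step 2-3, and the example area of 30 is an arbitrary value chosen for demonstration.

```python
# Coefficients copied from the output printed in step 2-3.
W = 24.815544052284988
b = 43.4989785533412


def predict_price(new_area):
    """Hypothetical helper: apply the fitted line W * area + b to one area value."""
    return W * new_area + b


example_area = 30.0  # illustrative value, not taken from the data set
print("Predicted price:", predict_price(example_area))
```

Because the model is a straight line, predictions far outside the range of areas seen in training (roughly 25 plus or minus a few units here) should be treated with caution.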
- 2-5) Plot the predicted prices along with the actual price
# Plot the predicted prices along with the actual price
plt.plot(area, y_pred, color='red', label="Predicted Price")
plt.scatter(data['area'], data['price'], label="Training Data")
plt.xlabel("Area")
plt.ylabel("Price")
plt.legend()
plt.show()
Output: the predicted prices drawn as a red line over the scatter plot of the training data.
3. Multivariate linear regression
There can be cases where the independent variables affect more than one dependent variable.
Here is a case of multivariate linear regression:
e.g., consider the case where we want to predict a rocket's speed and its carbon dioxide emission. These are our two dependent variables, and both are affected by the sensors reading the fuel amount, engine type, rocket body, and so on.
Mathematically, a multivariate regression model can be represented as:
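One common way to write such a model is sketched below. The notation is an assumption rather than something taken from the text: there are n samples, p independent variables x_ik, and m dependent variables, with w_kj the weight linking input k to output j and w_0j the bias for output j.

```latex
\hat{Y}_{ij} = w_{0j} + \sum_{k=1}^{p} w_{kj}\, x_{ik},
\qquad i = 1, \dots, n, \quad j = 1, \dots, m
```

In the rocket example, m = 2 (speed and carbon dioxide emission), and each output gets its own bias and its own set of weights over the same sensor readings.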