Linear Regression: Using Gradient Descent


1. What is Linear Regression?

Linear Regression is a supervised learning algorithm used to model the relationship between an input variable and an output variable by fitting a straight line.

Goal

To find a line that best predicts the output for given inputs.


2. Simple Linear Regression Model

For one input variable x, the model is:

\hat{y} = wx + b

Where:

  • w → slope (weight)

  • b → intercept (bias)

  • \hat{y} → predicted output
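As code, the model is a one-line function. This is a minimal sketch (not from the original post); the parameter values w = 6 and b = 45 are just illustrative.

```python
def predict(x, w, b):
    """Predicted output for input x: y_hat = w*x + b."""
    return w * x + b

# Illustrative parameters: slope 6, intercept 45
print(predict(3, 6, 45))  # 6*3 + 45 = 63
```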


3. A Simple Example

Problem Statement

Predict a student’s exam score based on hours studied.

Hours studied (x)                Score (y)
1                                50
2                                55
3                                65
4                                70
5                                75

4. Visual Idea

We want to fit a straight line through these points such that the overall error is minimized.

The line might look like:

\hat{y} = 6x + 45

5. How Do We Measure Error?

We use Mean Squared Error (MSE):

\text{MSE} = \frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)^2


This penalizes large errors more heavily.
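As a quick sketch (assuming NumPy arrays of predictions and targets), MSE is one line, and the example below shows how squaring weights one large error more heavily than two small ones:

```python
import numpy as np

def mse(y_pred, y_true):
    """Mean Squared Error: average of squared prediction errors."""
    return np.mean((y_pred - y_true) ** 2)

# Two errors of size 1 give MSE 1, but one error of size 2 gives MSE 2,
# even though the total absolute error is the same in both cases.
print(mse(np.array([1.0, 1.0]), np.array([0.0, 0.0])))  # 1.0
print(mse(np.array([0.0, 2.0]), np.array([0.0, 0.0])))  # 2.0
```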


6. Learning the Best Line (Concept)

  • Start with random w and b

  • Predict scores using \hat{y} = wx + b

  • Compute errors

  • Adjust w and b to reduce errors

  • Repeat until the best line is found

This adjustment is done using Gradient Descent.


7. Interpretation of Parameters

  • Slope w
    → Increase in score per additional hour studied
    (e.g., w = 6 means +6 marks per hour)

  • Intercept b
    → Expected score with zero hours studied


8. Why Linear Regression Works Here

  • Relationship is approximately linear

  • Errors are minimized globally (convex problem)

  • Model is simple and interpretable


9. Definition

Linear Regression is a supervised learning algorithm that models the relationship between dependent and independent variables by fitting a linear equation that minimizes the mean squared error.


10. Key Takeaways 

✔ Simple and interpretable
✔ Works well for linear relationships
✔ Foundation for many ML algorithms
✔ Solved using optimization


11. Closed-Form Solution 

For advanced students:

w = (X^T X)^{-1} X^T y

But in practice, gradient descent is preferred for large datasets.
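For the hours-studied data above, the closed form can be checked directly with NumPy. This is a sketch using the normal equations (`np.linalg.lstsq` would work equally well); the design matrix gets a column of ones so the intercept b is learned too.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([50, 55, 65, 70, 75], dtype=float)

# Design matrix X with a column of ones for the intercept b
X = np.column_stack([x, np.ones_like(x)])

# Normal equations: (X^T X) theta = X^T y
w, b = np.linalg.solve(X.T @ X, X.T @ y)
print(w, b)  # w ≈ 6.5, b ≈ 43.5
```

So the exact least-squares line for this data is \hat{y} = 6.5x + 43.5.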


12. One-Line Intuition

Linear regression finds the straight line that best fits the data by minimizing prediction errors.

How Gradient Descent Finds the Best-Fit Line in Linear Regression


1. Problem Setup (Recalled)

We have data points:

(x_i, y_i), \quad i = 1, 2, \dots, n

Model (Straight Line)

\hat{y} = wx + b

Our goal is to find w (slope) and b (intercept) such that the line fits the data as well as possible.


2. What Does “Best Fit” Mean?

“Best fit” means minimizing the total prediction error.

We measure error using Mean Squared Error (MSE):

J(w,b) = \frac{1}{n}\sum_{i=1}^n (wx_i + b - y_i)^2

📌 This is a convex function in w and b,
so it has one global minimum → one best-fit line.


3. Geometry of the Problem (Very Important)

  • Each choice of (w, b) gives a different line

  • Each line produces a certain MSE value

  • If we plot J(w,b), we get a bowl-shaped surface

  • The lowest point of the bowl corresponds to the best line

👉 Gradient descent moves us down this bowl.


4. Role of Gradient Descent

Key Idea

Gradient descent updates w and b in the direction that reduces the error fastest.

Mathematically:

\text{New value} = \text{Old value} - \eta \times \text{Gradient}

where \eta is the learning rate (step size).


5. Compute Gradients (Direction of Steepest Increase)

Gradient with respect to w

\frac{\partial J}{\partial w} = \frac{2}{n}\sum_{i=1}^n (wx_i + b - y_i)x_i

Gradient with respect to b

\frac{\partial J}{\partial b} = \frac{2}{n}\sum_{i=1}^n (wx_i + b - y_i)

📌 These gradients tell us:

  • If slope is too steep or too flat

  • If line is too high or too low


6. Gradient Descent Update Rules

w \leftarrow w - \eta \frac{2}{n}\sum (wx_i + b - y_i)x_i

b \leftarrow b - \eta \frac{2}{n}\sum (wx_i + b - y_i)

Each update slightly rotates or shifts the line.


7. Simple Numerical Example (Concrete)

Data:

(1, 2), (2, 3)

Step 1: Initialize

w = 0, \quad b = 0

Predictions:

\hat{y} = 0

Errors:

(0-2), (0-3)


Step 2: Compute Gradients

\frac{\partial J}{\partial w} = \frac{2}{2}[(-2)(1)+(-3)(2)] = -8

\frac{\partial J}{\partial b} = \frac{2}{2}(-2-3) = -5


Step 3: Update Parameters (\eta = 0.1)

w = 0 - 0.1(-8) = 0.8

b = 0 - 0.1(-5) = 0.5

📌 New line:

\hat{y} = 0.8x + 0.5

This line is closer to the data than before.
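The hand calculation above can be verified in a few lines of NumPy (a sketch, using the same two data points and the same learning rate):

```python
import numpy as np

x = np.array([1.0, 2.0])
y = np.array([2.0, 3.0])
w, b, eta = 0.0, 0.0, 0.1

y_pred = w * x + b                             # predictions: [0, 0]
dw = (2 / len(x)) * np.sum((y_pred - y) * x)   # (2/2)[(-2)(1) + (-3)(2)] = -8
db = (2 / len(x)) * np.sum(y_pred - y)         # (2/2)(-2 - 3) = -5

w -= eta * dw
b -= eta * db
print(w, b)  # 0.8 0.5
```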


8. Repeating the Process

Each iteration:

  • Line moves closer to data

  • Error reduces

  • Gradients become smaller

  • Eventually updates stop changing

At convergence:

\nabla J(w,b) = 0

This gives the best-fit line.


9. Why Gradient Descent Always Finds the Best Line

✔ MSE loss is convex
✔ Only one minimum exists
✔ Gradient descent cannot get stuck elsewhere

So the final line is globally optimal.


10. Intuitive Explanation 

Gradient descent keeps adjusting the slope and intercept until the line cannot be improved any further.


11. Visual Interpretation (Analogy)

  • Imagine placing a ball on a bowl

  • The ball rolls down

  • It stops at the lowest point

  • That point corresponds to the best-fit line


12. Conclusion

In linear regression, gradient descent works by iteratively updating the slope and intercept in the direction opposite to the gradient of the mean squared error until the global minimum is reached, yielding the best-fit line.


13. Python Code

import numpy as np
import matplotlib.pyplot as plt

# Example data
x = np.array([1, 2, 3, 4, 5])
y = np.array([50, 55, 65, 70, 75])

# Initialize parameters
w = 0.0      # slope
b = 0.0      # intercept
eta = 0.01   # learning rate
iterations = 25

# Store history for plotting
w_history = []
b_history = []

# Gradient Descent
for _ in range(iterations):
    y_pred = w * x + b

    # Gradients of the MSE with respect to w and b
    dw = (-2 / len(x)) * np.sum(x * (y - y_pred))
    db = (-2 / len(x)) * np.sum(y - y_pred)

    # Update parameters
    w = w - eta * dw
    b = b - eta * db

    w_history.append(w)
    b_history.append(b)

# Plot data points
plt.figure()
plt.scatter(x, y)

# Plot intermediate lines
x_line = np.linspace(0, 6, 100)
for i in range(0, iterations, 5):
    y_line = w_history[i] * x_line + b_history[i]
    plt.plot(x_line, y_line,'b')

# Plot final best-fit line
y_final = w * x_line + b
plt.plot(x_line, y_final,'r')

plt.xlabel("Hours Studied")
plt.ylabel("Score")
plt.title("Gradient Descent: Evolution of Best-Fit Line")
plt.show()

print("Final slope (w):", w)
print("Final intercept (b):", b)


The best-fit line is drawn in red.

Final slope (w): 16.336112134810687
Final intercept (b): 7.910100074060203

Note that 25 iterations are not enough for full convergence on this data: with more iterations, w and b continue to move toward the exact least-squares solution w = 6.5, b = 43.5.

 
