Learning Rate (Step Size) in Optimization

 

1. What is Learning Rate?

The learning rate (also called step size) is a positive scalar that determines how far we move along the negative gradient direction at each iteration of an optimization algorithm.

In gradient descent:

x_{k+1} = x_k - \eta \nabla f(x_k)

where η > 0 is the learning rate.
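The update rule above can be written directly in code. The objective f(x) = x² and its gradient below are illustrative choices, not fixed by the text:

```python
def gradient_descent_step(x, grad, eta):
    """One gradient descent update: x_{k+1} = x_k - eta * grad(x_k)."""
    return x - eta * grad(x)

# Illustrative objective f(x) = x^2, whose gradient is 2x
grad = lambda x: 2 * x

x = 4.0
x = gradient_descent_step(x, grad, eta=0.25)
print(x)  # 4.0 - 0.25 * 8.0 = 2.0
```

Note that η only scales the step; the direction comes entirely from the gradient.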

2. Intuitive Meaning

  • The negative gradient gives the direction of steepest descent

  • The learning rate decides how far to move in that direction

📌 Analogy:
Walking downhill:

  • Direction = slope

  • Step length = learning rate


3. Effect of Learning Rate Size

(a) Small Learning Rate

  • Very slow convergence

  • Many iterations needed

  • Stable but inefficient

📉 Example: η=0.001


(b) Large Learning Rate

  • Faster movement

  • May overshoot the minimum

  • Can cause divergence or oscillation

📈 Example: η=1.2


(c) Optimal Learning Rate

  • Fast convergence

  • Stable descent

  • Reaches minimum efficiently

📌 Choosing the right η is crucial.


4. Visual Interpretation (1D)

Learning Rate    Behavior
Too small        Tiny steps toward the minimum
Too large        Jumps back and forth across it
Appropriate      Smooth convergence

5. Mathematical Insight (Quadratic Case)

For:

f(x) = x^2

Update rule:

x_{k+1} = (1 - 2\eta) x_k

Convergence condition:

0 < \eta < 1

  • η = 0.5 → fastest convergence (the minimum is reached in a single step)

  • η ≥ 1 → no convergence (oscillation at η = 1, divergence beyond)
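These claims are easy to verify numerically, since each iterate is just the previous one scaled by (1 − 2η); the starting point and step count below are arbitrary:

```python
def iterate(eta, x0=4.0, steps=10):
    """Gradient descent on f(x) = x^2: each step is x <- (1 - 2*eta) * x."""
    x = x0
    for _ in range(steps):
        x = (1 - 2 * eta) * x
    return x

print(iterate(0.1))  # shrinks by factor 0.8 per step: ~0.43 after 10 steps
print(iterate(0.5))  # 0.0 -- the minimum is reached in a single step
print(iterate(1.2))  # grows by factor |1 - 2.4| = 1.4 per step: diverging
```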


6. Learning Rate and Convergence

Convex Functions

  • Proper η guarantees convergence to the global minimum

Non-Convex Functions

  • Affects:

    • Speed

    • Stability

    • Escape from saddle points


7. Types of Learning Rates

(a) Fixed Learning Rate

\eta_k = \eta
  • Simple

  • Sensitive to choice


(b) Decaying Learning Rate

\eta_k = \frac{\eta_0}{1 + \alpha k}
  • Large steps initially

  • Small steps near minimum

  • Ensures convergence
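A minimal sketch of this schedule; η₀ = 0.5 and α = 0.1 are illustrative values:

```python
def decayed_lr(eta0, alpha, k):
    """Decaying schedule: eta_k = eta0 / (1 + alpha * k)."""
    return eta0 / (1 + alpha * k)

# Illustrative values: eta0 = 0.5, alpha = 0.1
for k in (0, 10, 100):
    print(k, decayed_lr(0.5, 0.1, k))
# 0   -> 0.5    (large steps initially)
# 10  -> 0.25
# 100 -> ~0.045 (small steps near the minimum)
```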


(c) Adaptive Learning Rates

Automatically adjust learning rate:

  • AdaGrad

  • RMSProp

  • Adam

Widely used in deep learning.
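As one example, here is a simplified sketch of the standard Adam update (with its usual default hyperparameters β₁ = 0.9, β₂ = 0.999, ε = 1e-8); production implementations live in libraries such as PyTorch or TensorFlow:

```python
import numpy as np

def adam_step(x, g, m, v, t, eta=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: the effective step size adapts per coordinate."""
    m = b1 * m + (1 - b1) * g        # running mean of gradients
    v = b2 * v + (1 - b2) * g**2     # running mean of squared gradients
    m_hat = m / (1 - b1**t)          # bias correction for early steps
    v_hat = v / (1 - b2**t)
    x = x - eta * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v

x = np.array([4.0])
m, v = np.zeros(1), np.zeros(1)
for t in range(1, 201):
    g = 2 * x                        # gradient of the toy objective f(x) = x^2
    x, m, v = adam_step(x, g, m, v, t)
print(x)  # steadily approaches 0; each step has magnitude ~eta
```

The point of the per-coordinate rescaling is that the nominal η no longer has to match the gradient scale of the problem.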


8. Learning Rate in Machine Learning

Algorithm              Role of the learning rate
Linear Regression      Controls convergence speed
Logistic Regression    Stability
Neural Networks        Training success
SGD                    Noise control

📌 Poor learning rate → poor model training.


9. Common Problems Due to Wrong Learning Rate

Problem           Cause
Slow training     Learning rate too small
Divergence        Learning rate too large
Oscillations      Rate too large for the local curvature
No convergence    Bad tuning

10. Practical Guidelines

  • Start with moderate value (e.g., 0.01 or 0.1)

  • Monitor loss curve

  • Reduce η if loss oscillates

  • Use decay or adaptive methods
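One concrete version of the "reduce η if loss oscillates" advice is a plateau rule: halve the rate when the loss has stopped improving. The patience window and halving factor below are illustrative choices:

```python
def maybe_reduce_lr(eta, loss_history, patience=3, factor=0.5):
    """Halve eta if the loss has not improved in the last `patience` steps."""
    if len(loss_history) > patience:
        recent = loss_history[-patience:]
        best_before = min(loss_history[:-patience])
        if min(recent) >= best_before:   # no recent improvement
            return eta * factor
    return eta

# Loss plateaus at 0.3, so the rate is halved
print(maybe_reduce_lr(0.1, [1.0, 0.5, 0.3, 0.3, 0.3, 0.3]))  # 0.05
```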


Python Code and Visualization

import numpy as np
import matplotlib.pyplot as plt

# Objective function and gradient
def f(x):
    return x**2

def grad_f(x):
    return 2*x

# Learning rates to compare
learning_rates = [0.05, 0.2, 0.9]
labels = ["Small η = 0.05", "Moderate η = 0.2", "Large η = 0.9"]

x0 = 4.0          # Starting point
iterations = 15  # Number of GD steps

# Function curve
x_curve = np.linspace(-5, 5, 400)
y_curve = f(x_curve)

plt.figure()
plt.plot(x_curve, y_curve)

# Gradient Descent for each learning rate
for eta, label in zip(learning_rates, labels):
    x = x0
    xs = [x]
    ys = [f(x)]

    for _ in range(iterations):
        x = x - eta * grad_f(x)
        xs.append(x)
        ys.append(f(x))

    plt.plot(xs, ys, marker='o', label=label)

plt.xlabel("x")
plt.ylabel("f(x)")
plt.title("Effect of Learning Rate on Gradient Descent")
plt.legend()
plt.show()


