Learning Rate (Step Size) in Optimization
1. What is Learning Rate?
The learning rate (also called the step size) is a positive scalar that determines how far we move along the negative gradient direction at each iteration of an optimization algorithm.
In gradient descent:

$$\theta_{t+1} = \theta_t - \eta \, \nabla f(\theta_t)$$

where $\theta_t$ is the current parameter vector, $\nabla f(\theta_t)$ is the gradient of the objective, and $\eta > 0$ is the learning rate.
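This update rule can be sketched in a few lines of Python; the quadratic objective, the step count, and $\eta = 0.1$ are illustrative choices, not part of the definition:

```python
def gradient_descent(grad, theta0, eta=0.1, steps=100):
    """Plain gradient descent: theta <- theta - eta * grad(theta)."""
    theta = theta0
    for _ in range(steps):
        theta = theta - eta * grad(theta)
    return theta

# Example: minimize f(x) = (x - 3)^2, whose gradient is 2*(x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), theta0=0.0)
```

Each iteration multiplies the error by $(1 - 2\eta)$ here, so the iterate approaches the minimizer $x = 3$ geometrically.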
2. Intuitive Meaning
- The gradient gives the direction of movement
- The learning rate decides the distance moved in that direction
📌 Analogy: walking downhill:
- Direction = slope
- Step length = learning rate
3. Effect of Learning Rate Size
(a) Small Learning Rate
- Very slow convergence
- Many iterations needed
- Stable but inefficient
(b) Large Learning Rate
- Faster movement per iteration
- May overshoot the minimum
- Can cause divergence or oscillation
(c) Optimal Learning Rate
- Fast convergence
- Stable descent
- Reaches the minimum efficiently
📌 Choosing the right learning rate $\eta$ is crucial.
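The three regimes above can be observed directly on the simple quadratic $f(x) = x^2$; the specific $\eta$ values below are illustrative:

```python
def run_gd(eta, steps=50, x0=1.0):
    """Gradient descent on f(x) = x^2 (gradient 2x): x <- (1 - 2*eta) * x."""
    x = x0
    for _ in range(steps):
        x = x - eta * 2 * x
    return x

too_small = run_gd(eta=0.001)  # crawls: barely moves toward the minimum at 0
good      = run_gd(eta=0.1)    # contracts by 0.8 per step: converges quickly
too_large = run_gd(eta=1.5)    # |1 - 2*eta| = 2 > 1: iterates blow up
```

Printing the three results shows the small rate still far from 0, the moderate rate essentially at 0, and the large rate at a huge magnitude.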
4. Visual Interpretation (1D)
| Learning Rate | Behavior |
|---|---|
| Too small | Tiny steps toward minimum |
| Too large | Jumps back and forth |
| Appropriate | Smooth convergence |
5. Mathematical Insight (Quadratic Case)
For:

$$f(x) = \frac{a}{2} x^2, \quad a > 0$$

Update rule:

$$x_{t+1} = x_t - \eta\, a\, x_t = (1 - \eta a)\, x_t$$

Convergence condition:

$$|1 - \eta a| < 1 \quad \Longleftrightarrow \quad 0 < \eta < \frac{2}{a}$$

- $\eta = \frac{1}{a}$ → fastest convergence (minimum reached in one step)
- $\eta > \frac{2}{a}$ → divergence
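Both boundary cases are easy to check numerically; the curvature $a = 4$ and the divergent rate $\eta = 0.6$ below are arbitrary illustrative values:

```python
# For f(x) = (a/2) * x**2 the gradient is a*x, so one GD step
# multiplies x by the contraction factor (1 - eta*a).
a = 4.0

def step(x, eta):
    return (1 - eta * a) * x

x0 = 1.0
one_shot = step(x0, eta=1 / a)  # eta = 1/a: factor is 0, minimum in one step

x = x0
for _ in range(10):             # eta = 0.6 > 2/a = 0.5: |factor| = 1.4 > 1
    x = step(x, eta=0.6)
diverged = abs(x)               # grows as 1.4**10, moving away from 0
```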
6. Learning Rate and Convergence
Convex Functions
- A proper $\eta$ guarantees convergence to the global minimum

Non-Convex Functions
- The learning rate affects:
  - Speed of convergence
  - Stability
  - Escape from saddle points
7. Types of Learning Rates
(a) Fixed Learning Rate
- Simple to implement
- Sensitive to the chosen value
(b) Decaying Learning Rate
- Large steps initially
- Small steps near the minimum
- Ensures convergence
(c) Adaptive Learning Rates
Automatically adjust learning rate:
- AdaGrad
- RMSProp
- Adam
Widely used in deep learning.
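As a concrete example of the adaptive family, here is a minimal scalar sketch of the Adam update (the base rate $\eta = 0.1$ and the toy objective are illustrative; deep-learning libraries provide tuned, vectorized implementations):

```python
import math

def adam_step(theta, g, m, v, t, eta=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter."""
    m = b1 * m + (1 - b1) * g        # running mean of gradients
    v = b2 * v + (1 - b2) * g * g    # running mean of squared gradients
    m_hat = m / (1 - b1 ** t)        # bias correction (t starts at 1)
    v_hat = v / (1 - b2 ** t)
    theta = theta - eta * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(x) = (x - 3)^2, gradient 2*(x - 3).
x, m, v = 0.0, 0.0, 0.0
for t in range(1, 501):
    x, m, v = adam_step(x, 2 * (x - 3), m, v, t)
```

Note how the effective step size is the base rate scaled per-parameter by the gradient statistics, which is what makes these methods less sensitive to the choice of $\eta$.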
8. Learning Rate in Machine Learning
| Algorithm | Role of the learning rate |
|---|---|
| Linear Regression | Controls convergence speed |
| Logistic Regression | Affects stability of updates |
| Neural Networks | Critical to training success |
| SGD | Moderates gradient noise |
📌 Poor learning rate → poor model training.
9. Common Problems Due to Wrong Learning Rate
| Problem | Cause |
|---|---|
| Slow training | Learning rate too small |
| Divergence | Learning rate too large |
| Oscillations | Learning rate too large for the curvature |
| No convergence | Bad tuning |
10. Practical Guidelines
- Start with a moderate value (e.g., 0.01 or 0.1)
- Monitor the loss curve
- Reduce $\eta$ if the loss oscillates
- Use decay or adaptive methods
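The "reduce if the loss oscillates" guideline can be automated with a simple backoff rule; the starting rate of 2.5 (deliberately too large) and the halving factor are illustrative choices:

```python
def gd_with_backoff(f, grad, x0, eta=2.5, steps=100, factor=0.5):
    """Gradient descent that halves eta whenever a step fails to
    reduce the loss (a crude safeguard against a too-large rate)."""
    x, fx = x0, f(x0)
    for _ in range(steps):
        x_new = x - eta * grad(x)
        f_new = f(x_new)
        if f_new >= fx:      # loss did not improve: step overshot
            eta *= factor    # shrink the learning rate and retry
            continue
        x, fx = x_new, f_new
    return x, eta

# On f(x) = x^2, the rate backs off until steps start helping.
x_final, eta_final = gd_with_backoff(lambda x: x * x, lambda x: 2 * x, x0=1.0)
```

Production systems use more robust versions of this idea, such as "reduce on plateau" schedulers, but the principle is the same: let the observed loss tell you when $\eta$ is too large.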