Gradient-Based Methods for Optimization (with Gradient Descent Algorithm)


1. Optimization Problem Setup

In optimization, we aim to find

\min_{x \in \mathbb{R}^n} f(x)

where:

  • f(x) is a real-valued objective function

  • x = (x_1, x_2, \dots, x_n) is a vector of variables

In machine learning, f(x) is typically a loss or cost function.


2. What is a Gradient-Based Method?

A gradient-based method uses first-order derivative information (the gradient) to guide the search for an optimum.

Key Idea

  • The gradient \nabla f(x) points in the direction of steepest increase

  • The negative gradient points in the direction of steepest decrease

📌 Hence, to minimize a function, we move in the negative gradient direction.


3. Geometric Interpretation of the Gradient

  • The gradient is perpendicular to level curves

  • Moving along -\nabla f(x) decreases the function value most rapidly

  • Optimization proceeds by repeatedly stepping downhill
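This can be checked numerically. In the sketch below, the quadratic f and the step size are illustrative choices, not from the notes: stepping against the gradient lowers f, stepping with it raises f.

```python
import numpy as np

# Illustrative check: for f(x) = x1^2 + 3*x2^2, a small step
# along -grad f decreases f, a step along +grad f increases it.
def f(x):
    return x[0]**2 + 3 * x[1]**2

def grad_f(x):
    return np.array([2 * x[0], 6 * x[1]])

x = np.array([1.0, 1.0])
step = 0.01
g = grad_f(x)

downhill = f(x - step * g)   # move against the gradient
uphill = f(x + step * g)     # move with the gradient

print(downhill < f(x) < uphill)  # True
```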


4. Gradient Descent Algorithm

Basic Update Rule

\boxed{ x_{k+1} = x_k - \eta_k \nabla f(x_k) }

where:

  • x_k: current point

  • \nabla f(x_k): gradient at x_k

  • \eta_k > 0: step size (learning rate)


5. Algorithmic Steps (Pseudo-Code)

Gradient Descent Algorithm

  1. Initialize x_0

  2. Choose learning rate \eta

  3. Repeat until convergence:

    • Compute \nabla f(x_k)

    • Update x_{k+1} = x_k - \eta \nabla f(x_k)

  4. Stop when:

    \|\nabla f(x_k)\| \le \epsilon
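The steps above can be sketched in Python; the test function, starting point, and parameter values below are illustrative assumptions, not part of the notes:

```python
import numpy as np

# Minimal gradient descent following the pseudo-code:
# repeat x_{k+1} = x_k - eta * grad(x_k) until ||grad|| <= eps.
def gradient_descent(grad, x0, eta=0.1, eps=1e-6, max_iter=10_000):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:   # stopping criterion
            break
        x = x - eta * g                # basic update rule
    return x

# Example: minimize f(x, y) = (x - 2)^2 + (y + 1)^2
grad = lambda x: 2 * (x - np.array([2.0, -1.0]))
x_star = gradient_descent(grad, [0.0, 0.0])
print(np.round(x_star, 4))  # close to [ 2. -1.]
```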

6. Example (One-Dimensional)

Minimize

f(x) = x^2

  • Gradient: f'(x) = 2x

  • Update rule:

x_{k+1} = x_k - 2\eta x_k = (1 - 2\eta)\, x_k

For 0 < \eta < 1 we have |1 - 2\eta| < 1, so the sequence converges to the global minimum x^* = 0.
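Tracing this update numerically (the starting point and the value of \eta are arbitrary choices):

```python
# 1-D example: f(x) = x^2, f'(x) = 2x,
# so each step multiplies x by (1 - 2*eta).
x = 5.0
eta = 0.1
for k in range(50):
    x = x - eta * (2 * x)    # x_{k+1} = x_k - 2*eta*x_k
print(abs(x) < 1e-4)  # True: iterates shrink toward x* = 0
```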


7. Choice of Learning Rate (Step Size)

(a) Fixed Learning Rate

  • Simple

  • May overshoot if too large

  • Slow if too small


(b) Diminishing Learning Rate

\eta_k \rightarrow 0 \ \text{as } k \rightarrow \infty, \qquad \sum_k \eta_k = \infty

  • Guarantees convergence for convex functions (the divergent sum ensures the iterates can still travel arbitrarily far)


(c) Line Search

\eta_k = \arg\min_{\eta > 0} f(x_k - \eta \nabla f(x_k))
  • More accurate

  • Computationally expensive
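Because the exact minimization is expensive, it is often replaced in practice by a backtracking (Armijo) search: shrink \eta until a sufficient-decrease test holds. The sketch below uses common but arbitrary constants:

```python
import numpy as np

# Backtracking line search: start from eta0 and halve eta until
# the Armijo condition f(x - eta*g) <= f(x) - c*eta*||g||^2 holds.
def backtracking_step(f, grad, x, eta0=1.0, beta=0.5, c=1e-4):
    g = grad(x)
    eta = eta0
    while f(x - eta * g) > f(x) - c * eta * (g @ g):
        eta *= beta
    return eta

f = lambda x: x @ x                 # f(x) = ||x||^2
grad = lambda x: 2 * x
eta = backtracking_step(f, grad, np.array([3.0, 4.0]))
print(0 < eta <= 1.0)  # True
```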


8. Convergence Properties

Convex Functions

  • Gradient descent converges to the global minimum

  • Rate depends on condition number of Hessian

Non-Convex Functions

  • May converge to:

    • Local minimum

    • Saddle point

  • No global guarantee


9. Stopping Criteria

  • Gradient norm small:

    \|\nabla f(x_k)\| < \epsilon
  • Small change in function value

  • Maximum number of iterations reached


10. Variants of Gradient Descent

(a) Batch Gradient Descent

  • Uses entire dataset

  • Accurate but slow


(b) Stochastic Gradient Descent (SGD)

  • Uses one data point per update

  • Faster but noisier updates

  • Noise can help escape saddle points


(c) Mini-Batch Gradient Descent

  • Uses small batches

  • Most commonly used in practice
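A minimal mini-batch sketch for a least-squares loss; the data, batch size, and learning rate are illustrative assumptions:

```python
import numpy as np

# Mini-batch gradient descent for f(w) = (1/2n)*||Xw - y||^2:
# each update uses the gradient of a random batch of rows.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                    # noiseless synthetic targets

w = np.zeros(3)
eta, batch = 0.1, 32
for epoch in range(200):
    idx = rng.permutation(len(X))        # reshuffle each epoch
    for start in range(0, len(X), batch):
        b = idx[start:start + batch]
        g = X[b].T @ (X[b] @ w - y[b]) / len(b)  # batch gradient
        w -= eta * g
print(np.allclose(w, w_true, atol=1e-3))  # True
```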


11. Accelerated Gradient Methods

(a) Momentum Method

v_{k+1} = \beta v_k + \eta \nabla f(x_k)

x_{k+1} = x_k - v_{k+1}

  • Speeds up convergence

  • Reduces oscillations
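The momentum update can be sketched on an ill-conditioned quadratic (the matrix A, \beta, and \eta below are chosen for illustration):

```python
import numpy as np

# Momentum (heavy-ball) method on f(x) = x^T A x / 2.
A = np.diag([1.0, 25.0])          # condition number 25
grad = lambda x: A @ x

x = np.array([1.0, 1.0])
v = np.zeros(2)
eta, beta = 0.02, 0.9
for k in range(500):
    v = beta * v + eta * grad(x)  # v_{k+1} = beta*v_k + eta*grad f(x_k)
    x = x - v                     # x_{k+1} = x_k - v_{k+1}
print(np.linalg.norm(x) < 1e-6)  # True: converges to the minimum at 0
```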


(b) Nesterov Accelerated Gradient

  • Computes gradient at a look-ahead point

  • Faster theoretical convergence
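The look-ahead idea only changes where the gradient is evaluated: at x - \beta v instead of x. A sketch on the same kind of illustrative quadratic:

```python
import numpy as np

# Nesterov accelerated gradient: gradient at the look-ahead point.
A = np.diag([1.0, 25.0])
grad = lambda x: A @ x

x = np.array([1.0, 1.0])
v = np.zeros(2)
eta, beta = 0.02, 0.9
for k in range(500):
    g = grad(x - beta * v)        # look-ahead gradient
    v = beta * v + eta * g
    x = x - v
print(np.linalg.norm(x) < 1e-6)  # True
```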


12. Comparison with Second-Order Methods

Method            | Derivatives Used | Cost   | Speed
Gradient Descent  | First            | Low    | Moderate
Newton's Method   | First + Second   | High   | Fast
Quasi-Newton      | Approx. Hessian  | Medium | Fast

13. Advantages of Gradient-Based Methods

  • Easy to implement

  • Scales to high-dimensional problems

  • Core method in machine learning


14. Limitations

  • Sensitive to learning rate

  • Slow for ill-conditioned problems

  • Can stall near saddle points


15. Role in Machine Learning

  • Training linear regression

  • Logistic regression

  • Neural networks (backpropagation)

  • Support vector machines (primal form)
