Introduction to Markov Chain Monte Carlo (MCMC)
Markov Chain — Introduction
🔹 Intuitive Idea
A Markov chain is a random process where:
The future depends only on the present, not on the past.
This is called the Markov property or memoryless property.
🔹 Everyday Examples
- Weather: tomorrow's weather depends on today's weather, not last week's.
- Board games: your next position depends only on your current square.
- Web surfing: the next page depends only on the current page.
🔹 Formal Definition
A sequence of random variables $X_0, X_1, X_2, \dots$ is a Markov chain if:

$$P(X_{n+1} = x \mid X_n = x_n, X_{n-1} = x_{n-1}, \dots, X_0 = x_0) = P(X_{n+1} = x \mid X_n = x_n)$$
🔹 Transition Probabilities
We define the one-step transition probability:

$$P_{ij} = P(X_{n+1} = j \mid X_n = i)$$

These form the transition matrix:

$$P = \begin{pmatrix} P_{11} & P_{12} & \cdots \\ P_{21} & P_{22} & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}$$
Properties:
- Each entry is non-negative: $P_{ij} \ge 0$
- Each row sums to 1: $\sum_j P_{ij} = 1$
🔹 Example: Simple Weather Model
States: Sunny, Rainy

Transition matrix: each row gives the probabilities of tomorrow's weather given today's weather.

Interpretation:
- Sunny → Sunny: 0.8
- Sunny → Rainy: 0.2
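To make this concrete, here is a minimal NumPy sketch that simulates the weather chain. Note that only the Sunny row (0.8, 0.2) is given above; the Rainy row used here (0.4, 0.6) is an illustrative assumption.

```python
import numpy as np

# Transition matrix for the weather chain. The Sunny row (0.8, 0.2) comes from
# the example above; the Rainy row (0.4, 0.6) is an illustrative assumption.
P = np.array([
    [0.8, 0.2],   # from Sunny: P(Sunny), P(Rainy)
    [0.4, 0.6],   # from Rainy: P(Sunny), P(Rainy)  (assumed values)
])
states = ["Sunny", "Rainy"]

# Sanity checks: entries are non-negative and each row sums to 1.
assert (P >= 0).all() and np.allclose(P.sum(axis=1), 1.0)

# Simulate the chain: the next state depends only on the current state.
rng = np.random.default_rng(0)
state = 0  # start Sunny
path = [state]
for _ in range(10):
    state = rng.choice(2, p=P[state])
    path.append(state)

print(" -> ".join(states[s] for s in path))
```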
🔹 Stationary Distribution
A probability vector $\pi$ is stationary if:

$$\pi P = \pi, \qquad \sum_i \pi_i = 1$$

Meaning:
- $\pi_i$ is the long-run probability of being in state $i$
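Numerically, the stationary distribution is a left eigenvector of $P$ with eigenvalue 1, and it should match the long-run frequencies of a simulated chain. A sketch, again using the assumed Rainy row from the previous snippet:

```python
import numpy as np

P = np.array([[0.8, 0.2],
              [0.4, 0.6]])  # Rainy row is an assumed illustration

# Stationary distribution: left eigenvector of P with eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
pi = pi / pi.sum()
print("stationary distribution:", pi)   # approximately [0.667, 0.333]

# Long-run simulation: fraction of time spent in each state matches pi.
rng = np.random.default_rng(0)
state, counts = 0, np.zeros(2)
for _ in range(100_000):
    state = rng.choice(2, p=P[state])
    counts[state] += 1
print("empirical frequencies:", counts / counts.sum())
```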
🔹 Why Markov Chains Matter in ML
Used in:
- Hidden Markov Models
- Reinforcement learning
- The PageRank algorithm
- MCMC sampling
- Speech recognition
- NLP sequence models
Monte Carlo Methods — Introduction
🔹 Intuitive Idea
Monte Carlo methods use random sampling to solve problems that are hard to compute analytically.
Replace complicated math with repeated random experiments.
🔹 Origin of the Name
Named after Monte Carlo Casino, because results rely on randomness.
🔹 Core Principle
If you want an expectation:

$$E[f(X)] = \int f(x)\, p(x)\, dx$$

instead of solving the integral, approximate it by:

$$E[f(X)] \approx \frac{1}{N} \sum_{i=1}^{N} f(x_i)$$

where $x_1, \dots, x_N$ are random samples drawn from $p(x)$.
🔹 Example: Estimating π
- Draw random points uniformly inside a square
- Count how many fall inside the inscribed circle
- The fraction inside the circle approximates π/4, so multiplying by 4 gives an estimate of π
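A minimal sketch of this experiment (using the unit square and the quarter circle, so the fraction of points inside approximates π/4):

```python
import numpy as np

# Monte Carlo estimate of pi: sample points uniformly in the unit square
# and count how many fall inside the quarter circle of radius 1.
rng = np.random.default_rng(0)
n = 1_000_000
x, y = rng.random(n), rng.random(n)
inside = (x**2 + y**2) <= 1.0
pi_estimate = 4.0 * inside.mean()
print(pi_estimate)  # close to 3.1416 for large n
```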
🔹 Monte Carlo Integration
To compute:

$$I = \int f(x)\, p(x)\, dx$$

use the estimator:

$$\hat{I} = \frac{1}{N} \sum_{i=1}^{N} f(x_i)$$

with $x_i \sim p(x)$.
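As a quick illustration (the integrand and distribution here are my own choice, not from the text), this sketch estimates $E[X^2]$ for $X \sim \mathcal{N}(0,1)$, whose exact value is 1:

```python
import numpy as np

# Monte Carlo integration: estimate E[f(X)] with X ~ N(0, 1) and f(x) = x^2.
# The exact answer is the variance of X, i.e. 1.
rng = np.random.default_rng(0)
samples = rng.standard_normal(1_000_000)
estimate = np.mean(samples**2)
print(estimate)  # close to 1.0
```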
🔹 Why Monte Carlo Works
Based on the Law of Large Numbers:

$$\frac{1}{N} \sum_{i=1}^{N} f(x_i) \;\longrightarrow\; E[f(X)] \quad \text{as } N \to \infty.$$
Markov Chain + Monte Carlo = MCMC
🔹 The Big Idea
When sampling directly from a distribution is hard:
- Construct a Markov chain
- Design it so its stationary distribution equals the target distribution
- Run the chain long enough
- Use the visited states as samples
This is Markov Chain Monte Carlo (MCMC).
Why Do We Need MCMC?
The Core Problem in Bayesian Inference
In Bayesian learning we want the posterior distribution:

$$p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)}$$

The difficulty:
- The evidence $p(D) = \int p(D \mid \theta)\, p(\theta)\, d\theta$ involves an integral over all parameters
- It is often impossible to compute analytically
- It can be very high dimensional

Example: for a Bayesian neural network, $\theta$ contains thousands or millions of weights, so $p(D)$ is an integral over a space of that many dimensions.

For realistic ML models, this integral is intractable.
What Do We Actually Need?
Usually we don’t need the exact posterior formula.
Instead we want:
- The posterior mean
- The posterior variance
- Predictions
- Credible intervals

All of these can be written as expectations under the posterior:

$$E[f(\theta) \mid D] = \int f(\theta)\, p(\theta \mid D)\, d\theta$$
💡 Monte Carlo Idea (First Insight)
If we had samples:

$$\theta^{(1)}, \theta^{(2)}, \dots, \theta^{(N)} \sim p(\theta \mid D)$$

then:

$$E[f(\theta) \mid D] \approx \frac{1}{N} \sum_{i=1}^{N} f\!\left(\theta^{(i)}\right)$$

So:

Bayesian inference becomes easy if we can sample from the posterior (see the sketch below).
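To see how simple this makes things, suppose we already had posterior samples. The sketch below fakes such samples with a normal distribution purely for illustration and computes the quantities listed above:

```python
import numpy as np

# Pretend these are posterior samples theta^(1..N). For illustration only,
# they are drawn from a normal distribution rather than a real posterior.
rng = np.random.default_rng(0)
samples = rng.normal(loc=2.0, scale=0.5, size=10_000)

posterior_mean = samples.mean()
posterior_var = samples.var()
credible_interval = np.percentile(samples, [2.5, 97.5])  # 95% credible interval

print(posterior_mean, posterior_var, credible_interval)
```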
The Problem
We cannot sample directly from $p(\theta \mid D)$:
- The distribution may be complicated
- It is high dimensional
- It is only known up to a constant

We only know the unnormalized posterior:

$$p(\theta \mid D) \;\propto\; p(D \mid \theta)\, p(\theta)$$
The Brilliant Idea of MCMC
Instead of sampling directly:
Build a Markov chain whose long-run behavior equals the target posterior.
Then:
- Start anywhere
- Move randomly using special rules
- After many steps, the positions follow the posterior distribution
- Treat those positions as samples
Intuition: Random Walk in a Mountain Landscape
Imagine:
- Landscape height = posterior probability
- High mountains = high-probability regions
- Valleys = low probability
We want to:
Spend more time on high mountains.
MCMC does:
- Walk randomly
- Prefer uphill moves
- Sometimes accept downhill moves
- Eventually explore the entire landscape
- Time spent at each location ∝ probability
Why Markov Chain?
Because:
- The next step depends only on the current location
- Not on the full history
This keeps the process simple and mathematically tractable.
Why Monte Carlo?
Because we approximate posterior integrals using sample averages:

$$E[f(\theta) \mid D] \approx \frac{1}{N} \sum_{i=1}^{N} f\!\left(\theta^{(i)}\right)$$

So:
- The Markov chain generates the samples
- Monte Carlo uses the samples to compute expectations

Together: the chain's samples plug directly into the Monte Carlo estimator, giving Markov Chain Monte Carlo.
How MCMC Works Conceptually
Step 1: Start at some parameter value $\theta^{(0)}$.

Step 2: Propose a new value $\theta'$, e.g. a small random perturbation of the current value.

Step 3: Decide whether to move.

If the new point has higher (unnormalized) posterior probability → accept it.
If lower → accept it with some probability (in the classic Metropolis rule, the ratio of the new to the current posterior value).
This ensures:
- Exploration
- No getting stuck forever
Step 4: Repeat many times
After a while:
- The chain "forgets" its starting point
- The samples follow the posterior
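Here is a minimal sketch of this procedure, a random-walk Metropolis sampler for a one-dimensional target. The target density, proposal width, and iteration counts are illustrative choices, not prescribed by the text; the target is used only up to its normalizing constant.

```python
import numpy as np

# Unnormalized target density: two Gaussian bumps (an illustrative choice).
def unnormalized_posterior(theta):
    return np.exp(-0.5 * (theta - 2.0) ** 2) + 0.5 * np.exp(-0.5 * (theta + 2.0) ** 2)

rng = np.random.default_rng(0)
n_steps, step_size, burn_in = 50_000, 1.0, 5_000

theta = 0.0                      # Step 1: start anywhere
samples = []
for _ in range(n_steps):
    proposal = theta + step_size * rng.normal()   # Step 2: propose a nearby value
    # Step 3: Metropolis rule: always accept uphill moves,
    # accept downhill moves with probability p(proposal) / p(current).
    accept_prob = unnormalized_posterior(proposal) / unnormalized_posterior(theta)
    if rng.random() < accept_prob:
        theta = proposal
    samples.append(theta)        # Step 4: repeat many times

samples = np.array(samples[burn_in:])             # discard burn-in
print("posterior mean estimate:", samples.mean())
```

A histogram of `samples` should then approximate the target density, which is exactly the claim made in the next section.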
What Happens Over Time
Initially:
- Samples are biased toward the starting point

After burn-in:
- Samples represent the posterior

Long run:
- The histogram of samples ≈ the posterior density
Why MCMC Is Powerful
Without MCMC:
Bayesian inference works only for simple models.
With MCMC:
We can handle:
- Neural networks
- Hierarchical Bayesian models
- Complex graphical models
- High-dimensional parameter spaces
🌍 Real Machine Learning Uses
- Bayesian neural networks
- Topic modeling
- Gaussian process inference
- Probabilistic programming (Stan, PyMC)
- Computational biology
- Physics simulations
Simple Analogy for Students
🔎 Searching for Fish in a Lake
- Fish density = probability
- You move randomly
- You stay longer where fish are abundant
- You record your positions over time
Your recorded positions represent fish distribution.
That’s MCMC.
One-Line Intuition
MCMC generates samples from a complex distribution by simulating a Markov chain whose stationary distribution equals the target distribution.
Summary
Goal: compute expectations under a posterior distribution that we cannot sample from directly.

Method:
- Construct a transition rule
- Ensure the stationary distribution equals the posterior
- Run the chain long enough
- Use the samples for Monte Carlo estimation