Markov Decision Process (MDP) in Reinforcement Learning
A Markov Decision Process (MDP) is the standard mathematical framework used to model decision-making problems where outcomes depend on both randomness and the agent’s actions.
It describes how an agent interacts with an environment to learn the best behavior.
Core Idea (Simple Intuition)
An agent repeatedly:
- Observes the current state
- Takes an action
- Receives a reward
- Moves to a new state
The goal is to choose actions that maximize total future reward.
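The observe → act → reward → next-state loop above can be sketched in a few lines of Python. The environment here is a made-up two-state example (the `step` function and state names are illustrative, not part of any standard library):

```python
import random

def step(state, action):
    """Stand-in for a real environment: returns (reward, next_state)."""
    reward = 1.0 if action == "right" else 0.0
    next_state = "B" if action == "right" else "A"
    return reward, next_state

state = "A"
total_reward = 0.0
for _ in range(5):
    action = random.choice(["left", "right"])  # placeholder policy
    reward, state = step(state, action)        # environment responds
    total_reward += reward                     # accumulate the return
```

A real agent would replace the random choice with a learned policy that maximizes `total_reward`.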
The Markov Property
MDPs assume the Markov property:
The future depends only on the current state, not on past history.
This makes learning and optimization tractable.
Components of an MDP
An MDP is defined by the tuple (S, A, P, R, γ):
✅ 1. States (S)
All possible situations of the environment
Example: robot position, game board configuration
✅ 2. Actions (A)
Choices available to the agent
Example: move left/right, accelerate, pick object
✅ 3. Transition Probability (P)
Probability P(s′ | s, a) of moving from state s to state s′ after taking action a.
It captures uncertainty in the environment.
✅ 4. Reward Function (R)
Immediate feedback R(s, a) received after taking an action.
- Positive → good action
- Negative → penalty
✅ 5. Discount Factor (γ, 0 ≤ γ ≤ 1)
Controls the importance of future rewards.
- γ close to 0 → focus on immediate reward
- γ close to 1 → long-term planning
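The five components can be written down directly as Python data. This is a sketch for a tiny hypothetical two-state MDP (all names and numbers are illustrative):

```python
# The (S, A, P, R, gamma) tuple for a tiny made-up two-state MDP.
S = ["s0", "s1"]                  # states
A = ["stay", "go"]                # actions
P = {                             # P[(s, a)] -> {s': probability}
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "go"):   {"s1": 0.9, "s0": 0.1},  # movement can fail
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "go"):   {"s0": 1.0},
}
R = {("s0", "go"): 1.0}           # reward per (state, action); 0 elsewhere
gamma = 0.9                       # discount factor
```

Note that each transition distribution must sum to 1, which is what makes P a valid probability model.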
Objective of Reinforcement Learning
Find a policy π that maximizes the expected discounted return E[Σₜ γᵗ Rₜ].
A policy π is a rule for choosing an action in each state.
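In code, a deterministic policy is just a lookup from state to action. A minimal sketch (state and action names are illustrative):

```python
# A deterministic policy: one chosen action per state.
policy = {"s0": "go", "s1": "stay"}

def act(state):
    """Apply the policy: return the action chosen for this state."""
    return policy[state]
```

Stochastic policies generalize this by mapping each state to a probability distribution over actions.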
Example: Robot Navigation
- Environment: grid world
- States: robot positions on the grid
- Actions: Up, Down, Left, Right
- Rewards:
  - +10 for reaching the goal
  - −1 for each step
  - −10 for hitting an obstacle
- Transition: movement sometimes fails, introducing randomness
This forms a complete MDP model.
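The grid-world example can be sketched directly. This assumes a 4×4 grid with one goal cell and one obstacle cell (the positions and the deterministic `transition` function are illustrative; a fuller model would add the slip probability mentioned above):

```python
# Sketch of the grid-world MDP: states are (row, col) cells on a 4x4 grid.
GOAL, OBSTACLE = (3, 3), (1, 1)   # illustrative positions
ACTIONS = {"Up": (-1, 0), "Down": (1, 0), "Left": (0, -1), "Right": (0, 1)}

def transition(state, action, size=4):
    """Deterministic core of the transition: returns (next_state, reward)."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    # Moves off the grid leave the robot where it is.
    next_state = (r, c) if 0 <= r < size and 0 <= c < size else state
    if next_state == GOAL:
        return next_state, 10.0    # +10 for reaching the goal
    if next_state == OBSTACLE:
        return next_state, -10.0   # -10 for hitting the obstacle
    return next_state, -1.0        # -1 per step
```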
Value Functions in MDP
🔹 State Value Function V^π(s)
Measures how good it is to be in state s when following policy π.
🔹 Action Value Function (Q-function) Q^π(s, a)
Measures how good it is to take action a in state s and then follow policy π.
Used in algorithms like Q-learning.
Bellman Equation (Key Concept)
The value of a state equals the immediate reward plus the discounted value of the next state. For a deterministic policy π:
V^π(s) = R(s, π(s)) + γ · Σ_{s′} P(s′ | s, π(s)) · V^π(s′)
Meaning:
Current value = immediate reward + discounted future value.
This recursive relationship enables dynamic programming methods.
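One such dynamic programming method is value iteration, which repeatedly applies the Bellman optimality update V(s) ← maxₐ [R(s, a) + γ Σ_{s′} P(s′|s, a) V(s′)]. A minimal sketch on a tiny hypothetical MDP (states, rewards, and iteration count are illustrative):

```python
# Minimal value iteration on a made-up two-state MDP.
P = {  # P[(s, a)] -> {s': probability}
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "go"):   {"s1": 1.0},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "go"):   {"s0": 1.0},
}
R = {("s0", "go"): 1.0}  # zero reward elsewhere
gamma = 0.9
states, actions = ["s0", "s1"], ["stay", "go"]

V = {s: 0.0 for s in states}
for _ in range(200):  # repeat the Bellman update until values stop changing
    V = {
        s: max(
            R.get((s, a), 0.0)
            + gamma * sum(p * V[s2] for s2, p in P[(s, a)].items())
            for a in actions
        )
        for s in states
    }
```

Because the update is a contraction (for γ < 1), the values converge to the optimal V* regardless of initialization.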
Why MDPs Are Important in RL
They provide:
- A formal model of agent-environment interaction
- The basis for algorithms:
  - Value Iteration
  - Policy Iteration
  - Q-Learning
  - SARSA
  - Deep RL methods

Almost all reinforcement learning problems are modeled as MDPs.
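As one concrete example from the list, tabular Q-learning learns the Q-function from sampled transitions using the update Q(s, a) ← Q(s, a) + α[r + γ maxₐ′ Q(s′, a′) − Q(s, a)]. A minimal sketch (state names and hyperparameters are illustrative):

```python
from collections import defaultdict

# Tabular Q-learning update rule (illustrative hyperparameters).
alpha, gamma = 0.5, 0.9
actions = ["left", "right"]
Q = defaultdict(float)  # Q[(state, action)], initialized to 0

def q_update(s, a, r, s_next):
    """One Q-learning step from an observed transition (s, a, r, s_next)."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

q_update("A", "right", 1.0, "B")  # Q[("A", "right")] becomes 0.5
```

Repeating this update over many sampled transitions, while exploring, drives Q toward the optimal action values.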
Real Applications
- Robotics navigation
- Game playing (chess, Go, Atari)
- Self-driving cars
- Recommendation systems
- Resource allocation and scheduling