Page Rank Algorithm
Page Rank Algorithm
PageRank is an algorithm originally developed at Google to rank web pages in search results.
Core Idea (Simple Intuition)
A webpage is important if:
-
many pages link to it, and
-
those linking pages are themselves important.
👉 This creates a recursive importance score.
Everyday analogy
Think of academic papers:
-
A paper cited by many important papers is influential.
-
Not all citations count equally.
PageRank works the same way for web pages.
Random Surfer Model (Most intuitive view)
Imagine a user:
-
Starts on a random webpage
-
Clicks a random link on the page
-
Repeats forever
Sometimes the user:
-
gets bored and jumps to a random page instead
The PageRank of a page = probability the surfer is on that page after a long time.
Mathematical Formulation
Suppose there are pages.
PageRank of page :
Meaning of symbols
| Symbol | Meaning |
|---|---|
| damping factor (usually 0.85) | |
| pages linking to page | |
| number of outgoing links from page | |
| total number of pages |
Interpretation
-
→ random jump probability
-
second term → importance passed through links
Matrix Form (Markov Chain View)
Let:
where:
-
= transition probability matrix
-
PageRank = stationary distribution of a Markov chain
This is why PageRank connects to:
-
Markov chains
-
eigenvectors
-
steady-state probability
Simple Example
Link structure
-
A → B, C
-
B → C
-
C → A
Initially:
After iterations:
-
C receives links from A and B → score increases
-
A receives from C → moderate
-
B receives only from A → smaller
Eventually values stabilize.
This stable vector = PageRank.
Algorithm Steps
-
Initialize ranks equally:
-
Update ranks using formula
-
Repeat until convergence
-
Output final ranks
This is essentially power iteration.
Why PageRank Works
Without PageRank:
-
pages with many links dominate (even spam)
With PageRank:
-
links from important pages count more
-
prevents simple manipulation
It measures quality of links, not just quantity.
Applications of PageRank
🌍 1. Web Search Ranking
Original use:
-
rank webpages by importance
-
combined with content relevance
📱 2. Social Network Analysis
Used to find:
-
influential users on Twitter/X
-
important Instagram accounts
-
key nodes in communication networks
📚 3. Citation Analysis
Used in research:
-
rank scientific papers
-
find influential authors
-
journal impact measurement
🧬 4. Biology & Medicine
Applied to:
-
protein interaction networks
-
gene importance ranking
-
disease propagation modeling
🛒 5. Recommendation Systems
Variants used for:
-
product recommendation
-
movie ranking
-
content importance scoring
🚦 6. Network Reliability & Traffic Flow
Helps identify:
-
critical routers in internet
-
important intersections in road networks
Strengths
✔ Handles huge networks
✔ mathematically elegant (Markov chain steady state)
✔ resistant to simple manipulation
✔ scalable with iterative methods
🔟 Limitations
✖ ignores page content quality
✖ susceptible to link farms (partially mitigated)
✖ slow convergence for massive graphs
✖ modern search engines use many additional signals
Summary
PageRank is a link-analysis algorithm that assigns importance to nodes in a network using the stationary distribution of a Markov chain based on link structure.PageRank models web navigation as a Markov process and assigns importance scores based on the steady-state visit probability, making it a powerful method for ranking nodes in large networks.
✅ Python Code — Basic PageRank (Iterative Method)
This example computes PageRank for a small web graph.
📌 Graph structure
-
A → B, C
-
B → C
-
C → A

Comments
Post a Comment