Page Rank Algorithm

November 03, 2025

Page Rank Algorithm

PageRank is an algorithm originally developed at Google to rank web pages in search results.

It measures the importance of a page based on the links pointing to it.

Core Idea (Simple Intuition)

A webpage is important if:

many pages link to it, and
those linking pages are themselves important.

👉 This creates a recursive importance score.

Everyday analogy

Think of academic papers:

A paper cited by many important papers is influential.
Not all citations count equally.

PageRank works the same way for web pages.

Random Surfer Model (Most intuitive view)

Imagine a user:

Starts on a random webpage
Clicks a random link on the page
Repeats forever

Sometimes the user:

gets bored and jumps to a random page instead

The PageRank of a page = probability the surfer is on that page after a long time.

Mathematical Formulation

Suppose there are $N$ pages.

PageRank of page $i$ :

PR(i)=\frac{1-d}{N}+d\sum_{j\in M_i}\frac{PR(j)}{L(j)}

Meaning of symbols

Symbol	Meaning
$d$	damping factor (usually 0.85)
$M_i$	pages linking to page $i$
$L(j)$	number of outgoing links from page $j$
$N$	total number of pages

Interpretation

$(1-d)/N$ → random jump probability
second term → importance passed through links

Matrix Form (Markov Chain View)

Let:

\mathbf{PR}=dP^T\mathbf{PR}+\frac{1-d}{N}\mathbf{1}

where:

$P$ = transition probability matrix
PageRank = stationary distribution of a Markov chain

This is why PageRank connects to:

Markov chains
eigenvectors
steady-state probability

Simple Example

Link structure

A → B, C
B → C
C → A

Initially:

PR(A)=PR(B)=PR(C)=1/3

After iterations:

C receives links from A and B → score increases
A receives from C → moderate
B receives only from A → smaller

Eventually values stabilize.

This stable vector = PageRank.

Algorithm Steps

Initialize ranks equally:

PR_i=1/N

Update ranks using formula
Repeat until convergence
Output final ranks

This is essentially power iteration.

Why PageRank Works

Without PageRank:

pages with many links dominate (even spam)

With PageRank:

links from important pages count more
prevents simple manipulation

It measures quality of links, not just quantity.

Applications of PageRank

🌍 1. Web Search Ranking

Original use:

rank webpages by importance
combined with content relevance

📱 2. Social Network Analysis

Used to find:

influential users on Twitter/X
important Instagram accounts
key nodes in communication networks

📚 3. Citation Analysis

Used in research:

rank scientific papers
find influential authors
journal impact measurement

🧬 4. Biology & Medicine

Applied to:

protein interaction networks
gene importance ranking
disease propagation modeling

🛒 5. Recommendation Systems

Variants used for:

product recommendation
movie ranking
content importance scoring

🚦 6. Network Reliability & Traffic Flow

Helps identify:

critical routers in internet
important intersections in road networks

Strengths

✔ Handles huge networks
✔ mathematically elegant (Markov chain steady state)
✔ resistant to simple manipulation
✔ scalable with iterative methods

🔟 Limitations

✖ ignores page content quality
✖ susceptible to link farms (partially mitigated)
✖ slow convergence for massive graphs
✖ modern search engines use many additional signals

Summary

PageRank is a link-analysis algorithm that assigns importance to nodes in a network using the stationary distribution of a Markov chain based on link structure.

PageRank models web navigation as a Markov process and assigns importance scores based on the steady-state visit probability, making it a powerful method for ranking nodes in large networks.

✅ Python Code — Basic PageRank (Iterative Method)

This example computes PageRank for a small web graph.

📌 Graph structure

A → B, C
B → C
C → A

import numpy as np
import matplotlib.pyplot as plt

# ----- Transition matrix -----

# Order: A, B, C

P = np.array([
    [0,   0,   1],    # A receives from C
    [1/2, 0,   0],    # B receives from A
    [1/2, 1,   0]     # C receives from A and B
], dtype=float)

# ----- Parameters -----
N = P.shape[0]
d = 0.85                     # damping factor
iterations = 50

# ----- Initialize ranks equally -----
rank = np.ones(N) / N

# ----- Power iteration -----
for _ in range(iterations):
    rank = d * P @ rank + (1-d)/N

# ----- Print results -----
pages = ['A','B','C']
for p, r in zip(pages, rank):
    print(f"Page {p} rank: {r:.4f}")


pages = ['A','B','C']
plt.bar(pages, rank)
plt.title("PageRank Scores")
plt.ylabel("Rank value")
plt.show()

OUTPUT

Page A rank: 0.3878
Page B rank: 0.2148
Page C rank: 0.3974

Search This Blog

Advanced Mathematics for Computer Science HNCST409 KTU BTech Honors 2024 Scheme- Dr Binu V P