| Sunday November 23rd 2014

Feedburner

Subscribe by email:

We promise not to spam/sell you.


Search Amazon deals:

Clean Room Implementation of Google Page Rank Algorithm


google-page-rank.png

Finally a clean-room implementation of Google’s Page Rank Algorithm in Java, reverse-engineered from their numerous commentary on Page Rank.

public static int getPageRank(url) {
// start off with a random low PR
int pageRank = rand.getInt(0, 3);

if ( isHostedOn(‘google.com’, url) ) {
pageRank++;
} else if ( isHostedOn(‘microsoft.com’, url) ) {
pageRank–;
}
// Support valid pages
if (isValidPage(url) ) {
pageRank += 1;
}


tag_value['b'] = 1;
tag_value['h2'] = 2;
tag_value['h1'] = 3;
tag_value['strong'] = -1; // W3C sux!
pageRank = calculateTagsPR(tag_value, pagerank);

// Sergey said good news sites have
// lots of nested tables
tablesOnPage = getTagCount(‘table’);
if (tablesOnPage >= 50) {
pageRank += 2;
}

if (pageRank >= 5) {
pageRank = 4; // helps selling AdWords
}

if (linksFrom(‘mattcutts.com’, url) >= 4) {
// I link to “clean” sites only
// ? Matt, Feb 2006
pagerank += 2;
}

pagerank += countBacklinks(url) / 10000;

blacklist1 = getList(‘c:chinese-government-censored.txt’);
blacklist2 = getList(‘c:larry-page-hatelist.txt’);
if ( inArray(blacklist1, url) || inArray(blacklist2, url) ) {
pageRank = 0;
}

d = dashesInUrl(url);
pageRank = (d >= 3) ? pageRank -1 : pageRank + 1;

if (inString(url, “how to build a bomb”)) {
// added on request. 2004-12-01.
recipient = “peter@homelandsecurity.gov”;
subject = “You might wanna check this…”;
sendMailTo(recipient, subject, url);

// page might still be relevant
pageRank++;
}

if (month() == “June” || month() == “October”) {
// makes people talk about
// PR updates, good publicity
pagerank -= randomNumber(1,3);
}

if (checkIdenticalPageAndLinkColor) {
// spammer!! Googleaxe it!!
pagerank = 0;
}

if (url == “http://www.nytimes.com”) {
// just testing, pls remove tomorrow
// ? Frank, June 2003
pagerank = 10;
}

//Don’t show PR above 10
if(pagerank > 10) pagerank = 10;

return pagerank;
}

Modified (to Java and added normalization etc.) from idea and original code by Jack Tang.

Related Posts: On this day...

Reader Feedback

One Response to “Clean Room Implementation of Google Page Rank Algorithm”

  1. Sam says:

    I like the helpful info you supply to your articles. I will bookmark your weblog and check once more here regularly. I am relatively certain I?ll be told plenty of new stuff proper here! Best of luck for the next!

Leave a Reply

You must be logged in to post a comment.