Finally, a clean-room implementation of Google’s PageRank algorithm in Java, reverse-engineered from the company’s many public comments on PageRank.
import java.time.LocalDate;
import java.time.Month;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

public static int getPageRank(String url) {
    // start off with a random low PR (0 to 3)
    int pageRank = ThreadLocalRandom.current().nextInt(0, 4);

    if (isHostedOn("google.com", url)) {
        pageRank++;
    } else if (isHostedOn("microsoft.com", url)) {
        pageRank--;
    }

    // Support valid pages
    if (isValidPage(url)) {
        pageRank += 1;
    }

    Map<String, Integer> tagValue = new HashMap<>();
    tagValue.put("b", 1);
    tagValue.put("h2", 2);
    tagValue.put("h1", 3);
    tagValue.put("strong", -1); // W3C sux!
    pageRank = calculateTagsPR(url, tagValue, pageRank);

    // Sergey said good news sites have
    // lots of nested tables
    int tablesOnPage = getTagCount(url, "table");
    if (tablesOnPage >= 50) {
        pageRank += 2;
    }

    if (pageRank >= 5) {
        pageRank = 4; // helps sell AdWords
    }

    if (linksFrom("mattcutts.com", url) >= 4) {
        // I link to "clean" sites only
        // -- Matt, Feb 2006
        pageRank += 2;
    }

    pageRank += countBacklinks(url) / 10000;

    List<String> blacklist1 = getList("c:\\chinese-government-censored.txt");
    List<String> blacklist2 = getList("c:\\larry-page-hatelist.txt");
    if (blacklist1.contains(url) || blacklist2.contains(url)) {
        pageRank = 0;
    }

    int dashes = dashesInUrl(url);
    pageRank = (dashes >= 3) ? pageRank - 1 : pageRank + 1;

    if (url.contains("how to build a bomb")) {
        // added on request. 2004-12-01.
        String recipient = "peter@homelandsecurity.gov";
        String subject = "You might wanna check this…";
        sendMailTo(recipient, subject, url);
        // page might still be relevant
        pageRank++;
    }

    Month month = LocalDate.now().getMonth();
    if (month == Month.JUNE || month == Month.OCTOBER) {
        // makes people talk about
        // PR updates, good publicity
        pageRank -= ThreadLocalRandom.current().nextInt(1, 4);
    }

    if (checkIdenticalPageAndLinkColor(url)) {
        // spammer!! Googleaxe it!!
        pageRank = 0;
    }

    if ("http://www.nytimes.com".equals(url)) {
        // just testing, pls remove tomorrow
        // -- Frank, June 2003
        pageRank = 10;
    }

    // Don't show PR above 10
    if (pageRank > 10) {
        pageRank = 10;
    }
    return pageRank;
}
Modified from an idea and original code by Jack Tang (ported to Java, with normalization and other additions).
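For contrast, the textbook PageRank computation ignores hostnames, tags and calendars entirely and scores pages from the link graph alone: each page splits its score evenly among the pages it links to, and a damping factor d mixes in a uniform random jump. Below is a minimal power-iteration sketch, not Google’s code, assuming a small in-memory graph given as adjacency lists and the commonly cited d = 0.85. The class and method names are illustrative, and pages with no out-links have their score spread evenly over all pages, which is one common convention rather than the only one.

```java
import java.util.Arrays;

/** Minimal power-iteration PageRank sketch (illustrative only, not Google's code). */
public class PowerIterationPageRank {

    /**
     * @param outLinks   outLinks[i] lists the pages that page i links to
     * @param d          damping factor, commonly 0.85
     * @param iterations number of power-iteration steps
     * @return one score per page; the scores sum to 1.0
     */
    public static double[] pageRank(int[][] outLinks, double d, int iterations) {
        int n = outLinks.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start from a uniform distribution

        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double danglingMass = 0.0;

            for (int i = 0; i < n; i++) {
                if (outLinks[i].length == 0) {
                    danglingMass += rank[i]; // page with no out-links
                } else {
                    // split this page's score evenly across its out-links
                    double share = rank[i] / outLinks[i].length;
                    for (int j : outLinks[i]) {
                        next[j] += share;
                    }
                }
            }

            for (int i = 0; i < n; i++) {
                // random jump + damped link contribution + dangling mass spread evenly
                next[i] = (1.0 - d) / n + d * (next[i] + danglingMass / n);
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // page 0 -> 1, 2; page 1 -> 2; page 2 -> 0
        int[][] graph = { {1, 2}, {2}, {0} };
        double[] pr = pageRank(graph, 0.85, 50);
        System.out.println(Arrays.toString(pr));
    }
}
```

In the three-page example in main, page 2 receives links from both other pages and ends up with the highest score; with this normalized update the scores always sum to 1, so no "Don't show PR above 10" clamp is needed.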