One of the steps used by the attacker who compromised a friend’s blog a few weeks ago was to create an account (which he promoted to administrator). I quickly disabled the account, but while doing forensics, I thought it would be interesting to find out the account password. WordPress stores raw MD5 hashes in the user database (despite many recommendations to use salting). As with any respectable hash function, it is believed to be computationally infeasible to recover an MD5 input from its output. Instead, someone would have to try possible inputs until one produces the correct output.
So, I wrote a trivial Python script which hashed all dictionary words, but that didn’t find the target (I also tried adding numbers to the end). Then, I switched to a Russian dictionary (because the comments in the shell code installed were in Russian) but that didn’t work either. I could have found or written a better password cracker, which varies the case of letters and does common substitutions (e.g. o -> 0, a -> 4), but that would have taken more time than I wanted to spend. I could also have improved efficiency with a rainbow table, but this needs a large database which I didn’t have.
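The dictionary attempt described above amounts to something like the following. This is a minimal sketch rather than the script actually used: the function names, the inline word list, and the limit of 0 to 99 for numeric suffixes are all assumptions made for illustration.

```python
import hashlib


def md5_hex(text):
    """Return the hex MD5 digest of a UTF-8 encoded string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def candidates(words, max_suffix=99):
    """Yield each word as-is, then with the numbers 0..max_suffix appended.

    The suffix range is an assumption; the post only says numbers were added.
    """
    for word in words:
        yield word
        for n in range(max_suffix + 1):
            yield f"{word}{n}"


def crack(target_hash, words, max_suffix=99):
    """Return the first candidate whose MD5 equals target_hash, else None."""
    target = target_hash.lower()
    for guess in candidates(words, max_suffix):
        if md5_hex(guess) == target:
            return guess
    return None


if __name__ == "__main__":
    # Hypothetical word list; a real run would read a dictionary file instead.
    words = ["password", "letmein", "qwerty"]
    target = md5_hex("letmein7")  # stand-in for a hash taken from the database
    print(crack(target, words))
```

Case variations and letter substitutions would slot into `candidates`, at the cost of a much larger search space.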
Instead, I asked Google. Searching for the hash turned up, for example, a genealogy page listing people with the surname “Anthony”, and an advert for a house, signing off “Please Call for showing. Thank you, Anthony”. And indeed, the MD5 hash of “Anthony” was the database entry for the attacker. I had discovered his password.
In both webpages, the target hash was in a URL. This makes a lot of sense; I’ve even written code which does the same. When you need to store a file indexed by a key, a simple option is to make the filename the key’s MD5 hash. This avoids the need to escape any potentially dangerous user input and is very resistant to accidental collisions. If there are too many entries to store in a single directory, creating a directory for each hash prefix gives an even distribution of files. MD5 is quite fast, and while it’s unlikely to be the best option in all cases, it is an easy solution which works pretty well.
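As a rough illustration of that storage scheme, here is a minimal sketch. The function names, the two-character prefix length, and the choice to hash the UTF-8 encoding of the key are assumptions, not details of any particular site’s code.

```python
import hashlib
import tempfile
from pathlib import Path


def path_for_key(root, key, prefix_len=2):
    """Map an arbitrary key to root/<hash prefix>/<full MD5 hex digest>.

    The digest contains only the characters 0-9 and a-f, so the key never
    needs escaping, whatever the user supplied.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return Path(root) / digest[:prefix_len] / digest


def store(root, key, data):
    """Write data (bytes) to the file derived from key and return its path."""
    path = path_for_key(root, key)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path


def load(root, key):
    """Read back the bytes previously stored under key."""
    return path_for_key(root, key).read_bytes()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # A key that would be dangerous if used directly as a filename.
        saved = store(root, "../../etc/passwd", b"example contents")
        print(saved.relative_to(root))
        print(load(root, "../../etc/passwd"))
```

If such a path is then exposed in a URL, the hash of the key ends up on a public page that a search engine can index.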
Because of this technique, Google is acting as a hash pre-image finder and, more importantly, one that finds hashes of things that people have hashed before. Google is doing what it does best: storing large databases and searching them. I doubt, however, that they envisaged this use.