So my girlfriend tells me that I need to download several large image files from a photographer’s website. I manage her online portfolio/website, so I’m used to these types of requests… but this time I was going to find a better way to “leech them all.” So anyway, she IMs me the URL for her most recent shoot…
I was greeted by a nice default Apache index page (and the photog spelled her name wrong, ugh).
How am I going to grab about 70 image files, each five to seven megabytes?
I could click each one and save it, or I could use the Unlinker Firefox add-on to turn all the links into inline images. The latter would load all 350MB of photos on one page, and my single-core FX-55 with 1GB of DDR RAM certainly wouldn’t appreciate that.
Being the huge open source fan that I am, I decided to write a Bash script to do this without hogging all my computer’s resources. If you know the filenames of the first and last images in a folder, and the files in between are numbered sequentially, you can download them all by feeding the output of the seq command into a Bash for loop. Let’s say the first and last images are named like this:
JT_0019.jpg
JT_1214.jpg
We can assume the images between them are 0020, 0021, 0022, and so on, up to 1214. A simple Bash script then looks like this:
for i in $(seq -f "%04g" 19 1214); do
  wget -c "http://photographer.com/jenn_thomas/full_size/JT_$i.jpg"
done
seq accepts a printf-style format through -f. Here -f "%04g" tells seq to print each number at least four digits wide, padding with leading zeros, and 19 1214 sets the range. Each number is dropped into the URL and handed to wget; the -c flag resumes a partially downloaded file instead of starting over. That’s how I got JT_0353.jpg at the top of this post. Pretty simple, isn’t it?
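Before pointing wget at a thousand URLs, it can help to preview the names the loop will request. This is a minimal sketch: jt_names is a hypothetical helper, and it assumes GNU coreutils seq, whose -f format may contain literal text around the single %g conversion.

```shell
#!/bin/bash
# jt_names is a hypothetical helper: print the JT_NNNN.jpg filenames for a
# range so you can eyeball the list before downloading anything.
jt_names() {
  local start=$1 end=$2
  # Literal text around %04g is allowed; the number is zero-padded to 4 digits.
  seq -f "JT_%04g.jpg" "$start" "$end"
}

# Preview the first few names the download loop would request.
jt_names 19 22
# JT_0019.jpg
# JT_0020.jpg
# JT_0021.jpg
# JT_0022.jpg
```

Piping jt_names 19 1214 into wc -l is a quick way to count how many requests the loop will make (1196 for that range).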
You can run Bash scripts on Windows too if you have Cygwin installed. But bear in mind that not every image can be downloaded with this technique. Some sites pad image filenames with random characters, so the names can no longer be generated from a number sequence and this simple script won’t work.
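When the names can’t be generated but the folder still shows a default Apache index page, one workaround is to pull the .jpg links out of that page instead of guessing them. This is a sketch under assumptions: jpg_links is a hypothetical helper, the index links are plain relative filenames, and only lowercase .jpg links are matched. The sample filenames below are made up for illustration.

```shell
#!/bin/bash
# jpg_links is a hypothetical helper: read an Apache-style index page on stdin
# and print the absolute URL of every .jpg it links to, whatever its name.
# Assumes plain relative hrefs such as href="JT_0019_ab12.jpg".
jpg_links() {
  local base=$1
  grep -o 'href="[^"]*\.jpg"' |   # keep only the href="....jpg" pieces
    sed 's/^href="//; s/"$//' |   # strip the attribute wrapper
    sort -u |                     # drop duplicate links
    sed "s|^|$base|"              # turn each relative name into a full URL
}

# Example run against an inline snippet of index HTML (made-up filenames):
printf '%s\n' '<a href="JT_0019_ab12.jpg">JT_0019_ab12.jpg</a>' \
              '<a href="JT_0020_cd34.jpg">JT_0020_cd34.jpg</a>' |
  jpg_links "http://photographer.com/jenn_thomas/full_size/"
# http://photographer.com/jenn_thomas/full_size/JT_0019_ab12.jpg
# http://photographer.com/jenn_thomas/full_size/JT_0020_cd34.jpg
```

Against the live index you would fetch the page first and hand the list to wget, for example curl -s URL | jpg_links URL | wget -c -i -, where -i - makes wget read its URLs from standard input.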
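UPDATE: A reader suggested curl as an alternative. curl expands the [0019-1214] range itself, keeping the leading zeros, and #1 in the output name is replaced by the current number. Quote the URL so the shell doesn’t treat the brackets as a filename pattern:
curl -o "JT_#1.jpg" "http://photographer.com/jenn_thomas/full_size/JT_[0019-1214].jpg"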
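One caveat with a fixed range: if a number between 19 and 1214 has no file on the server, curl (unless run with -f/--fail) still writes the server’s error page to disk under a .jpg name, whereas wget skips error responses by default. A quick check afterwards is to look for the JPEG signature bytes FF D8 FF at the start of each file. This is a minimal sketch: is_jpeg is a hypothetical helper that relies only on head -c and od.

```shell
#!/bin/bash
# is_jpeg is a hypothetical helper: succeed only if the file starts with the
# JPEG signature bytes FF D8 FF. An HTML error page saved as .jpg will fail.
is_jpeg() {
  [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "ffd8ff" ]
}

# Flag any downloaded JT_*.jpg in the current directory that isn't a real JPEG.
for f in JT_*.jpg; do
  [ -e "$f" ] || continue   # no matches: the glob stays literal, so skip it
  is_jpeg "$f" || echo "not a JPEG: $f"
done
```

Files it flags can simply be deleted; they are the numbers in the range that didn’t exist on the server.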