Stop Image Harvesters Like HyperImage From Downloading All Your Images

News flash: not everyone thinks they should have to pay to use the great images you have on your website. And to make it easier for them to grab your work, there are tools that download all the images on a website at the click of a button.

One such tool is HyperImage (H/T Steve Skoll for pointing it out). From their website:

HyperImage is an industrial-strength tool for searching the web and downloading entire websites worth of pictures. Just enter an address or keyword, and watch as thousands of pictures stream down to your computer.

Unfortunately, this is nothing new, and there are plenty more besides HyperImage. They crawl up and down static links to all corners of your site (and to any other site you link to) and download any image that renders in the browser: logos and icons but also your thumbnails, large image files, background images, slideshow images, etc. I just downloaded 794 images from a photographer’s website. Took me under a minute.

Tweak Your .htaccess File

The good news is you can block them with an .htaccess file placed in the root folder of your website. This file will detect the unwelcome visitor and send them away. Modifying an .htaccess file is a very simple task BUT get it wrong and it can really mess up your website. So take the utmost care doing this, and don’t hold me responsible.

An .htaccess file is a simple ASCII text file that provides a way to make configuration changes on a per-directory basis. It can do many things: password protect a folder, redirect users automatically, direct to custom error pages, change file extensions, ban or allow users with certain IP addresses, stop directory listings and use a different file as the index file.
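To give a feel for what these per-directory settings look like, here is a minimal sketch of a few of them. The filenames and paths are hypothetical placeholders, and whether each directive is honored depends on what your host allows .htaccess files to override.

```apache
# Stop Apache from listing the contents of folders that have no index file
Options -Indexes

# Use home.html (hypothetical filename) as the index file
DirectoryIndex home.html

# Show a custom page (hypothetical path) when a visitor requests a missing URL
ErrorDocument 404 /errors/not-found.html

# Permanently redirect an old page to a new one (both paths are placeholders)
Redirect 301 /old-gallery.html /new-gallery.html
```

Note that comments in an .htaccess file have to sit on their own lines, starting with #.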

Most hosting providers support .htaccess but some don’t publicize it much and some won’t allow you to use it at all. .htaccess is a feature of the Apache web server, so running Unix or Linux is not enough by itself: your site has to be served by Apache (or a server that reads the same files), and your host has to allow .htaccess overrides. Search their help pages or give them a call.

Watermark Your Images

Of course, if you don’t have access to the server hosting your site, as is the case with WordPress.com or PhotoShelter sites, you can’t modify the .htaccess. The fallback solution is to watermark your images, something that should be done anyway for all archive images and possibly for portfolio images as well (more on that someday). You want HyperImage to download the watermarked image, so your watermark needs to be part of the image itself, not overlaid by the browser. Thankfully PhotoShelter serves up ‘real’ watermarked images (PhotoShelter instructions on watermarking). Now, not all photographers will agree to use watermarks and not all images can reasonably be watermarked. I suspect PhotoShelter will be taking steps to block ‘bad bots’ pretty quickly if there is enough consensus among users. The consensus is needed because these image harvesters are not actually doing anything illegal. They are simply extreme facilitators of copyright infringement.

Back to your server. Before you create a new .htaccess file check to see if you already have one in your site’s root directory on your server (make sure your FTP client is showing system files). You will probably need to edit the permissions for that file in order to edit it. Download it to your computer and place a copy on your desktop as a safety net.

Open it in a plain text editor that doesn’t use word wrap or in a code editor like Dreamweaver. You don’t want the application to insert special ASCII codes to signify a line break or save the file with any other extension.

In your plain text editor type the following:

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^HyperImage 
RewriteRule ^.* - [F,L]

The first line turns on the rewrite engine in Apache, which lets you rewrite or refuse the user’s request. The second line sets a condition using RewriteCond: in this case you want to detect visitors whose user agent starts with HyperImage. The third line sends a 403 Forbidden error to the user (F) and tells the engine to stop rewriting so no other rules are applied (L).
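If you are not sure whether your host has the rewrite module loaded, one cautious variant is to wrap the same three lines in an IfModule block. This is a sketch, not a requirement: the rules inside are unchanged, and the comments simply restate what each line does.

```apache
# Only run these rules if mod_rewrite is actually loaded
<IfModule mod_rewrite.c>
    RewriteEngine on
    # Match any request whose User-Agent header starts with "HyperImage"
    RewriteCond %{HTTP_USER_AGENT} ^HyperImage
    # Leave the URL unchanged (-) and answer 403 Forbidden
    RewriteRule ^.* - [F,L]
</IfModule>
```

The trade-off is that, without mod_rewrite, the block is silently skipped: your site keeps working instead of throwing a server error, but HyperImage is not blocked either.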

Save the file with the filename .htaccess. If you can’t shake off an unwanted extension added by your editor, name it htaccess.txt and rename it with your FTP client once you have uploaded it to the server.

Place the file in your root directory so that it affects your entire site. Change the file permissions (CHMOD) back to 644 (rw-r--r--) or whatever they were originally (on my server it was 604) so that the server can read the file. Apache’s default configuration normally refuses to serve files whose names begin with .ht, which keeps the file from being read in a browser.
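If you would rather not rely on your host’s defaults for keeping the file out of browsers, you can say so explicitly inside the .htaccess file itself. This is a minimal sketch that assumes Apache 2.4 and a host that allows this kind of override; check with your host before depending on it.

```apache
# Assumes Apache 2.4. On Apache 2.2 the equivalent is
# "Order allow,deny" followed by "Deny from all".
<Files ".htaccess">
    Require all denied
</Files>
```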

While you are at it, you can deny access to many more unwanted visitors: email harvesters, offline browsing programs (site rippers like HyperImage), spammers. Typically they ignore robots.txt rules, which is plain rude, but also allows you to identify and trap them. Identifying them by their user agent (as we have done for HyperImage) is less reliable, as bad bots will often fake their user agent. Here is one list of bad bots and here is a discussion of the “perfect .htaccess ban list”. In the end, choosing who to ban is a matter of personal choice.
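Extending the HyperImage rule to several user agents could look something like the sketch below. The extra names (HTTrack, EmailCollector) are only illustrative picks of the kind that show up on ban lists, not a recommendation; swap in whichever agents you decide to block.

```apache
RewriteEngine on
# [NC] ignores upper/lower case; [OR] means "this condition or the next one"
RewriteCond %{HTTP_USER_AGENT} ^HyperImage [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
# The last condition in the chain must not carry [OR]
RewriteCond %{HTTP_USER_AGENT} EmailCollector [NC]
# Refuse the request with 403 Forbidden
RewriteRule .* - [F]
```

The [F] flag on its own is enough here, since Apache stops processing further rewrite rules once a request is forbidden.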
