Information about Robot Access to our Pages

Robot access to our pages puts a severe strain on our server because of the sheer number of pages (15,000+) that make up our site, most of which are generated on the fly from our database. We therefore ask you to follow these rules if you use a robot to access our site:

Rules

Getting All EBook Files

You can get all our eBooks in zipped files by pointing your robot at

http://pge.rastko.net/robot/harvest

You will also get all our mp3 files, which we do not zip.

Here is an estimate of the data volume (as of November 2004):

Type   Files    Size (GB)   Est. download time (DSL, 1 Mbit/s)
zip    24,160   14.5        32 hours
mp3    12,865   91.5        8.5 days
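The download times in the table follow from simple arithmetic. As a worked check, assuming decimal units (1 GB = 1000 MB = 8000 Mbit) and a sustained 1 Mbit/s with no protocol overhead:

```latex
\begin{aligned}
t_{\text{zip}} &= \frac{14.5\,\text{GB} \times 8\,\text{bit/byte}}{1\,\text{Mbit/s}}
  = \frac{116\,000\,\text{Mbit}}{1\,\text{Mbit/s}}
  = 116\,000\,\text{s} \approx 32\,\text{h} \\
t_{\text{mp3}} &= \frac{91.5\,\text{GB} \times 8\,\text{bit/byte}}{1\,\text{Mbit/s}}
  = \frac{732\,000\,\text{Mbit}}{1\,\text{Mbit/s}}
  = 732\,000\,\text{s} \approx 203\,\text{h} \approx 8.5\,\text{days}
\end{aligned}
```

Protocol overhead and server load will typically make real downloads somewhat slower, so treat these figures as lower bounds.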

Unpacking the zip files will get you another 70,000 files.

This is an example of how to get all files using wget:

wget -m http://pge.rastko.net/robot/harvest

wget is free software and available for Linux and Windows at www.gnu.org/software/wget/.

If you don't want the mp3 files, say:

wget -m -R "mp3" http://pge.rastko.net/robot/harvest

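If you want only certain file types, say:

wget -m "http://pge.rastko.net/robot/harvest?filetypes[]=txt&filetypes[]=html"

Replace txt and html with the file types you are interested in. Keep the quotes around the URL; without them the shell treats "&" as a command separator and runs wget in the background with a truncated URL.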
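If you script your harvests, a small helper can build the filetypes[] query for any list of types. This is a minimal sketch: the function name build_harvest_url is hypothetical (the site does not provide it), and it assumes that repeating filetypes[]=TYPE, as in the example URL, is all the harvest URL needs.

```shell
#!/bin/sh
# Hypothetical helper: build_harvest_url TYPE... prints the harvest URL
# with one filetypes[]=TYPE parameter per argument.
# With no arguments it prints the plain harvest URL (all file types).
build_harvest_url() {
    url="http://pge.rastko.net/robot/harvest"
    sep="?"                      # first parameter starts the query string
    for ft in "$@"; do
        url="${url}${sep}filetypes[]=${ft}"
        sep="&"                  # later parameters are joined with "&"
    done
    printf '%s\n' "$url"
}

# Usage (quote the URL so the shell does not interpret "&"):
# wget -m "$(build_harvest_url txt html)"
```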

Mirroring EBook Files

If you want to harvest our eBook files on a regular basis, e.g. to maintain a mirror site, read the mirror howto. It explains how to do this with rsync or wget.
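As a rough illustration of the wget route, a nightly update could look like the sketch below. It is not taken from the mirror howto, which remains the reference and also covers rsync; the function name update_mirror, the default directory /srv/pge-mirror and the DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Hypothetical mirror update, e.g. run nightly from cron.
# Usage: update_mirror [DIR]; set DRY_RUN=1 to print the command instead.
update_mirror() {
    dir="${1:-/srv/pge-mirror}"   # hypothetical local mirror directory
    # -m implies -N (timestamping), so a re-run only fetches files that
    # are new or changed on the server since the last run.
    # --wait=1 pauses one second between requests to ease server load.
    set -- wget -m --wait=1 -P "$dir" http://pge.rastko.net/robot/harvest
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

Because -m turns on timestamping, running this regularly re-downloads only new or changed files; the --wait pause trades download speed for a lighter load on our server.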

Getting Catalog Data

If you are harvesting our pages just to extract catalog data, you are wasting both your time and our resources. You can get the data much more easily by downloading the Project Gutenberg Europe catalog in machine-readable format. The catalog data is licensed under the GNU GPL.

N.B. The Project Gutenberg Europe web site is copyrighted. You are not allowed to use any data you harvest directly from the web site for anything except personal use. This is another good reason to grab the machine-readable catalog instead.

Other Reasons

If you are harvesting our site for other reasons, consider contacting the webmaster instead. In most cases we can help you find a better solution.