Robot access to our pages puts a severe strain on our server because of the sheer number of pages (15,000+) that make up our site, most of which are generated on the fly from our database. If you use a robot to access our site, please follow these rules:
Make sure that your robot obeys /robots.txt. (You may want to read the manual that came with your robot to know how to do this.)
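If you are writing your own robot, Python's standard library can do the robots.txt check for you. A minimal sketch, using a made-up robots.txt and user agent name for illustration (a real robot would fetch the live file from the site root):

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Ask before every request: may this user agent fetch this path?
print(rp.can_fetch("MyBot", "http://example.org/private/x.html"))  # False
print(rp.can_fetch("MyBot", "http://example.org/ebooks/1.txt"))    # True
```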
Don't run your robot during peak hours: Monday to Friday, 10 AM to 6 PM EDT.
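If your robot runs from a scheduled script, you can gate each run on the local time in New York. A minimal sketch; the weekday and hour cutoffs simply encode the rule above:

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

def is_peak(now: datetime) -> bool:
    """True during Monday-Friday, 10:00-18:00 US Eastern time."""
    local = now.astimezone(ZoneInfo("America/New_York"))
    return local.weekday() < 5 and 10 <= local.hour < 18

# Example: only start harvesting outside peak hours.
if not is_peak(datetime.now(ZoneInfo("UTC"))):
    pass  # safe to start the wget run here
```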
You can get all our eBooks in zipped files by pointing your robot at http://pge.rastko.net/robot/harvest.
You will also get all our mp3 files, which we do not zip.
Here is an estimate of the data volume (Nov 2004):

Type | Files | GB | Estimated download time (DSL, 1 Mbit/s)
Unpacking the zip files will get you another 70,000 files.
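The download-time column follows from simple arithmetic: at 1 Mbit/s you can move roughly 0.45 GB per hour. A back-of-the-envelope sketch (the 10 GB figure is an arbitrary example, not the actual archive size):

```python
# Rough download-time estimate at DSL speed.
BYTES_PER_GB = 10**9

def hours_to_download(gigabytes: float, mbit_s: float = 1.0) -> float:
    bits = gigabytes * BYTES_PER_GB * 8    # total bits to transfer
    seconds = bits / (mbit_s * 10**6)      # ideal link speed, ignoring overhead
    return seconds / 3600

print(round(hours_to_download(10), 1))  # 10 GB at 1 Mbit/s -> 22.2 hours
```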
This is an example of how to get all files using wget:
wget -m http://pge.rastko.net/robot/harvest
wget is free software and available for Linux and Windows at www.gnu.org/software/wget/.
If you don't want the mp3 files, say:
wget -m -R "mp3" http://pge.rastko.net/robot/harvest
If you want only some types of files, say:
wget -m "http://pge.rastko.net/robot/harvest?filetypes=txt&filetypes=html"
Replace txt and html with the file types you are interested in.
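If you build such URLs from a script, repeated query parameters are easiest to assemble with Python's urllib; a small sketch (the file types chosen are just examples):

```python
from urllib.parse import urlencode

BASE = "http://pge.rastko.net/robot/harvest"
wanted = [("filetypes", "txt"), ("filetypes", "html")]  # repeated key

url = BASE + "?" + urlencode(wanted)
print(url)  # http://pge.rastko.net/robot/harvest?filetypes=txt&filetypes=html
```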
If you want to harvest our eBook files on a regular basis, e.g. to maintain a mirror site, read the mirror howto. It explains how to use rsync or wget to do this.
If you are harvesting our pages just to extract catalog data, you are wasting both your time and our resources. You can get the data much more easily by grabbing the Project Gutenberg Europe catalog in machine-readable format. The catalog data is licensed under the GNU GPL.
N.B. The Project Gutenberg Europe web site is copyrighted. You are not allowed to use any data you harvest directly from the web site for anything except personal use. This is another good reason to grab the machine-readable catalog instead.
If you are harvesting our site for other reasons, consider contacting the webmaster instead. We can help you find a better solution in most cases.