Project Gutenberg eBooks are created by volunteers. This HOWTO contains some of the basics to get started in becoming a volunteer. Please read the FAQ, visit Distributed Proofreaders, or subscribe to a mailing list for more detail. Every eBook is different, and presents its own challenges.
Copyright. The first step in any eBook submission is to confirm that Project Gutenberg may legally distribute the eBook. Visit our Copyright HOWTO for details. Project Gutenberg will distribute no eBook without confirming copyright status.
File Formats. Whenever possible, Project Gutenberg distributes a plain text version of an eBook. Other formats, such as HTML, XML, and RTF, are also welcome, but plain text is the "lowest common denominator." We stress the inclusion of plain text because of its longevity: Project Gutenberg includes numerous text files that are 20-30 years old. In that time, dozens of widely used file formats have come and gone. Text is accessible on all computers, and is also insurance against future obsolescence.
Insistence on plain text can be a problem for harvesting eBooks (see below), but is still a firm requirement. The only times when Project Gutenberg distributes an eBook without a plain text version are when plain text is impossible or impractical -- for example, for our movies and MP3 audio files, and for some of our mathematical works.
Creating an eBook. Turning a physical book into an eBook is a wonderful way to preserve the book, and to make it more widely available. Historically, eBook creation was accomplished by a single person typing in the physical book, a page at a time. This technique still works, of course, and is sometimes necessary (for example, if the book is damaged or extremely fragile).
A more likely scenario these days is for the physical book to be scanned. Distributed Proofreaders runs high-speed page scanners, but must chop the spine from the book to scan it. Individuals are more likely to have a flatbed scanner, which is slower but doesn't require damaging the book.
After scanning, OCR (Optical Character Recognition) software is used to turn the page images into text that you can edit. Scanners come with OCR software, and accuracy in excess of 99% is common. Fixing the remaining 1% or so of errors is the most time-consuming part. Proofreading is the process of comparing the OCR output to the physical book and fixing any problems.
Once the OCR is proofread, it needs to be formatted as a completed eBook. Sometimes, formatting is ongoing during proofreading. To create a plain text eBook, use tools such as GutCheck to look for common OCR errors and formatting problems. Plain text eBooks should have line wraps at 72 characters and skip a line between paragraphs with no indentation. For many more details, see the FAQ.
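The plain text layout described above (lines wrapped at 72 characters, one blank line between paragraphs, no indentation) can be sketched in Python. This is only an illustration, not a Project Gutenberg tool and not a substitute for GutCheck; the function name reflow_plain_text is hypothetical, and the sketch assumes ordinary prose in which blank lines already mark paragraph breaks. Poetry, tables, and other preformatted passages would need to be handled by hand.

```python
import textwrap


def reflow_plain_text(raw: str, width: int = 72) -> str:
    """Rewrap prose so no line exceeds `width` columns, with one blank
    line between paragraphs and no leading indentation.

    Hypothetical helper for illustration only.
    """
    # Normalize line endings so the split below behaves the same everywhere.
    text = raw.replace("\r\n", "\n").replace("\r", "\n")

    paragraphs = []
    current = []
    for line in text.split("\n"):
        if line.strip():
            # Collect the words of a non-blank line; this drops indentation.
            current.extend(line.split())
        elif current:
            # A blank line ends the current paragraph.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))

    # Words longer than `width` are left intact rather than split.
    wrapped = [
        textwrap.fill(p, width=width, break_long_words=False,
                      break_on_hyphens=False)
        for p in paragraphs
    ]
    return "\n\n".join(wrapped) + "\n"
```

Running the function on proofread OCR text and then checking the result with GutCheck keeps the mechanical formatting separate from the review for OCR errors.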
Other formats are sometimes easily produced from the OCR software, or may be created using a word processor or other tool. For HTML, Project Gutenberg insists on valid HTML (per the Validator, at validator.w3.org). This can be very challenging if you use automated tools to create the HTML -- it's best to take the advice of other Project Gutenberg volunteers and choose your HTML authoring method carefully.
It's fine to keep scanned images with your HTML (or even with the plain text, which can be zipped together with its images). Images should be cropped, and saved in PNG, GIF or JPEG format. Don't try to keep the highest resolution possible -- for HTML, we typically presume 96 DPI (dots per inch) with 256 colors, which is a "medium" resolution.
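To get a feel for what 96 DPI with 256 colors means in practice, the arithmetic can be sketched as follows. The function names and the 300 DPI scan resolution in the example are assumptions chosen for illustration, not Project Gutenberg requirements; the sketch only computes target dimensions and does not resize any image.

```python
def target_pixels(width_px: int, height_px: int,
                  scan_dpi: int, target_dpi: int = 96) -> tuple[int, int]:
    """Pixel dimensions of a scan after resampling to `target_dpi`.

    The physical size (inches) stays the same; only the pixel count changes.
    """
    new_width = round(width_px * target_dpi / scan_dpi)
    new_height = round(height_px * target_dpi / scan_dpi)
    return new_width, new_height


def uncompressed_bytes_256_colors(width_px: int, height_px: int) -> int:
    """256 colors need 8 bits (1 byte) per pixel before compression."""
    return width_px * height_px


# Example (assumed figures): a 4 x 6 inch illustration scanned at 300 DPI
# is 1200 x 1800 pixels.
w, h = target_pixels(1200, 1800, scan_dpi=300)
print(w, h)                                  # 384 576
print(uncompressed_bytes_256_colors(w, h))   # 221184
```

The saved PNG, GIF or JPEG file will normally be smaller than the uncompressed figure, since all three formats compress the pixel data.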
Generally, the eBook should all be in one file (or one file plus image files). This makes the eBook easier to display as a whole.
Harvesting. There are many eBooks on the Internet and elsewhere that are not part of the Project Gutenberg collection. Submissions of these eBooks are welcome! The two main challenges, as mentioned above, are copyright and formatting. Copyright is often a challenge because a harvested eBook might not have sufficient detail to perform a copyright clearance. In such a case, the eBook needs to be compared to a known public domain physical book (more in the Copyright Confirmation HOWTO).
Similarly, formatting could be an issue due to the lack of plain text, non-valid HTML, or many files that need to be re-assembled into a unified text or HTML.
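When a harvested eBook arrives as many separate plain text files (one per chapter, for example), re-assembling them might look like the sketch below. The function names and the UTF-8 default are assumptions for illustration; the caller is expected to supply the files in reading order, and the result still needs proofreading and formatting checks like any other eBook.

```python
from pathlib import Path


def merge_text_parts(texts: list[str]) -> str:
    """Join several text fragments into one, separated by a blank line.

    Hypothetical helper: normalizes line endings, strips trailing spaces,
    and drops leading/trailing blank lines from each part.
    """
    cleaned = []
    for text in texts:
        text = text.replace("\r\n", "\n").replace("\r", "\n")
        lines = [line.rstrip() for line in text.split("\n")]
        body = "\n".join(lines).strip("\n")
        if body:  # skip parts that are empty after cleaning
            cleaned.append(body)
    return "\n\n".join(cleaned) + "\n"


def merge_files(paths: list[str], out_path: str,
                encoding: str = "utf-8") -> None:
    """Read the files in the order given and write one unified file."""
    texts = [Path(p).read_text(encoding=encoding) for p in paths]
    Path(out_path).write_text(merge_text_parts(texts), encoding=encoding)
```

Passing an explicit, ordered list of paths (rather than relying on directory listing order) avoids chapters such as "10" sorting before "2".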
If public domain status is confirmed for an eBook, that means that the eBook is free for any use. (To make sure the eBook consists only of public domain materials, any trademarks or other new items -- such as newly created graphics -- should be removed.) The procedure for a harvested item is that Project Gutenberg (or you, the Project Gutenberg volunteer) will seek permission from the eBook's source to include the eBook in the collection. If permission is granted, we can also include a credit line such as an attribution and URL.
In cases where an answer is not received, or permission is not granted, the eBook may still be used without attribution (as long as it's confirmed to be in the public domain by the Project Gutenberg copyright team). Asking permission is a courtesy, as is including a credit line. But if permission is not forthcoming, the public domain status of the item prevents any limitation on its use, either by the person or organization that created the item, or subsequent distributors.
Of course, the discussion in this section also pertains to people who have created eBooks for other purposes or projects. We welcome these submissions!
Where to Submit the eBook. Visit our submission page to submit the eBook's file(s). You will be prompted for the copyright clearance data received from the copyright team. If you have difficulty with the form, or need to use an alternate method to submit (such as FTP or email), send a note to the submission team to arrange details.
THANKS for considering an eBook submission to Project Gutenberg!