The Project Gutenberg FAQ - VV-SS

Suzanne Shell

Over the past several years, I visited the Project Gutenberg website occasionally, looked at what was involved in making a significant contribution to the effort, and left after downloading a few books--PG was a project that would need to wait until I retired.

In the summer and fall of 2002, I was doing research on e-books (sources, devices, costs) for my library, and ran across Distributed Proofreaders. I discovered at about this time, and also followed a link from there to Distributed Proofreaders. Serendipity! After backing away a few times, I took the plunge and registered on November 5, then began proofing. The however-many-pages-I-wanted-to-proof commitment was just right for letting me get a feel for the process, and to start me thinking of the ways I could exploit all this free labor to get the books I wanted into PG.

I was feeling quite virtuous about proofing my 10-20 pages per day, when I visited the site on November 8, and NONE of the books I was working on were available. Also there was this perfectly absurd number listed for the number of proofers having proofed at least one page (it had roughly quadrupled). I KNEW the site had been hacked. Actually the site had been Slashdotted. The DP discussion forums were so active, it was hard to find time to read all the messages, questions, suggestions, and complaints; these rapidly led to new documentation and more detailed proofing guidelines. Books moved through the site so rapidly that they brought out the "hard stuff" from the bottom of the to-do stack, and were STILL desperate for content. I was a relative "veteran" after just a few days, and helped out a little by answering questions, but I was still a beginner. I had some PG dreams that DP could make reality, but I needed to learn the ropes first.

Some of my ambitions revolved around professional goals--there are some public domain titles which, if available in electronic form, would be extremely useful to my library's patrons. There are also some standard reference books and indexes--Granger's Index to Poetry is one example--that have pre-1923 editions that could still be important resources. In order to learn what I needed to know about providing content, though, I decided to start with something less overwhelming (wanting to read it on my e-book reader was just a coincidence). I went to my bookshelves and pulled out my P. G. Wodehouse reprints. I downloaded and read the scanning and submitting FAQ from the DP site, requested and received clearance for the first book (Uneasy Money) in late December, and got to work mastering my scanner. I tried OmniPage Pro first, but decided that ABBYY FineReader Pro did a significantly better job of the OCR. I offered to be a "behind the scenes" manager for the book while it worked its way through the site, but was made an official "Project Manager" instead. Although the first frenzy following the Slashdot invasion had calmed down, DP was still feeling a need for more content and more hands to manage projects.

On January 5, Uneasy Money started proofing; it went through 2 rounds of proofing in less than 20 hours. I felt like a hick marveling at a traffic light changing colors, but I sat at my PC and watched the page count go down. By this time, I had also scanned and OCR'd a couple more Wodehouse reprints and a short book of poetry. I was hooked! Juliet Sutherland and the other admins had recruited some experienced DP'ers to help train new post-processors in the job of preparing final PG texts. I was handed over to one of them. After several projects, I "graduated" and was given permission to upload my own projects. My intent was to do 3 or 4 projects a month, no more than I could handle post-processing by myself. I planned to process an occasional reference book in addition to all the Wodehouse I could get my hands on. So much for plans...

One ongoing concern of many Distributed Proofreaders was how to train new volunteers in the DP style of proofreading. (It is somewhat idiosyncratic because of the distributed nature of the process.) We were still coping with the aftereffects of the massive influx of Slashdotters--quantity benefited, but quality suffered. Super7, one of the highest volume proofreaders, suggested setting aside a project without complex formatting for "Beginners" and asking that the second round proofers (all of whom should be veterans) send feedback and encouragement to the newcomers. This was tried successfully, with a couple of variations. Since I had been planning to start running a variety of genre fiction through the site, I then volunteered to manage these as beginners' projects for as long as the supply held out. All of a sudden, starting in February 2003, the amount of time I needed to spend locating, scanning, OCR'ing and managing books increased drastically, and the amount of time I could devote to post-processing decreased. Luckily, "veterans" stepped in to answer newcomers' questions, and to serve as "Mentors" in the second round of proofing. Recently, others have provided "beginners' projects" to help keep up with the demand of a steadily increasing flow of new volunteers. These projects are also useful for helping new post-processors learn the job.

I still have some ambitious projects planned: Granger's Index to Poetry, the unabridged edition of The Golden Bough, Curtis' The North American Indian, and the Book Review Digest (volumes for 1905-1921). A couple of volumes are already waiting to be proofed; others are waiting to be scanned on the PG tabloid scanner. But, in the meantime, there are 23 new Wodehouse books in PG thanks to Distributed Proofreaders, not to mention such remnants of early 20th century popular culture as The Sheik.

I believe that a major accomplishment of Distributed Proofreaders has been the creation of a way to provide on-the-job training for PG volunteers. Steady improvement in the quantity and quality of training techniques and documentation, enhancements to the user-friendliness of the site, and ready access to the collective experience and advice of a wide range of volunteers in the Forums have resulted in a growing core of active and experienced volunteers in all the facets of e-book production. I'm sure that I could not have progressed from a total newbie to a regular PG contributor within a 5-month period without this support structure. Regular communication and collaboration with book-lovers from around the world has enriched my life. The fact that it is easier to get leave from my job than from DP is perhaps beside the point...