What you actually need to do to produce a PG text can be stated very simply:
1. Borrow or buy an eligible book.
2. Send us a copy of the front and back of the title page, and wait for an OK.
3. Turn the book into electronic text.
4. Send it to us.
That's it! All the rest of the producing parts of the FAQ are about the details of how different people approach these steps.
Different people find their own ways into PG work, and once in, find their own niches. If you have your own ideas, don't let anything here stop you from pursuing them.
Most people now start by registering at Distributed Proofreaders <http://www.pgdp.net>, or Distributed Proofreaders Europe <http://dp.rastko.net>, proofing a page or two, and going on from there.
Some people just read the FAQs, go up to their attic, pull an eligible book off the shelf, send TP&V [V.25] in, and start typing or scanning. Next time we hear from them is when they send in [V.46] the completed eBook for posting. It can be as simple as that.
Some people just download existing PG texts, re-proof them very carefully and send in corrections.
Some people find regular collaborators through gutvol-d or the distributed proofing sites, earn a reputation as reliable proofers, and continue working as proofers.
Most people start small, and after a little experience of distributed proofreading or other proofing, begin their PG career as producers.
If you're a typist, cheer now, because you can ignore all the complicated paraphernalia of computer interfaces, and scanners, and the quality of OCR software and the mistakes it makes. You can just sit down at the keyboard with your eligible [V.18] book.
If you're not a typist, start thinking about scanners. It may be a while before you're ready to start scanning for yourself, but it's never too early to find out about them.
As soon as you have a solid grasp of how to turn a book into an etext, please start thinking about how you're going to become a producer. While proofing work is valuable, PG can only add books when someone makes the effort to actually make etexts from them, and the people who run distributed and co-operative proofing projects have to do a lot of work before and after the proofing step; we want to spread that around as widely as possible. Project Gutenberg needs more producers!
Whatever you do, if you are working outside a Distributed Proofing project, don't just hang around expecting someone to offer you a task to undertake. There is no "head office" where overworked staff occasionally need interns to do filing and odd-jobs. There are maybe 200 fairly regular contributors to PG, producers and significant proofers. We almost never meet each other in person. We have jobs, and families, and other interests. We work for PG when we can, and when we want to. In many ways, you could look at us as 200 unrelated people, each doing our own etext project, using Project Gutenberg as an umbrella group that sets loose standards, files copyright proofs and provides secure placement for the finished texts. Since we each have our own self-assigned single-person tasks, there isn't too much room to delegate some of that work to a beginner. By all means, volunteer for some tasks--on the Volunteers' Board, or in gutvol-d--but you should think in terms of defining your own tasks, and making your own contribution.
Orientation.
Absolutely everyone--scanners, typists, proofers--should first spend some time working on a distributed or co-operative proofing project. This will allow you to get a feel for what happens in making an etext from paper pages without committing you to more than a few hours' work.
This is not in any way an institutional requirement, since we don't have any institutional requirements, but it is very good advice. Many volunteers start eagerly, wanting to do lots of PG work, and then drop out because they took on too much, too fast, without understanding the nature of the work. Don't let that happen to you. Take it in small chunks.
Check out these distributed proofing sites:
PG's Distributed Proofreaders: <http://www.pgdp.net/>
DP-Europe: <http://dp.rastko.net>
and spend a few hours over a couple of weeks just processing some pages for real.
These two sites are very similar, and either may tackle any text. The original Distributed Proofreaders produces texts in all languages, but a large part of its production is in English. Distributed Proofreaders Europe also produces texts in English, but concentrates largely on other Western and Eastern European languages.
While you're doing that, you should also join a couple of PG mailing lists [V.12]--gutvol-d and either the weekly or monthly Newsletter list. Reading these will start to get you connected to what's going on. Browse the Volunteers' Board--there may be some offers going, and there's a lot of experience captured in some of those "back-issues", so don't confine yourself to the front page.
Inform yourself on e-text issues generally, not just within Project Gutenberg. Explore The On-Line Books Page [R.5] and find other eBooks available on-line.
Have a look at our In-Progress List and some lists of suggestions from others [B.4].
Look at sites like Blackmask <http://www.blackmask.com>, Pluckerbooks <http://www.pluckerbooks.com/>, Memoware <http://www.memoware.com> and Bookshare <http://www.bookshare.org> to learn how our work is being used as a basis for many other projects, which copy, convert and amplify it.
Above all, read a few Project Gutenberg eBooks! You don't have to read them in full; you don't need to spend weeks poring over Dostoyevsky or studying Shakespeare. Just download a few and skim them--you'll absorb what a PG text should be quite painlessly, and maybe you'll get caught up in the story! If you're looking for light reading, and can't think of something that you specifically want, how about these all-time favorites:
The Gift of the Magi, by O. Henry.
The Lady, or the Tiger?, by Frank R. Stockton
A Christmas Carol, by Charles Dickens
Alice in Wonderland, by Lewis Carroll
Anne of Green Gables, by Lucy Maud Montgomery
The Marvelous Land of Oz, by L. Frank Baum
A Princess of Mars, by Edgar Rice Burroughs
Heidi, by Johanna Spyri
A Connecticut Yankee in King Arthur's Court, by Mark Twain
Black Beauty, by Anna Sewell
Tarzan of the Apes, by Edgar Rice Burroughs
Tom Swift and his Motor-Cycle, by Victor Appleton
Rebecca Of Sunnybrook Farm, by Kate Douglas Wiggin
Little Lord Fauntleroy, by Frances Hodgson Burnett
Aesop's Fables
Grimms' Fairy Tales
The Art of War, by Sun Tzu
Dracula, by Bram Stoker
Swiss Family Robinson, by Johann David Wyss
The War of the Worlds, by H.G. Wells
If you have a taste for detectives and mysteries, there's
The Adventures of Sherlock Holmes, by Arthur Conan Doyle
Monsieur Lecoq, by Emile Gaboriau
The Mysterious Affair at Styles, by Agatha Christie
Arsene Lupin, by Edgar Jepson & Maurice Leblanc
Edgar Allan Poe's "The Gold-Bug" and
"The Murders in the Rue Morgue" in The Works of Edgar Allan Poe V. 1
For the excessive buckling of various swashes, see:
The Prisoner of Zenda, by Anthony Hope
The Man in the Iron Mask, by Dumas, Pere
The Three Musketeers, by Alexandre Dumas
Treasure Island, by Robert Louis Stevenson
The Scarlet Pimpernel, by Baroness Orczy
Effen youse got a hankerin' for a Western, there's:
Riders of the Purple Sage, by Zane Grey
The Virginian, Horseman Of The Plains, by Owen Wister
Back to God's Country, by James Oliver Curwood
Selected Stories by Bret Harte
Jean of the Lazy A, by B. M. Bower
Or if you prefer your fiction more domesticated, there's:
Little Women, by Louisa May Alcott
Pride and Prejudice, by Jane Austen
The Warden, by Anthony Trollope
The Heir of Redclyffe, by Charlotte M Yonge
Mother, by Kathleen Norris
For something to raise a smile, you can rely on:
The Devil's Dictionary, by Ambrose Bierce
The Wallet of Kai Lung, by Ernest Bramah
The Importance of Being Earnest, by Oscar Wilde
Three Men in a Boat, by Jerome K. Jerome
Piccadilly Jim, by P. G. Wodehouse
If poetry is your thing, you have lots to choose from:
Shakespeare's Sonnets
Project Gutenberg's Book of English Verse
The Home Book of Verse, edited by Burton Stevenson
The Complete Poems of Henry Wadsworth Longfellow
Leaves of Grass, by Walt Whitman
Now, that's just a handful from our over 10,000 eBooks, so don't tell me you can't find anything to read! If you do have ideas of your own, download GUTINDEX.ALL and browse through the whole list, or Browse by Author on the website at <http://www.gutenberg.net/find>.
Download a few. Read them on your PC, or reformat them and print them out, or convert them for your PDA. Get used to working with and formatting text. Look at the formatting decisions that earlier volunteers have made--they're not entirely consistent; different people make different choices, different books require different methods, and PG conventions have shifted slightly over the last 10 years--but they're all perfectly readable and convertible today.
If you find typos [R.26] in any of them, tell us! That's also a part of being a Gutenberg volunteer. Our eBooks improve with time!
If you're thinking of making the best use of your time looking for errors in posted texts, a good start would be to download 40 or 50 texts, and run a spelling checker and gutcheck [P.1] on them all, spending only 5 or 10 minutes on each. Having had a quick look at all of them, concentrate on the ones that seem to have most problems--where automated checkers see 10 problems, a careful human will usually be able to pick up 20.
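That triage can be sketched in a few lines of Python. This is only an illustration: the patterns below are a crude stand-in for a real spelling checker or gutcheck, and the function names and sample texts are invented for the example.

```python
import re

# Crude stand-ins for a real spelling checker or gutcheck: a few patterns
# that often point to scanning or typing errors. Purely illustrative.
SUSPECT_PATTERNS = [
    re.compile(r"[A-Za-z][0-9][A-Za-z]"),   # digit inside a word, e.g. "c1ear"
    re.compile(r" [,;:!?]"),                # space before punctuation
    re.compile(r"\b(?:tbe|tlie|aud)\b"),    # common misreadings of "the"/"and"
]


def count_suspects(text):
    """Return how many suspicious spots the patterns find in one text."""
    return sum(len(p.findall(text)) for p in SUSPECT_PATTERNS)


def rank_texts(texts):
    """Turn {name: text} into (name, count) pairs, most suspects first."""
    counts = [(name, count_suspects(body)) for name, body in texts.items()]
    return sorted(counts, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up samples; in practice you would read the downloaded files.
    samples = {
        "clean.txt": "The cat sat on the mat.",
        "messy.txt": "It was tbe c1ear sky , aud tlie sea.",
    }
    for name, count in rank_texts(samples):
        print(name, count)
```

The texts at the top of the ranking are the ones worth a careful human read; the counts only tell you where to look, not what is actually wrong.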
Getting Productive
OK, so you've seen what etexts should look like, you know what we do, and proofing hasn't scared you off. It's time to step up and become a producer. If you're not a typist and you don't have a scanner, take a detour down to the Scanning FAQ [S.1] now, and come back when your scanner is set up. If you're a typist or you've already got a scanner, read on . . .
Get a book. Just do it, OK?
Ya gotta start somewhere, right? And finding an eligible book is definitely somewhere.
Finding an eligible book is a threshold for many beginning volunteers--it's the first major step on the way to producing. For a lot of people, it's also the toughest barrier they have to cross. Fortunately, the barrier is only psychological, and can be crossed in a few minutes.
It's an unfamiliar process, and one that a lot of beginners feel some anxiety about. Don't. It's quite straightforward: it's just buying a book--you've done that, haven't you? Don't over-think it, don't worry about whether you're making the "right" choice, don't spend months comparing lists and choosing. Just do it. Once you've got your first, you'll wonder what all the fuss was about. Thanks to the wonders of the internet, your book can be on its way to you in an hour if you have $20 to spend.
Typists blessed with a good local library don't even have to buy their books--they can just borrow one and type it up! (You may be able to scan a library book, but get some experience with scanning first, and avoid damage!)
Let's deal with the decisions and other issues of picking one.
Copyright
For your first book, don't try getting fancy with copyright issues. Choose one that was published before 1923, and you're in the clear for U.S. and PG copyright purposes. You can read the dates just as well as we can--with books printed before 1923, there are no hidden catches: "Pre-'23 is free". Just read the TP&V [V.25] of the book, and see that it was printed before 1923, and you have no problems. Of course, reprints [V.19] of books copyrighted pre-1923 (and various other cases) are also clear, but if you have any concerns, just stick to pre-'23 editions.
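If it helps to see the "Pre-'23 is free" rule written down, here is a minimal sketch in Python. It is a self-check for choosing a book, not a clearance decision: the function name and return strings are invented for illustration, it only spots four-digit years (not Roman numerals), and a later date may still turn out to be a clearable reprint.

```python
import re

CUTOFF_YEAR = 1923  # the simple rule: printed before 1923


def simple_rule_check(tpv_text):
    """Classify the text of a title page and verso by the latest year on it.

    Only four-digit years from 1500 to 2099 are recognized; dates in
    Roman numerals are not handled by this sketch.
    """
    years = [int(y) for y in re.findall(r"\b(1[5-9]\d\d|20\d\d)\b", tpv_text)]
    if not years:
        return "no date found"        # may or may not be eligible
    if max(years) < CUTOFF_YEAR:
        return "pre-1923"             # fits the simple rule
    return "needs a closer look"      # e.g. a later reprint date


if __name__ == "__main__":
    print(simple_rule_check("New York: G. P. Putnam's Sons, 1922"))
    print(simple_rule_check("Copyright 1901. Reprinted 1927."))
    print(simple_rule_check("London: Methuen"))
```

Note that the check looks at every year it can find, front and back, because it is the latest date that decides whether the simple rule applies.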
Which book?
The answer to this question is different for everyone, but see how much you agree with the following statements:
"I have a favorite book, and I'd really like to produce that."
Well, hey, this is no problem! You already know what you want. Go check out whether the book is already on-line [V.29].
"I'd like to work on an important book, but I don't know which."
Well, everybody's definition of "important" is different, but some people have put their various ideas forward already; you can see whether you agree with them! The InProg List contains some, with the notation "Suggested book to transcribe" beside them. Steve Harris keeps a list of unproduced possibles at Steveharris.net. John Mark Ockerbloom's "Books Requested" page lists titles that people have asked for. [B.4] Your problem if you fall into this category is that other people probably wanted to produce "important" books too, and lots are already done.
"I just want an easy, trouble-free book to start with."
Your first book doesn't have to be War and Peace (we've already got that anyway!). Here's a tip: try looking for children's or what we would nowadays call "Young Adult" books. These are typically short, and may have large print, which makes life much easier if you're scanning. They age well: children's stories from a century or more ago are still readable and interesting to children today. We have many children's and YA eBooks: not just the classics like Grimm and Andersen and Heidi and Oz and Peter Pan and William Tell, but lesser-known but still enchanting stories like The Counterpane Fairy, or Lang's Fairy books. There are series, like the Motor Girls, or the (Country) Twins series, or the Bobbsey Twins. There is lots and lots of material here for you to start with, and these books are relatively plentiful, since they were made to take the kind of treatment children dish out, and many of them have been in school libraries or attics for years.
Whatever your choice, pick a book that you'll like; you'll be living with it up close and personal for a while. Light reading, adventure fiction, and books aimed at younger readers are safe first choices for most people. If you admire 19th Century scientists or scholars, and want to immortalize their work, great! But don't feel that you have to dive in at the deep end just because someone else wants you to.
Getting your book: a practical exercise
The Search
At this point, you've got a list of books--maybe just one, maybe several by an author or two, maybe just a genre like "Children's Books" with some specific ideas. Maybe your mind is still wide-open.
Before used booksellers had the Net, finding a particular old book was a daunting job. Booksellers had informal networks among themselves and exchanged catalogs so that each would know something about what was available elsewhere, but, for a buyer, finding a particular book was still hit-and-miss. Now, however, a number of large sites provide a service to booksellers, where they can list their inventories for people to search from anywhere.
So now we go hunt for them on the Net. No, you don't have to buy them on the Net--you can rummage in booksales and garage sales and used bookstores, and that's its own kind of fun, though on a physical hunt, what you need is to bring a long list of "already done" books with you. But even if you never buy over the Net, it's a vast source of information about what books are available, which are plentiful, and which are cheap. It gives you some experience of what to expect when you do your in-person browsing.
Here's a story of a typical Net-hunt. And you can follow along with it at home. :-) Your results, and the sites you end up at, will be different from mine, but even if you don't end up buying a book on this hunt, you'll get some experience of what's involved. C'mon, do it with me--see if you can find a better bargain!
I'm starting with two lists, and I'll follow up whatever seems promising. I'd like to spend about $20--might go to $30. Definitely not interested in $50 and up. I'm keeping in mind that I'll have to add a bit for delivery--usually up to $10 within the U.S., but it can get expensive if you're in Perth and ordering from a bookstore in Munich.
I'm also avoiding anything that might be tricky to clear on this search, and confining myself to books printed before 1923.
Of course, by the time you read this, some of these books may already have been produced, so if you're actually thinking of buying any, check carefully first!
My first shortlist consists of books that caught my eye from David Price's In-Progress List, Steve Harris's site, and The On-Line Books Requested page [B.4], and it reads:
Louisa May Alcott: The Inheritance
E. W. Hornung: Irralie's Bushranger
E. W. Hornung: Stingaree
A. A. Milne: The Dover Road
A. A. Milne: Once on a Time
Samuel Richardson: Pamela
Oscar Wilde: The Critic as Artist
As well as following along with my list, you should try finding two or three books of your own, from those sites or from your own preferences, and search for them in the same ways that I do.
Everyone has their own searching technique and their own favorite sites to search. For this session, I'm opening up three copies of my browser--one for Alibris <http://www.alibris.com>, one for Abebooks <http://www.abebooks.com>, and one for the Catalog of the Library of Congress <http://catalog.loc.gov>. I'll do my initial searches on Alibris and Abebooks, and keep the LoC site handy for reference.
In Alibris, I head straight for the Advanced Search page, since they allow searching by date, and I immediately put "before 1923" into every search, which avoids having to scan through modern reprints. In Abebooks, I choose "Hardcover" in their advanced search, which is not quite as good a filter, but does at least screen out recent paperback editions.
In each of the sites, I just enter the author's surname and one word from the title of each book, and look at the search results.
Louisa May Alcott's "Inheritance" looks like it's going to be tough. I don't find it in either of my two bookstores. On doing a little checking with modern bookstores, I find it was her first novel, written when she was 17, and as far as I can see, not published during her life: apparently only recently published--the LoC site has nothing prior to 1997. A disappointing start to my search. I understand why it's very desirable to get it online, but this one's going to be very tough to clear, and I'm staying away from it.
E. W. Hornung's "Irralie's Bushranger" is also elusive: it doesn't show up at either of my sites, so I check out the LoC to confirm I have the title right, and yes, there it is: "Irralie's Bushranger, a story of Australian adventure, 1896." So I widen my search by visiting <http://www.trussel.com/f_books.htm> and searching many of the sites there. Still no luck. If I were particularly eager to get this book, there are several things I might do at this point: I might register a "want" with one of the sites, asking to be notified when a copy is listed; I might use the OCLC WorldCat search (which Abebooks calls "Find it at a local library"), where I can locate libraries that have copies; or I might even contact some individual booksellers and ask them to look for it. Some booksellers actually specialize in looking for hard-to-find books; but of course I expect I'd have to pay a bit more for it when they do find it, and given my success with the rest of my list, and my price bracket, there seems no need to go that far today.
Hornung's "Stingaree", by contrast, seems to be everywhere, in several editions, and cheap. It must have been a bestseller in its day--not surprising, from the author of "Raffles". 1902, 1905, 1909 editions abound. The cheapest are 1910 and 1907 editions for $4.95 and $5.00 from booksellers listed at Abebooks.
Milne's "Dover Road" is available from both sites. There seems to have been a Putnam's printing in 1922 of "Three Plays: The Dover Road. The Truth About Blayds. The Great Broxopp." of which lots of copies survive. There also seem to be later printings which would qualify as reprints if I were desperate, but the 1922 edition is priced from $12.00 to $50.00, so I'll take the 1922 $12.00 copy from Abebooks. As a bonus, I don't see the other two plays listed as being online anywhere, so I'll get three texts (and short ones, too!--279 pages for all three) for the price and effort of one.
Milne's "Once on a Time" is a bit less common, but once again a Putnam's printing of 1922 keeps it in the race. There are a couple of booksellers in England selling for 15 pounds (which just about makes my $20 threshold) and 20 pounds, and an ex-library copy going for $25.
There are lots of eligible copies of "Pamela" available, ranging from a fourth edition at a mere $4,999 (no, thanks!) to a 1921 printing at $6.60 at Alibris. I'll take that one, please.
Wilde's "Critic as Artist" is fairly widely available. A 1905 edition of "Intentions: the Decay of Lying; Pen Pencil and Poison; the Critic as Artist; the Truth of Masks" is available at Alibris for $8.80 (with other copies of the same edition there and on Abebooks in the $20-$30 range), and Abebooks lists a London 1919 edition at $12.50. There are several copies listed in both places as "undated" and "reprints"--I'm avoiding these, since while it's quite likely that they might be clearable, I'm not taking risks on this search.
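The sifting I've been doing by eye (dated pre-1923 printings, within budget, cheapest first) could be written down roughly like this. It's only a sketch: the `Listing` record and `shortlist` function are invented for illustration, and the sample data is just the dated, dollar-priced copies mentioned above.

```python
from dataclasses import dataclass
from typing import Optional

CUTOFF_YEAR = 1923   # only printings before 1923 on this search
BUDGET = 20.00       # dollars, before delivery


@dataclass
class Listing:
    title: str
    year: Optional[int]   # None would mean an "undated" copy
    price: float          # in dollars


# Dated copies from the search above.
LISTINGS = [
    Listing("Stingaree", 1910, 4.95),
    Listing("Stingaree", 1907, 5.00),
    Listing("Three Plays (incl. The Dover Road)", 1922, 12.00),
    Listing("Three Plays (incl. The Dover Road)", 1922, 50.00),
    Listing("Pamela", 1921, 6.60),
    Listing("Intentions (incl. The Critic as Artist)", 1905, 8.80),
    Listing("Intentions (incl. The Critic as Artist)", 1919, 12.50),
]


def shortlist(listings, budget=BUDGET):
    """Keep dated pre-1923 copies within budget, cheapest first."""
    keep = [
        item for item in listings
        if item.year is not None
        and item.year < CUTOFF_YEAR
        and item.price <= budget
    ]
    return sorted(keep, key=lambda item: item.price)


if __name__ == "__main__":
    for item in shortlist(LISTINGS):
        print(f"${item.price:.2f}  {item.year}  {item.title}")
```

Undated copies are dropped rather than guessed at, which matches the no-risks approach of this particular search; a more adventurous hunt might keep them on a separate list to check later.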
My second list isn't a list--just a vague category: children's books that are easy to do.
I go to Alibris' Advanced Search, and enter "Child's" in the title, and pre-1923 in the date, and, excluding titles already on-line, immediately get:
A Child's History of France $13.20
A Child's Story of the Bible $5.50
First Lessons in Botany or The Child's Book of Flowers $13.20
The Child's Book of American Biography $11.00
The Child's First Bible $8.80
The Child's Music World $8.80
and so on through quite a list.
OK. That's a good start. But my choice so far is unimaginative. I need better search terms. So I go to main search engines with the terms "children's antiquarian books" and find a half-dozen or so sites that specialize in them. I can browse around there, though it's slower going without searches to focus my results. I find <http://www.bookrescue.com>, specializing in children's books. Wading through the miles and miles of Alcotts and Barries and Burnetts, which are mostly already online, I think, I find a couple of authors from them who must have been popular, because they seem to have published lots of books before 1923: Angela Brazil and Dorothy Canfield. (I only got as far as the "C"s!)
I could of course stop here and buy some, but today I want to see what else is out there.
Back at Alibris and Abebooks, armed with my authors to search by, I turn up 4 pre-1923 books under $20 for Angela Brazil:
A Terrible Tomboy
The Youngest Girl in the Fifth
A Fourth Form Friendship
A Pair of Schoolgirls
and several between $20 and $30.
Dorothy Canfield immediately yields multiple copies of:
The Brimming Cup
Home Fires in France
Hillsboro People
Understood Betsy
Rough Hewn
The Real Motive
and others, and I haven't even got to $20 yet, nor to the letter "D".
A browse through the Ebay Collectible and Antiquarian Books section also throws up a respectable list of eligibles. I won't even bother counting that.
In 20 minutes, I found five of the seven on my search list. In less than an hour after that, I found over 16 eligible children's books, all under or around $20 and all available to order online.
Before committing to one, though, I would double-check that the book hasn't been transcribed online, and isn't In Progress.
Double-checking your selection
If you're concerned that the book you have chosen duplicates another that might be in progress, and want to double-check, you can e-mail the Posting Team asking them to check whether any recent clearances have come in for that title.
Duplications do happen--there's no way of avoiding them when different people are making independent decisions--but they are rare.
Dealing with used booksellers
As a class, used booksellers are very pleasant people--remarkably friendly, knowledgeable and helpful, even to people buying on a typical Gutenberger's budget.
Some of them are not, however, models of ideal data organization when it comes to Internet listings. There are lots of one- or two-person operations dealing with an inventory of many thousands of books, and having located your book online, you should check that it's still available.
You can place an order through the site and wait for the confirmation, or you can simply call the bookseller. Not all booksellers' contact details are listed, so it's not always an option, but when you do phone you're likely to be speaking immediately to someone who can tell you for sure whether the book is still there, can pull the book off the shelf and answer questions about it, and can take your credit card details on the spot and dispatch the book immediately.
Copyright Clearance
As soon as your book arrives, send us the information needed for Copyright Clearance first. Even if your book is a true-blue, no-questions-asked pre-1923 edition, we should know about it as soon as possible so that it can go onto the In-Progress list for others to see that someone has started on it.
Wait for the confirmation e-mail before starting any serious work. Some people have thought that "Copyright 1923" plus some wishful thinking would be good enough, and, unfortunately, it isn't. Some people have gone ahead and produced the whole book before sending in the clearance, only to be disappointed, all their work wasted.
Books published in 1922 or earlier are clearable, but some people, ever optimists, overlook that little "1927" in small print on the verso. Sometimes there is no copyright date on the front, and other optimists assume that these books are OK. They may be; they may not be. Don't get caught in the copyright trap.
As soon as you have what you think might be an eligible book, do not start on it. Do not ask another volunteer's opinion. Just send in the TP&V and wait for the confirmation e-mail to find out for sure.
Even when your TP&V clearly says "Copyright 1901", send it in. We need to get it into the clearance files so that we can register it as being In-Progress.
Producing
If you're a typist, there's not much more you need to know from this point: you can just get on with the job, with maybe a few tips from the FAQ. In fact, if you're a typist, you might wonder why the rest of us make such a fuss about scanners, and settings, and OCR. Take pity on us! We just can't produce the way you can. Smile indulgently, ignore all the scanner jargon, and submit your completed text while we're still saying bad words about the guttering on a greyscale image of page 372. :-)
If you are using a scanner to copy a book for the first time, be patient with yourself. Some people start off with too high expectations of what they can achieve. Believe it or not, scanning does work effectively; it just doesn't work perfectly. And often, you need a little practice before your scans work right with your OCR. The Scanning FAQ [S.1] has lots of specific tips you can try. Start by scanning a double-page about a third of the way through the book. Scan in Black and White and in Greyscale, at 300dpi and 400dpi. Try 600 dpi if it seems like a good idea. Put it through your OCR and see what comes out. Move your scanner so that you can be comfortable while placing the book and turning pages. Allow yourself an hour to experiment with different settings, and different pages. Put the sample images included with the Scanning FAQ through your OCR and see how the output compares to the text produced by other packages. That first hour finding out about how your setup works will be the most valuable hour of scanning you will ever do.
Having figured out what settings you want to use for this book, make sure you implement the best speed you can. Usually this means telling the scanner to scan only as much area as the book covers. This is quite important, since the scanner will by default scan its whole area, and you don't need all that; it just wastes time and makes your images bigger.
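To see why the scan area matters, here is a back-of-envelope calculation of uncompressed image size in Python. The dimensions used are assumptions picked for illustration, not measurements of any particular scanner or book, and real files will usually be smaller once compressed.

```python
def pixel_count(width_in, height_in, dpi):
    """Number of pixels in a scan of the given area (inches) at the given dpi."""
    return round(width_in * dpi) * round(height_in * dpi)


def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Raw image size: 1 bit per pixel for Black and White, 8 for Greyscale."""
    return pixel_count(width_in, height_in, dpi) * bits_per_pixel // 8


if __name__ == "__main__":
    # Assumed sizes: a whole scanner bed versus just the area a book covers.
    for label, (w, h) in {"whole bed": (8, 12), "book only": (8, 6)}.items():
        grey = uncompressed_bytes(w, h, 300, 8)
        print(f"{label}: {grey:,} bytes in Greyscale at 300dpi")
```

Halving the scanned area halves the pixel count, and going from 300dpi to 600dpi quadruples it, so trimming the area and not over-specifying resolution both pay off on every single page.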
You may also be able to set your OCR or scanner software to auto-scan pages with some preset delay, like 5 seconds. This also speeds things up, because the scanner isn't waiting for you to hit the keyboard, and you have both hands free at all times to turn the page and replace the book. It takes a few pages to get into the rhythm; if you miss a page-turn, don't worry--you can get it on the next scan.
Using a reasonably modern but quite ordinary home/office type flatbed scanner, you should be able to scan 200 pages an hour [S.9] of a typical book, at good quality. 400 pages an hour is not unheard-of. Now, it may fairly be said that scanning offers all the fun of ironing, without the sense of adventure :-), but if you have got your settings right, you will probably be able to do the whole job in less than two hours. And now you're really on the road!
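For a rough idea of what those rates mean for your own book, the arithmetic can be sketched as follows. The function names are made up for the example, and the two-pages-per-scan figure is an assumption based on scanning double-page spreads.

```python
def scanning_minutes(pages, pages_per_hour=200):
    """Estimated scanning time in minutes at a steady rate."""
    return pages / pages_per_hour * 60


def seconds_per_scan(pages_per_hour=200, pages_per_scan=2):
    """Time available for each pass of the scanner, including the page turn."""
    scans_per_hour = pages_per_hour / pages_per_scan
    return 3600 / scans_per_hour


if __name__ == "__main__":
    # 279 pages is the length of the three-play Milne volume found earlier.
    print(f"{scanning_minutes(279):.0f} minutes at 200 pages an hour")
    print(f"{scanning_minutes(279, 400):.0f} minutes at 400 pages an hour")
    print(f"{seconds_per_scan():.0f} seconds per double-page scan")
```

At 200 pages an hour and two pages per scan, each pass has a little over half a minute, which is why a short auto-scan delay and a steady page-turning rhythm make such a difference.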