By Marie Lebert
English version published by Project Gutenberg, 21 June 2004. Original version published in French by Edition Actu n° 90, Paris, France.
In July 1971, while he was a student at the University of Illinois (USA), Michael Hart set up Project Gutenberg. His goal was to make the largest possible number of books whose copyright had expired available electronically and free of charge.
This ground-breaking project became both the first information site on the Internet and the world’s first digital library. Michael himself typed in the first hundred books. When the Internet came into wide use in the mid-1990s, the project gained momentum and an international dimension. Michael still typed and scanned books, but he now also coordinated the work of dozens, and then hundreds, of volunteers in many countries.
The number of electronic books rose from 1,000 (in August 1997) to 2,000 (in May 1999), 3,000 (in December 2000) and 4,000 (in October 2001). Project Gutenberg had 5,000 books online in April 2002 and topped 10,000 in October 2003. By then a team of 1,000 volunteers around the world was making 350 new books available every month. These 10,000 books are also available on DVD for US$1 each. Michael hopes to have a million books available by 2015.
The books are digitized in plain "text" format, with terms that were originally in italic, bold or underlined rendered in capitals. This means any machine, operating system or software can read them easily. Digitization is done by scanning. Two different people then proofread the book and make any necessary corrections. When the original is in poor condition, as with very old books, it is typed in manually, word by word.
Digitization in text format means a book can be copied, indexed, searched, analyzed and compared with other books. It also produces a small computer file that is easy to send. By contrast, scanning each page produces a bulky image ("photo") file.
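To illustrate why plain text is so easy to index and search, here is a minimal Python sketch. It is only an illustration, not Project Gutenberg software. It maps each word of a text to the lines where that word appears. The function name `build_index` and the sample text are hypothetical.

```python
import re
from collections import defaultdict

def build_index(text: str) -> dict[str, list[int]]:
    """Map each lower-cased word to the (1-based) line numbers where it occurs."""
    index: dict[str, list[int]] = defaultdict(list)
    for lineno, line in enumerate(text.splitlines(), start=1):
        # set() records each line only once per word, even if the word repeats
        for word in set(re.findall(r"[a-z']+", line.lower())):
            index[word].append(lineno)
    return dict(index)

# Hypothetical sample text standing in for a digitized book
sample = "The quick brown fox\njumps over the lazy dog.\nThe end."
index = build_index(sample)
print(index["the"])  # [1, 2, 3]
print(index["fox"])  # [1]
```

The same approach would not work on page images. Those would first have to go through OCR, which is exactly the step that text-format digitization has already done.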
Hart describes himself as a workaholic who is devoting his entire life to the project, which he sees as the start of a new Industrial Revolution. He considers himself a pragmatic and farsighted altruist. For years he was regarded as a crank, but now he is respected. He wants to change the world through freely available e-books that can be used and copied endlessly. He envisions reading and culture for everyone at minimal cost, whether on a computer, on a secondhand PDA costing just a few dollars, or even on one of the solar-powered PDAs that are now starting to appear.
In early 2004, after a stay on the US west coast, in San Francisco and Berkeley, Hart went off to Europe, first Brussels and then Paris. He gave his first lecture in France on 12 February at UNESCO headquarters in Paris, organised with APRIL (Association pour la promotion et la recherche en informatique libre / Association for Promotion and Research in Free Computing) and AFUL (Association francophone des utilisateurs de Linux et des logiciels libres / French-speaking Linux and Free Software Users’ Association). He chaired a discussion at the French National Assembly on 13 February at the invitation of the discussion group “Produire et gérer les savoirs” (Producing and Managing Knowledge), a branch of the “Les temps nouveaux” (New Times) group.
What about books in French? The first digitized books were mostly in English, but there are now works in 25 different languages. Of the 11,340 e-books available as of 13 February 2004, 181 were in French. The launch of Project Gutenberg Europe in the next few weeks should increase that number considerably, and so much the better.
There is much work to be done to put all the classics of French culture online, freely available to all in an easy and practical format. A total of 1,117 books are currently accessible in text format on Gallica (Bibliothèque nationale de France / French National Library), 288 on ABU (Association des bibliophiles universels / The Universal Association of Booklovers), 195 in HTML and/or RTF format on Athena, and several dozen more on other websites. Some digital libraries specialize in shorter material. These include the Bibliothèque électronique de Lisieux (Lisieux Electronic Library), which digitizes mostly news and articles, and Miscellanées, which calls itself a “miscellaneous” library.
By Marie Lebert
Since my 15 February article about Michael Hart and Project Gutenberg, which mentioned the forthcoming launch of Project Gutenberg Europe (Hart recently spoke about it to the European Parliament), I’ve had a lot of questions from readers. Here are some answers:
Remember that Project Gutenberg is becoming international. Its main office is in the United States, but Project Gutenberg Australia and Projekt Gutenberg-DE (Germany) have been running for a long time. Project Gutenberg Europe will be a European project, with staff in Belgrade and links between the different projects. I think it’s worthwhile to build a French-language online library in cooperation with other groups. It prepares for a future in which machine translation will be 99% satisfactory (things are progressing well on that front, though there’s still a lot to do). In about 10 years, everyone will be able to call up literary classics in a choice of about 100 languages. Let’s work together instead of separately, since for once it’s possible.
Let’s also remember that everyone working with Project Gutenberg is a volunteer, including founder Michael Hart. The goal is to keep the project independent, now and in the future, from loans and other funding and from fleeting political and cultural priorities, so that politicians and economic interests cannot put pressure on it. The aim is also to respect the volunteers, who can be confident their work will be used for many years, even generations. Donations are used only to buy equipment and supplies, mostly computers and scanners.
And let’s remember that two different people proofread every scanned book, to make sure it is 99.9% accurate. Software on the website (still being tested) lets users convert books encoded in, for example, ASCII, ISO-8859, Unicode or Big-5 into other formats. Conversion into still more formats, including Braille and voice, will eventually be possible. So there’s no point arguing about which format is best. Text format can be used as is or as the basis for creating others. Anyone who wants to offer text-format books in more sophisticated formats can easily do so. There are no restrictions beyond respecting the copyright laws of the country concerned and keeping the new versions freely available.
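The article does not describe how the website’s conversion software works. As a hedged illustration of one basic step any such tool needs, re-encoding text from one character set to another, here is a minimal Python sketch using the standard codecs. The function name `convert_encoding` and the sample strings are hypothetical. Conversion to Braille or voice is well beyond this sketch.

```python
def convert_encoding(data: bytes, source: str, target: str = "utf-8") -> bytes:
    """Re-encode a text file's bytes from one character set to another.

    Decoding raises UnicodeDecodeError if the bytes are not valid in `source`;
    encoding raises UnicodeEncodeError if `target` cannot represent a character.
    """
    text = data.decode(source)  # bytes -> Unicode string
    return text.encode(target)  # Unicode string -> bytes in the target set

# Hypothetical inputs: a French word in ISO-8859-1 and Chinese text in Big-5
latin1_bytes = "Bibliothèque".encode("iso-8859-1")
utf8_bytes = convert_encoding(latin1_bytes, "iso-8859-1")
print(len(latin1_bytes), len(utf8_bytes))  # 12 13 ("è" takes two bytes in UTF-8)

big5_bytes = "中文".encode("big5")
print(convert_encoding(big5_bytes, "big5").decode("utf-8"))  # 中文
```

Decoding to Unicode first and re-encoding afterwards is a simple design choice. It means any pair of supported character sets can be converted through a single common representation.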
Some readers have asked how volunteer proofreading works. Go to the Distributed Proofreaders Europe website, which Project Rastko (Belgrade) has just launched (and is still testing) to handle the shared proofreading for Project Gutenberg Europe. Once you sign up, you’ll see detailed instructions, which are still being translated into several languages. For example, bold, italic and underlined passages, like footnotes, are always handled in the same way, so that all the e-books are presented consistently. A discussion forum lets you ask questions or seek help at any time.
Each time you go to the website, you choose the book you want. Its pages appear side by side in two forms: the scanned image and the text produced by OCR (optical character recognition) software. You compare the two and make corrections. OCR is usually 99% accurate, which makes for about 10 corrections a page. You save each page as you finish it and can then either stop or go on to another. All the books are proofread twice (the second time only by experienced proofreaders) before the final version is released to the public. After release, any further errors reported by readers are systematically corrected.
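The article does not say what the 99% figure measures. If it is character accuracy, about 10 corrections a page corresponds to a page of roughly 1,000 characters, though that page size is an assumption. The minimal Python sketch below shows this arithmetic. It also shows an automated version of the comparison proofreaders make by eye, using the standard `difflib` module. The function names and sample lines are hypothetical, and the site itself does not necessarily work this way.

```python
import difflib

def expected_errors(chars_per_page: int, accuracy: float) -> float:
    """Expected number of wrong characters on a page at a given OCR accuracy."""
    return chars_per_page * (1 - accuracy)

def count_corrections(ocr: str, corrected: str) -> int:
    """Count the changed regions difflib finds between OCR output and the proofread text."""
    matcher = difflib.SequenceMatcher(None, ocr, corrected, autojunk=False)
    return sum(1 for tag, *_ in matcher.get_opcodes() if tag != "equal")

# Assumption: a page of about 1,000 characters at 99% character accuracy
print(round(expected_errors(1000, 0.99)))  # 10

# Hypothetical OCR line with two typical misreadings ("li" for "h", "c" for "e")
ocr_line = "Tlie quick brown fox jumps ovcr the lazy dog."
proofread = "The quick brown fox jumps over the lazy dog."
print(count_corrections(ocr_line, proofread))  # 2
```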
You don’t have a quota to fill, but you are encouraged to do a page a day if possible. It may not seem much, but with hundreds of volunteers it really adds up. In 2003, on the original Distributed Proofreaders site, about 250-300 people were working each day, producing a daily total of 2,500-3,000 pages, or roughly two pages a minute.
Volunteers can also work independently. They can either type a whole book into any word-processing programme, or scan it, convert it into text with OCR software and then correct it by comparing it with the original. In either case, someone else will proofread it.
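As a quick check of these figures, assume the work is spread over the full 24 hours (1,440 minutes) of the day. Also assume that the lower page count goes with the lower number of volunteers, and the upper with the upper. Then:

```latex
\frac{2\,500\ \text{pages}}{1\,440\ \text{min}} \approx 1.7\ \text{pages/min},
\qquad
\frac{3\,000\ \text{pages}}{1\,440\ \text{min}} \approx 2.1\ \text{pages/min},
\qquad
\frac{2\,500}{250} = \frac{3\,000}{300} = 10\ \text{pages per volunteer per day}.
```

This is consistent with the "roughly two pages a minute" figure. It also suggests that active volunteers were doing well over the recommended page a day.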
These two articles appeared in French ("Michael Hart, ou la volonté de changer le monde par le biais de l'ebook" & "Project Gutenberg: quelques réponses à vos questions") in Edition Actu nos. 90 and 91, of 15 February and 1 March 2004. Edition Actu is the electronic newsletter of CyLibris (distributed free every fortnight) which aims to look at publishing from a different angle. CyLibris, founded in Paris in August 1996 and a pioneer of online publishing, was the first French publisher to use the Internet and digitization to bring out literary works.
Marie Lebert is an independent researcher, writer and journalist. She also works as a translator and editor. She is keenly interested in how new technology is changing publishing, the media and languages. She also campaigns for the free dissemination of knowledge, as far as possible, and for new publishing structures that break free of the old ones and make full use of the Internet’s potential. She lives in Paris but would like to find a job in San Francisco. Contact: marie.lebert AT laposte.net
Copyright © 2004 Marie Lebert - All Rights Reserved.