Biological Moon Shot

Realizing the dream of a Web page for every living thing

Richard Pyle hasn’t gotten a congratulatory crate of free diapers. But he’s one of the fathers, in a sense, of the first fish species named in 2008. Quintuplet species even. The journal Zootaxa posted descriptions of five damselfish on Jan. 1 that Pyle and his colleagues at the Bishop Museum in Honolulu found using a specialized mix of gases to push beyond the depth limits of conventional SCUBA gear.

An illustration of the night-blooming cereus Selenicereus grandiflorus stands out in the 1807 book New Illustration of the Sexual System of Carolus von Linnaeus. It’s one of hundreds of old texts going online at www.biodiversitylibrary.org that will eventually become part of the new Web-based Encyclopedia of Life. Biodiversity Heritage Library
NEW BLUE. The deep blue chromis, first described in January, will be among the 30,000 or so fish in the Encyclopedia of Life’s first entries. T. Clark
FIRST FLOWERS. The encyclopedia’s first plants will come from a database of the Solanaceae family, which includes potatoes, tobacco, and this Bolivian Solanum whalenii. S. Knapp, Natural History Museum of London

In a few weeks, the rest of us will be able to catch up on such frontiers of exploration a lot more easily. A sweeping informatics project called the Encyclopedia of Life is scheduled to launch its first trial entries on the Web in late February (SN: 5/12/07, p. 294). According to the plan, the encyclopedia portal will provide access to roughly 30,000 Web pages of specialists’ data—one page for each of the known species of fish.

And that’s just a baby step. Unveiled in May 2007, the Encyclopedia of Life project envisions such powerful tools for managing and centralizing biological information that a decade from now anyone visiting www.eol.org should find the Mother Nature of all encyclopedias: easy access to a Web page with definitive, current information on each species on Earth.

No one can say how many Web pages that total coverage will need. The encyclopedia’s godfather, biologist E.O. Wilson of Harvard University, speaks of 10 million. “It should be thought of as a biological moon shot,” he says.

He and his fellow encyclopedists argue that if they realize their ambitious dream, they’ll change the science of biology. They propose that the new informatics methods and centralized Web portal will speed up the old, underfunded business of figuring out what’s what (or who’s who) among living things. And the speedier tools will drive novel inquiries, including an expansion of the study of networks, such as food webs, and the search for evolutionary patterns.

Planners also hope it’s not just for science. Using the new tools to climb the tree of life should be fun—for scientists as well as for poets and plumbers and kids. If all goes according to plan, the Encyclopedia of Life will be cool.

Roots

Like flying to the moon, making one encyclopedia of all life is an old idea that technology might finally make possible.

The urge to produce an overarching view of living things goes back at least to Aristotle. Even the idea to make that long list in Latin with two names for each species goes back more than 250 years, to Carl Linnaeus’ foundations for biological nomenclature. Hence, Wilson wrote in an early proposal for the encyclopedia, people “assume taxonomy all but wound down generations ago.”

Not true. So far scientists have given formal names to only about 1.8 million species. Published estimates for the actual number of species on Earth range from 3.6 million to upwards of 100 million—numbers based on extrapolations and a fair bit of outright guesswork. In many ways, taxonomy has barely begun.

And while scientists have identified many of the largest and most obvious species, they very likely haven’t found the most important. Marine biologists didn’t describe the bacterial genus Prochlorococcus until 1988. Yet these picoplankton, barely visible with optical microscopy, floating in aquatic clouds and capturing the energy of sunlight through photosynthesis, account for a significant proportion of marine productivity.

Likewise, of the extraordinarily numerous hairlike roundworms called nematodes that wriggle through soil or colonize plants and animals, only a small percentage have names.

Even with the biology you can see, scientists are playing catch-up. According to Wilson, the number of known frogs and other amphibian species has jumped from 4,000 to 5,400 over the past 15 years. New plant species continue to join the roster at a rate of about 2,000 a year.

These measures reflect only the first step of naming an organism. How it lives, what it eats or gets eaten by, and whether people might find it useful or dangerous or charismatic often remain unknown. Yet the growing human population redirects the fates of these species, pushing some toward new habitats and others toward extinction.

“We’re sailing blind into our environmental future,” Wilson told attendees at the 2007 TED conference, a gathering of luminaries in technology, entertainment, and design. Wilson’s pitch marked the opening night of the current effort to upgrade biological information tools.

After several years of behind-the-scenes campaigning, Wilson and other planners had secured seed money for the project: $10 million from the John D. and Catherine T. MacArthur Foundation and $2.5 million from the Alfred P. Sloan Foundation. A consortium of museums and other science institutions is organizing to get the job done.

Fish first

At one of those institutions, the Smithsonian in Washington, D.C., the encyclopedia’s executive director, James Edwards, is in charge of seeing that this moon shot doesn’t fizzle.

Sample encyclopedia Web pages show flashy images and videos plus links to the latest genetic sequences and a scan of the page of the book in which the first published description of a species appeared. Cool, yes, but time-consuming. Developing entries of that quality for millions of species will take years, and Edwards doesn’t want the world to lose interest in the meantime.

So, the encyclopedia will release something fast, but just a small something: a portal to basic info on fish. The creators will present the pages as a work in progress, soliciting user comments.

Visitors will be able to admire a portrait of the zebra turkeyfish and a map of its range in the Pacific, for example, or learn that the white-spotted boxfish typically frequents tropical waters 1 meter to 30 m deep. The modern Latin names will be paired with tables of common names in dozens of languages.

The fish information itself won’t be an encyclopedia creation. Instead, the informatics specialists are building a new portal to an existing site, called FishBase. This strategy illustrates how such a grand undertaking as the compendium of all living things might just be possible. The project won’t start from scratch with 10,000 taxonomists typing until they create an encyclopedia. Specialists have already made databases with reliable information, and the encyclopedia will provide a central entryway for using these trusted sources.

“Everybody wants his or her favorite organism there first,” says Edwards. “If you’re a leech lover, you want leeches. If you’re a spider lover, you want spiders.” What the encyclopedia crew is actually going to present next, with or just after the fish, are plants in the Solanaceae family—including tomatoes, peppers, petunias, tobaccos, and potatoes. “It’s timely, because 2008 is the International Year of the Potato,” says Edwards. (Not a joke. See “It’s Spud Time”.)

As the Encyclopedia of Life grows, its tools will capture the latest research to enrich those sources. Google-like aggregation technology will register new publications or gene sequences, for example, that appear on the Web.

“The most exciting thing about this project to me is that we have a blizzard of information coming at us all the time—and it’s not just in science, it’s everywhere,” says Mark Westneat of the encyclopedia group based at the Field Museum of Natural History in Chicago. Financiers monitoring markets and even travelers wondering whether to pack boots have some fine systems for sifting out the desired snowflakes from all the rest of the information. “Biologists are a little bit behind in informatics tools,” he says.

The fish segment illustrates another feature of the encyclopedia plan: the quality of sources. Westneat, who studies reef fishes, encountered FishBase in its larval stage at a biologists’ gathering in the Philippines in 1995. One of its originators, fish biologist Rainer Froese, brought an early version of this database and appealed to his colleagues to groom glitches out of it and supply photographs. “We grudgingly did so,” says Westneat. “We thought, ‘Oh, this will be nice for school kids and stuff, but I’ll never use it.’” Then heroic efforts by William Eschmeyer of the California Academy of Sciences in San Francisco standardized the taxonomy with up-to-date forms and lists of synonymous names. “All of a sudden, FishBase became this incredibly valuable resource,” Westneat says. “I use it every day.”

Such trustworthy information isn’t just swimming free in the seas. “A significant challenge facing the Encyclopedia of Life is engaging the scientific community to provide content,” says botanist Richard Ree of the Field Museum. “Similar initiatives have been tried in the past, and I think it’s safe to say that none met with resounding success.”

Ree does add that the project has advantages over previous proposals. The star power of E.O. Wilson and the TED conference attendees could catalyze interest from the corporate sector and allow access to its considerable experience in developing tools for managing computer information.

The encyclopedia planners are well aware of the need for active support from scientists, says Westneat. He leads a team focusing on how to make the encyclopedia so useful that scientists will decide that providing top-quality information is worth their time. “The scientific community is going to make the Encyclopedia of Life rich, and it’s going to make it correct,” he says. In turn, that gold standard information should enrich the specialists’ pursuits.

If only

As an example of such a pursuit, Westneat describes the travails of Jennifer Fessler, one of his students, who has just finished revising the taxonomy of the gorgeous but confusing butterfly fish.

Her revision also covered the fishes’ distribution, which meant refining range maps for some 50 species. The Global Biodiversity Information Facility database let her download information on museum specimens worldwide to find collection spots for the coral reef fish. That resource certainly helped, but so far there isn’t a good automated way to check for typos in the latitude and longitude. Mappers like Fessler must slog through data looking for anomalies. “There’ll be this record of a coral reef fish in the middle of the Midwest,” Westneat says. Between proofing the locations and putting data into the right format, the work took Fessler months. “What if you could do that in a couple of minutes?” Westneat daydreams.

Parts of the job of revising or creating species names could get faster, but overall “it’s not something that can be done at the speed of light,” says Corrie Moreau of the University of California, Berkeley.

For example, Moreau is now considering whether small Pheidole hyatti ants, with their distinctive, large-headed soldiers, represent just one species or several. Yellowish individuals show up on desert floors, but a darker form dominates higher and shadier habitats. To sort out the problem, she and her collaborator, Stefan Cover of Harvard’s Museum of Comparative Zoology in Cambridge, study ants in the wild but also need lots of other resources. The project requires reviewing literature on the species and its relatives dating back at least 100 years, examining museum specimens and collecting new ones, and sequencing stretches of DNA.

Even though she expects systematics will always demand time, Moreau says she would welcome any streamlining that the Encyclopedia of Life could offer. She could untangle her ant puzzle faster if she had a central source for reviewing early descriptions, high-detail portraits of specimens, and new DNA work.

Her wish for easier access to the old publications is already coming true, albeit slowly. Thomas Garnett of the Smithsonian’s National Museum of Natural History heads a scanning and digitization group of encyclopedia workers. They are cooperating with the Biodiversity Heritage Library, a project through which 10 major libraries are scanning pages from volumes that describe species and placing them on the Web. Some 80 million pages come from publications old enough to be in the public domain, and the scanners are starting with those.

“The scanner is the person; the machine is the Scribe,” explains Martin Kalfatovic, Garnett’s colleague and a digitization expert, as the two tour the museum’s well-lit basement, which is disappointingly not heaped with cobwebby dino skulls. There, in a large, mostly empty room, is the scanner: a real person who sounds pretty sane for someone who turns 3,000 pages a day.

The scanner sits in front of the Scribe machine, a highly evolved computer desk with paired cameras and links to massive bibliographic databases, all inside a booth covered by black canvas. He deftly settles a thin entomology volume with sallow pages into a V-shaped cradle that keeps the book’s elderly spine from having to strain all the way open. A foot pedal sends a hovering glass cover down just so to flatten the half-open book pages. With a synchronized jachick, a pair of cameras shoots the two visible pages. Capturing the image, it turns out, is just the beginning. Software allows images to be corrected for off-kilter angles and other flaws and to be tied to catalog information in the databases.

As of Jan. 25, the project had scanned 3,661,118 pages, Garnett says. The project’s Web site (www.biodiversitylibrary.org) opens virtual access to a number of rare-book-room treasures: a 1484 guide to medicinal plants from Mainz, Germany, and Robert Thornton’s 1807 New Illustration of the Sexual System of Carolus von Linnaeus, with full-page glamour portraits of flowers against moonlit rivers or other dramatic backgrounds.

Garnett points out that the century-old volumes of Biologia Centrali-Americana have also gone online. Both botanists and zoologists need this basic work when tracing the history of species descriptions. Yet, he says, “there are only two copies in Central America.”

In talking about the vital business of opening library resources to far-flung scientists, Garnett rolls his eyes at the mention of a specialized source for historians of science that has become one of the library’s most popular downloads—the 1904 treatise Ants and Some Other Insects: An Inquiry Into the Psychic Powers of These Animals.

Cruising

The broad appeal of psychic ants raises the point that this isn’t just about scientists. “The other audience we’re targeting is middle schoolers,” says Westneat. “They’re very quick. They’re interested. They’re also capable of handling complex ideas.” Plus, they’re agile surfers.

Again he draws on the experiences of FishBase. Useful as it is to ichthyologists, they account for only a small percentage of the visitors. Aquarium hobbyists, fishing enthusiasts, and just plain curious browsers click into the site from all over the world.

When the Encyclopedia of Life matures, Westneat says, he hopes that it, too, attracts what he calls “What’s in my backyard?” questions. Designers are working on ways that someone might see an orange butterfly in Chicago in June and then get the encyclopedia to display a gallery of photos of the likely species.

But that example barely touches the power of the Web. “Imagine all 2 million known species in this grand family tree of life,” says Westneat. “What if you could have that tree of life floating in space on your computer screen and zoom in on the birds and see a blackbird and a hummingbird and a hawk popping up on the branches, the way the restaurants pop up in Google Earth when you zoom in on Chicago? Just imagine the fun that middle school kids will have cruising around the tree of life and finding the narwhal and all the cool animals.”

Imagine the fun any of us would have. The curious might come upon the page for the deep blue chromis (Chromis abyssus) named by the Honolulu team. The damselfish and three of its recently discovered kin swim at depths of at least 85 m in a poorly understood habitat sometimes referred to as the coral-reef twilight zone. So the name works twice: C. abyssus has deep-blue spots as well as an unusually deep habitat for a damselfish. It’s a gentle example of taxonomy humor, yet another frontier for Web surfers to explore.


Susan Milius is the life sciences writer, covering organismal biology and evolution, and has a special passion for plants, fungi and invertebrates. She studied biology and English literature.