Genome 10K: A new ark
By Janet Raloff
Biologists can tell a lot about how living things evolved by rooting around in their genes, comparing snippets of DNA from supposedly related — or unrelated — species. This only works, of course, if catalogs of those DNA snippets exist, which they largely don’t yet. But such catalogs could exist in the not-too-distant future. That is, of course, if a consortium of researchers gets its way — and a boatload of money.
Earlier this year, a group of scientists launched the Genome 10K Project. Its aim: to collect tissues or cells from at least 10,000 vertebrate species — enough to catalog DNA sequences from about every vertebrate genus. The project has gained a lot of momentum and the support of researchers at more than 40 zoos, museums, universities and other research centers. Dozens of these scientists have now lent their names as authors to a new Journal of Heredity paper, posted online November 6, describing what they hope to learn.
And that’s simple, says geneticist David Haussler of the University of California, Santa Cruz: “We want to see evolution in action.”
Really? By looking at the DNA from a single male and female of each of thousands of different critters?
Many species host identical long stretches of DNA. What’s interesting is where they differ. And sometimes a huge difference in the appearance of animals — say humans and chimps — may trace to less than a 2 percent difference in their genes, Haussler says. To find out where particular traits emerged and when, geneticists can look for the time in the distant past when the gene for one or more traits mutated.
By knowing when animals diverged in the fossil record and what traits are associated with that split, Haussler says, scientists can now tie those new traits to particular genes. From that, they can essentially date when these novel features emerged, and how broadly they’ve spread among seemingly related species. “That’s witnessing evolution in action,” he explains.
Despite its 10K name, the project actually seeks to map the DNA of 16,203 species. It’s an unusual and explicit figure. It’s also the number of species for which tissues or cellular samples already exist in storage somewhere, perhaps at a zoo or in some museum freezer. Yes, freezer. None of these species need be killed for the first round of genomic analyses because “they’ve already gone to their maker,” observes geneticist Stephen O’Brien. The chief of the National Cancer Institute’s genomic-diversity laboratory, he’s able to contribute quite a few such samples from the dearly departed.
“I spent a lot of my career collecting specimens from nontraditional species of mammals that I thought could be studied for the mining of resistance to disease and other effects,” O’Brien says. What kind of species? Lions, cheetahs, orangutans, even humpback whales.
Probing their genetic inheritance, he says, might explain why sharks seem to get cancer only rarely or why monkeys can be resistant to AIDS. It may even point to why some animals naturally experience low rates of heart disease while consuming high-fat diets.
But with current DNA sequencing costing $50,000 to $100,000 per test sample, the Genome 10K Project would break the research community’s bank. So its designers have decided to wait for sequencing costs to drop by a factor of 10 or more, probably within the next couple of years, before launching their analytical program in earnest. By that point, the whole project might be accomplished for something around $50 million, Haussler suspects.
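For readers who want to check the arithmetic behind those figures, here is a rough back-of-envelope sketch. It uses only the numbers quoted above and assumes, for simplicity, one sequenced sample per species and a cost drop of exactly 10-fold; the function names are made up for illustration and are not part of the project's own planning.

```python
# Back-of-envelope sequencing cost, using only figures quoted in the article.
# Assumption: one sequenced sample per species, and a cost drop of exactly 10x.
SPECIES_TARGET = 10_000                  # the project's headline goal
SPECIES_WITH_SAMPLES = 16_203            # species with samples already in storage
COST_PER_SAMPLE_NOW = (50_000, 100_000)  # dollars per sample, quoted range
COST_DROP_FACTOR = 10                    # the article says "10 or more"


def project_cost(n_species, cost_per_sample, drop_factor=1):
    """Total sequencing cost in dollars for n_species samples."""
    return n_species * cost_per_sample / drop_factor


def cost_range(n_species, drop_factor=1):
    """Low and high total cost, from the quoted per-sample range."""
    low, high = COST_PER_SAMPLE_NOW
    return (project_cost(n_species, low, drop_factor),
            project_cost(n_species, high, drop_factor))


if __name__ == "__main__":
    for label, n in [("10,000 species", SPECIES_TARGET),
                     ("16,203 species", SPECIES_WITH_SAMPLES)]:
        now_low, now_high = cost_range(n)
        later_low, later_high = cost_range(n, COST_DROP_FACTOR)
        print(f"{label}: ${now_low / 1e6:,.0f}M to ${now_high / 1e6:,.0f}M today, "
              f"${later_low / 1e6:,.0f}M to ${later_high / 1e6:,.0f}M "
              f"after a {COST_DROP_FACTOR}x drop")
```

Under these assumptions, 10,000 samples at today's low-end price would run about $500 million, and a 10-fold drop brings that to about $50 million, which matches the ballpark Haussler cites.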
No question, adds O’Brien, “We’re looking for big money.” But he’s also confident benefactors will eventually step forward, willing to “put their name on this Book of Life.”
Currently, however, in terms of getting this program firmly off the ground, “the rate-limiting step isn’t going to be getting the money or doing the sequencing — or even analyzing the data,” O’Brien contends. “It’s going to be getting the specimens collected properly and legally transported to the centers that can do the DNA sequencing.”
He’s referring to the need to confirm sample provenance and quality.
As regards the first, the Genome 10K Project has committed itself to establishing where each sample came from and, if it belongs to an endangered or protected species, that proper permits were obtained before the sample was collected.
Oliver Ryder is director of genetics at the San Diego Zoo’s Institute for Conservation Research. It’s home to a Frozen Zoo — tissue samples from some 8,600 individual animals representing roughly 800 species. All samples from the Frozen Zoo collection, acquired from 180 institutions around the world, have already been vetted for proper provenance. But the same may not be true for ancient samples in museums or elsewhere.
The second potential limit to species sequencing has to do with ensuring that biologists don’t waste precious resources trying to analyze the DNA from a sample that was stored improperly so that its genetic material has degraded.
A museum may offer up for testing some sample that spent the last century pickled in alcohol. “We may have to do some experimentation to verify the quality of that sample’s DNA,” Ryder says. To make mass sequencing cost effective, the project’s samples will need to be processed in assembly-line fashion, he says. So most will need to be ready for sequencing at about the same time.
Of course, just getting a DNA blueprint for a species isn’t the goal of Genome 10K, Haussler says, because it won’t tell you what that DNA has evolved to do.
DNA contains the code that tells the body what proteins to make. Those proteins will ultimately create tissue, produce signals that orchestrate the timing of processes within cells, permit an individual to reproduce — even underpin behavior. In a sense, then, it’s the proteins that geneticists really care about.
It helps to think of DNA as akin to ingredients in a pantry. Some DNA may never be unshelved for use in creating an entrée, that is, some biologically active protein. To understand which entrées DNA will create, Haussler says, geneticists need to also understand the RNA associated with a genome. RNA is a form of genetic material that essentially copies active segments of DNA and then carries those copies off to serve as the recipes used to make proteins in the body.
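The flow from DNA to RNA to protein described above can be sketched in a few lines of code. This is a toy illustration only: the 12-letter DNA segment is made up, the codon table is a tiny subset of the standard genetic code, and the function names are hypothetical. Real cells, and real genome-analysis software, are far more involved.

```python
# Toy illustration of the DNA -> RNA -> protein flow.
# The sequence and the tiny codon table are illustrative assumptions,
# not data from the Genome 10K Project.

# A small, partial subset of the standard genetic code (mRNA codon -> amino acid).
CODON_TABLE = {
    "AUG": "Met",  # also the usual start codon
    "UUU": "Phe",
    "GGC": "Gly",
    "UAA": None,   # a stop codon: no amino acid, translation ends
}


def transcribe(coding_dna):
    """Copy a coding-strand DNA segment into messenger RNA (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Read the RNA three letters at a time until a stop codon appears."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon not in CODON_TABLE:
            raise KeyError(f"codon {codon} is not in this toy table")
        amino_acid = CODON_TABLE[codon]
        if amino_acid is None:
            break
        protein.append(amino_acid)
    return protein


if __name__ == "__main__":
    dna = "ATGTTTGGCTAA"   # hypothetical 12-letter segment
    rna = transcribe(dna)
    print(rna)             # the RNA copy of the DNA segment
    print(translate(rna))  # the chain of amino acids it encodes
```

The point of the sketch is the one Haussler makes: the DNA sequence alone is just the pantry. Only the segments that get copied into RNA and read out three letters at a time end up as protein.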
Once the Genome 10K Project begins transcribing DNA blueprints for each vertebrate genus on Earth, biologists risk becoming buried by an avalanche of data. We’re talking about petabytes of information, each petabyte being roughly a million gigabytes, that must be collected, analyzed and stored, Haussler says. That too can be costly. But he looks forward to having that problem: more comparative genetic data than he knows what to do with.