The human genetic instruction book just got more readable. Nearly a decade after the Human Genome Project assembled the genome’s 3 billion chemical units, an international consortium has revealed how the components fit together into sentences and chapters.
Already, the genome’s tales are revealing how genetic variants contribute to disease, giving researchers insights into human evolution and even changing how scientists define a gene.
“The questions we can now ask are more sophisticated and will yield better answers than the ones we were asking nine years ago,” says Eric Green, director of the National Human Genome Research Institute, which coordinated and funded the mammoth Encyclopedia of DNA Elements, or ENCODE, project.
Results from ENCODE, which involves more than 400 researchers around the globe, appear in the Sept. 6 Nature, with more than 30 companion papers published in Science, Genome Research, Genome Biology, Cell and BMC Genetics.
When scientists announced the completion of the Human Genome Project in 2003, researchers could pick out genes that carry instructions for building proteins. But those protein-coding instructions make up less than 2 percent of the genome. Some people dismissed the rest of the genome as “junk DNA.”
For the new project, ENCODE collaborators cataloged 50,000 human genes. That total includes the roughly 21,000 protein-making stretches of DNA traditionally thought of as genes, as well as nearly 30,000 that are copied into the information-containing molecule RNA but do not carry instructions for making proteins. The analysis reveals that at least 80 percent of the genome serves some purpose. “Perhaps none of it is truly junk,” says Ross Hardison, a biochemist and molecular biologist at Penn State University in University Park.
Within that 80 percent is a complex network of regulatory switches that control how cells interpret the genetic instructions contained in DNA.
The team carefully mapped out more than 4 million short stretches of DNA (usually about six to 10 DNA units, or bases, long) in the genome where proteins called transcription factors latch on, nudging genes’ activity up or down. Changes in gene activity help determine how an organism grows and play a role in both health and disease. The scientists also noted places where DNA or its associated proteins are tagged with certain chemical marks that can change the way DNA is packaged, which alters gene activity and influences how an organism develops and functions.
Most of the genome appears to be engaged in regulating gene activity, with multiple transcription factors and other regulatory proteins teaming up to control the action of each gene, says John Stamatoyannopoulos, a genomics researcher at the University of Washington in Seattle. His team describes complex gene regulatory networks formed by 475 transcription factors in the Sept. 14 Cell.
Genetic variants linked to diseases tend to fall within these regulatory regions of the genome, Stamatoyannopoulos and his colleagues discovered. For years, scientists have combed the genome looking for common genetic variants that contribute to disease. Many researchers were frustrated by such studies because most variants associated with disease don’t lie within genes and therefore don’t have an obvious effect on protein production.
But in a study published in the Sept. 7 Science, researchers found that genetic variants that influence a person’s risk for disease or help determine physical characteristics such as height are located in regulatory switches. Those altered switches can affect activity of genes located far from the variant. And instead of just a few variants being involved in a disease or physical trait, the ENCODE project indicates that dozens to hundreds of variants, each with a subtle effect on gene activity, may play a role. That may help scientists account for a greater proportion of the genetic contribution to common diseases.
Many related diseases may share disturbances in certain regulatory networks, researchers discovered. For instance, about 24 percent of genetic variants involved in autoimmune disorders — such as Crohn’s disease, lupus, rheumatoid arthritis and type 1 diabetes — affect switches flipped by transcription factors that interact with interferon regulatory factor 9, a transcription factor that stimulates production of an immune chemical that helps control inflammation.
Besides studying the complex plot of the genetics behind disease, researchers are using ENCODE as a resource for studying human evolution. Manolis Kellis and Lucas Ward of MIT used the data to unveil parts of the human genome that are changing more slowly than others. The researchers report online Sept. 5 in Science that natural selection seems to weed out changes in about 4 percent of the genome, perhaps indicating that those parts of the genome are important for human evolution. Some of those important parts contain regulatory switches that affect development of eye cells needed for color vision or that help control growth of nerves.
More of the genome probably has a function than not, but Ewan Birney, associate director of the European Molecular Biology Laboratory-European Bioinformatics Institute and one of the leaders of ENCODE, says there may be a tiny bit of junk in it. “I find it hard to believe that everything is really critical and important.” But he and his many colleagues will focus on the vast majority of the genome that does something. Figuring out what that is will be the fun part.