Life’s code in soap

Biological languages follow the laws of thermodynamics

June 11, 2008 at 10:26 am

By mixing soapy water, oil and the theory of information, a physicist has found a possible clue to the origin of the genetic code, as well as to the structure of other biochemical languages.

Life’s workhorse molecules, the proteins, are made from only 20 different types of amino acids, encoded in the chemical makeup of DNA. In principle, DNA could code for about three times that many: its four letters, read three at a time, give 4 × 4 × 4 = 64 possible combinations. Comparing the genetic code with the physics of soapy water suggests an explanation for why nature chose 20 as an optimal number, Tsvi Tlusty of the Weizmann Institute of Science in Rehovot, Israel, reports in an upcoming issue of the Proceedings of the National Academy of Sciences.

Genes are segments of DNA that encode instructions for constructing the molecules, primarily proteins, needed to build and operate cells. Each gene is a long sequence of “letters” (A, C, T and G), symbols for the chemical bases adenine, cytosine, thymine and guanine. Almost every three-letter combination, or codon, specifies an amino acid; three of the 64 instead signal where a protein ends. But the code is redundant, meaning that different triplets often represent the same amino acid. For example, CAA and CAG both represent glutamine.
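The redundancy described above is easy to check by hand. The following is a minimal Python sketch, not part of Tlusty's work: it writes out the standard genetic code as a lookup table and counts how many three-letter words (codons) map to each amino acid. The names BASES, AMINO, CODON_TABLE and codons_for are chosen here for illustration, and amino acids appear under their usual one-letter abbreviations (Q is glutamine, "*" marks a stop signal).

```python
from collections import defaultdict
from itertools import product

# Letter order used to enumerate the 64 triplets (an illustrative choice).
BASES = "TCAG"

# Standard genetic code, one symbol per triplet in TCAG order.
# One-letter amino acid abbreviations; "*" marks a stop signal.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each three-letter word to the amino acid (or stop) it encodes.
CODON_TABLE = {
    "".join(letters): aa
    for letters, aa in zip(product(BASES, repeat=3), AMINO)
}

# Invert the table: which triplets encode each amino acid?
codons_for = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    codons_for[aa].append(codon)

amino_acids = sorted(aa for aa in codons_for if aa != "*")

print(len(CODON_TABLE), "three-letter words")
print(len(amino_acids), "amino acids")
print("glutamine (Q):", codons_for["Q"])
for aa in amino_acids:
    print(aa, len(codons_for[aa]), codons_for[aa])
```

Running it lists 64 words shared among 20 amino acids plus the stop signal, with between one and six triplets per amino acid.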

The genetic code presumably evolved from the diverse and chaotic chemistry of the Earth’s primordial broth. Before settling on the 20 standard amino acids, the developing code faced opposing pressures. Organisms with a more complex molecular language, one using more than 20 types of amino acids, could have deployed a wider range of chemical combinations to adapt to environmental changes. But organisms with simpler chemistry required less molecular machinery and energy, Tlusty explains. And using fewer amino acids softens the impact of random errors in copying genetic information: If several triplets have the same meaning, there’s a good chance that changing one letter will have no consequences.
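To get a feel for how often a one-letter change has "no consequences" in today's code, one can simply enumerate every possible single-letter substitution. The sketch below is an illustration only, not Tlusty's calculation. It assumes every substitution is equally likely, which real mutations are not, and it counts a stop signal changing into another stop signal as harmless. The helper single_letter_variants and the variable names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks a stop signal.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(letters): aa
    for letters, aa in zip(product(BASES, repeat=3), AMINO)
}


def single_letter_variants(codon):
    """Yield the nine triplets that differ from `codon` in exactly one letter."""
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                yield pos, codon[:pos] + base + codon[pos + 1:]


total = 0
silent = 0
silent_by_position = [0, 0, 0]  # first, second, third letter of the triplet

for codon, meaning in CODON_TABLE.items():
    for pos, variant in single_letter_variants(codon):
        total += 1
        if CODON_TABLE[variant] == meaning:
            # The changed triplet still has the same meaning.
            silent += 1
            silent_by_position[pos] += 1

print(f"{silent} of {total} single-letter changes keep the meaning "
      f"({silent / total:.1%})")
print("silent changes by letter position:", silent_by_position)
```

Nearly all of the harmless changes fall on the third letter of the triplet, which is where most of the code's redundancy sits.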

Eventually the code reached an optimal level of richness, which provided flexibility without being too high-maintenance. Such a balancing act, Tlusty says, is similar to how certain physical systems tend to make arrangements that minimize energy while maximizing entropy (a measure of disorder).

To put the analogy on more solid footing, Tlusty made a physics-inspired mathematical model of the genetic code. He first represented the code as a network in which each node stands for a three-letter word. Two nodes are connected if they differ by just one letter. Tlusty then “colored” the nodes, assigning the same color to triplets that encode the same amino acid. The coloring partitioned the network into regions.
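The network described above can be built in a few lines. This Python sketch follows only the description given here (one node per three-letter word, a link between words that differ by one letter, one color per amino acid) and does not attempt to reproduce Tlusty's full mathematical model. Treating the stop signal as an extra color is an assumption made for this illustration, and all variable names are hypothetical.

```python
from itertools import combinations, product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks a stop signal.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(letters): aa
    for letters, aa in zip(product(BASES, repeat=3), AMINO)
}

# Nodes: all 64 three-letter words.
nodes = list(CODON_TABLE)

# Edges: pairs of words that differ in exactly one letter.
edges = [
    (a, b)
    for a, b in combinations(nodes, 2)
    if sum(x != y for x, y in zip(a, b)) == 1
]

# Number of neighbors of each node.
degree = {node: 0 for node in nodes}
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

# "Coloring": each node takes the amino acid it encodes as its color.
# Assumption for this sketch: the stop signal counts as one more color.
colors = set(CODON_TABLE.values())

# Edges inside a colored region versus edges crossing a boundary.
internal_edges = [e for e in edges if CODON_TABLE[e[0]] == CODON_TABLE[e[1]]]
boundary_edges = [e for e in edges if CODON_TABLE[e[0]] != CODON_TABLE[e[1]]]

print(len(nodes), "nodes,", len(edges), "edges")
print("neighbors per node:", set(degree.values()))
print(len(colors), "colors (20 amino acids plus the stop signal)")
print(len(internal_edges), "edges inside regions,",
      len(boundary_edges), "crossing boundaries")
```

Each node has nine neighbors, since each of its three letters can be swapped for any of three others. An edge that stays inside a colored region corresponds to a one-letter change that leaves the encoded amino acid untouched.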

In the early days of evolution, the boundaries of these regions would have shifted around before finding an optimal configuration. The competitive advantage of a richer code would favor breakup into smaller regions (encoding more amino acids), while the cost of copying errors and energy expense would push toward fewer, larger regions (and thus fewer amino acids). Based on this model, Tlusty says he found that 20 is an optimal number of regions, so nature’s choice of 20 amino acids wasn’t completely random.

Tlusty’s model is mathematically equivalent to the physics of oil and soapy water mixtures. In certain conditions, soapy membranes engulf the oil into tubes, and the tubes plug into each other, forming networks. These networks take on shapes that minimize energy and maximize entropy.

Tlusty says his theory also could apply to other biochemical codes that cells use to process information. For example, he says, it could offer some insight into the language of antigens, the molecules that prompt the immune system to produce particular antibodies.

Tlusty’s paper is rather abstract, however, and offers no concrete examples of further applications, comments Glenn Tesler, a mathematician at the University of California, San Diego. Still, Tesler adds, the results are interesting in that they tie together ideas from information theory, physics and biology.