By mixing soapy water, oil and the theory of information, a
physicist has found a possible clue to the origin of the genetic code, as well
as to the structure of other biochemical languages.
Life’s workhorse molecules are made from only 20 different
types of amino acids, encoded in the chemical makeup of DNA. In principle, DNA
could code for about three times that many: its four chemical letters, read three at a time, form 64 possible combinations. Comparing
the genetic code with the physics of soapy water suggests an explanation for
why nature chose 20 as an optimal number, Tsvi Tlusty of the Weizmann Institute
of Science in Rehovot, Israel, reports in an upcoming issue of the Proceedings of the National Academy of
Sciences.
Genes are segments of DNA that encode instructions for constructing
the molecules, primarily proteins, needed to build and operate cells. Each gene
is a long sequence of “letters” — A, C, T and G — symbols for the chemical
bases adenine, cytosine, thymine, and guanine. Each three-letter combination
specifies an amino acid. But the code is redundant, meaning that several different
triplets can represent the same amino acid: for example, CAA and CAG both
represent glutamine.
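The counts in the preceding paragraphs can be checked with a few lines of Python. This sketch enumerates all 4^3 = 64 triplets and looks each one up in the standard genetic code. The 64-character table string follows the widely used NCBI "translation table 1" layout (bases ordered T, C, A, G, with `*` marking the three stop triplets); the variable names are illustrative only.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), bases in T, C, A, G order.
# '*' marks the three stop triplets, which encode no amino acid.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# product() varies the last base fastest, matching the table's ordering.
TRIPLETS = ["".join(t) for t in product(BASES, repeat=3)]
CODE = dict(zip(TRIPLETS, AMINO_ACIDS))

# How many triplets share each meaning (the code's redundancy).
codons_per_meaning = Counter(CODE.values())
amino_acids = sorted(aa for aa in codons_per_meaning if aa != "*")

print(len(TRIPLETS))                                      # 64
print(len(amino_acids))                                   # 20
print(CODE["CAA"], CODE["CAG"])                           # Q Q (glutamine)
print(codons_per_meaning["L"], codons_per_meaning["M"])   # 6 1
```

Leucine, for example, is spelled six different ways, while methionine has a single triplet.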
The genetic code presumably evolved from the diverse and
chaotic chemistry of the Earth’s primordial broth. Before settling on the 20 standard
amino acids, the developing code faced opposing pressures. Organisms with a
more complex molecular language — using more than 20 types of amino acids —
could have deployed a wider range of chemical combinations to adapt to
environmental changes. But organisms with simpler chemistry required less
molecular machinery and energy, Tlusty explains. And using fewer amino acids
makes the code more tolerant of random errors in copying genetic information: If several
triplets have the same meaning, there’s a good chance that changing one letter
will have no consequences.
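The error-tolerance argument can be illustrated with a rough proxy: the fraction of all single-letter changes that leave a triplet's meaning unchanged. This is a sketch of our own, not Tlusty's model. It assumes every single-letter change is equally likely, counts "stop" as one more meaning, and compares the standard code (NCBI table 1 string) with two hypothetical extremes.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), bases in T, C, A, G order; '*' = stop.
STANDARD = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
            "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
TRIPLETS = ["".join(t) for t in product(BASES, repeat=3)]


def point_mutants(triplet):
    """Yield every triplet that differs from `triplet` in exactly one letter."""
    for pos, base in enumerate(triplet):
        for other in BASES:
            if other != base:
                yield triplet[:pos] + other + triplet[pos + 1:]


def silent_fraction(meanings):
    """Fraction of single-letter changes that leave the meaning unchanged."""
    code = dict(zip(TRIPLETS, meanings))
    changes = [(t, m) for t in TRIPLETS for m in point_mutants(t)]
    silent = sum(code[t] == code[m] for t, m in changes)
    return silent / len(changes)


# Hypothetical extremes: every triplet distinct vs. every triplet synonymous.
all_distinct = [str(i) for i in range(64)]
all_same = ["X"] * 64

print(silent_fraction(all_distinct))  # 0.0
print(silent_fraction(STANDARD))      # somewhere in between
print(silent_fraction(all_same))      # 1.0
```

A code that gave each of the 64 triplets its own meaning would have no silent changes at all; under these assumptions the standard code absorbs roughly a quarter of them.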
Eventually the code reached an optimal level of richness, which
provided flexibility without being too high-maintenance. Such a balancing act, Tlusty
says, resembles the way certain physical systems settle into arrangements that
minimize energy while maximizing entropy (a measure of disorder).
To put the analogy on more solid footing, Tlusty made a
physics-inspired mathematical model of the genetic code. He first represented the
code as a network in which each node stands for a three-letter word. Two nodes
are connected if they differ by just one letter. Tlusty then “colored” the
nodes, assigning the same color to triplets that encode the same amino acid.
The coloring partitioned the network into regions.
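The network described above is easy to build. The Python sketch below links triplets that differ by one letter and colors them with the standard code (NCBI table 1). Two choices are our own assumptions, not details from Tlusty's paper: the three stop triplets share one extra color, and a "region" is taken to be a connected patch of same-colored nodes. It illustrates the network only, not Tlusty's optimization model.

```python
from collections import Counter, defaultdict
from itertools import combinations, product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), bases in T, C, A, G order; '*' = stop.
STANDARD = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
            "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
TRIPLETS = ["".join(t) for t in product(BASES, repeat=3)]
# Node colors: one per amino acid, plus (our assumption) one shared by all stops.
COLOR = dict(zip(TRIPLETS, STANDARD))


def one_letter_apart(a, b):
    """True if triplets a and b differ in exactly one position."""
    return sum(x != y for x, y in zip(a, b)) == 1


# Link every pair of triplets that differ by a single letter.
EDGES = [(a, b) for a, b in combinations(TRIPLETS, 2) if one_letter_apart(a, b)]
neighbors = defaultdict(set)
for a, b in EDGES:
    neighbors[a].add(b)
    neighbors[b].add(a)


def regions(color):
    """Split the network into connected patches of same-colored nodes (DFS)."""
    patches, seen = [], set()
    for start in TRIPLETS:
        if start in seen:
            continue
        patch, stack = set(), [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            patch.add(node)
            for nb in neighbors[node]:
                if nb not in seen and color[nb] == color[node]:
                    seen.add(nb)
                    stack.append(nb)
        patches.append(patch)
    return patches


patches = regions(COLOR)
patch_counts = Counter(COLOR[next(iter(p))] for p in patches)

print(len(EDGES))                                      # 288
print(len(set(COLOR.values())), len(patches))          # 21 22
print([c for c, n in patch_counts.items() if n > 1])   # ['S']
```

Each triplet has nine neighbors (three positions, three alternative letters). Under these assumptions the 21 colors carve the network into 22 regions, because serine's six triplets form two separate patches (TCT, TCC, TCA, TCG and AGT, AGC) that no single-letter change connects.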
In the early days of evolution, the boundaries of these
regions would have shifted around before finding an optimal configuration. The
competitive advantage of a richer code would favor breakup into smaller regions
(encoding more amino acids), while the cost of copying errors and energy
expense would push toward fewer, larger regions (and thus fewer amino acids). Based
on this model, Tlusty says he found that 20 is an optimal number of regions, so
nature’s choice of 20 amino acids wasn’t completely random.
Tlusty’s model is mathematically equivalent to the physics
of oil and soapy water mixtures. Under certain conditions, soapy membranes wrap
the oil into tubes, and the tubes plug into one another, forming networks. These
networks take on shapes that minimize energy and maximize entropy.
Tlusty says his theory also could apply to other biochemical
codes that cells use to process information. For example, he says, it could offer
some insight into the language of antigens, the molecules that prompt the
immune system to produce particular antibodies.
However, Tlusty’s paper is rather abstract and offers no concrete examples of further
applications, comments Glenn Tesler, a mathematician at the University of California,
San Diego. Still, Tesler adds, the results are interesting in that they tie
together ideas from information theory, physics and biology.