Searching for the Tree of Babel

Linguistic evolution may shed light on history

A picture is generally valued at 1,000 words. What might be the worth of an image of the 7,000 or so languages now spoken in the world? Scientists searching for patterns within this cacophony of lingoes are convinced that languages hold pivotal clues to questions about human history that other areas of study have been unable to answer. In their quest to demonstrate this new idea, these scientists are finding themselves in stiff debate with others who argue that the approach amounts to barking up the wrong tree.

FAMILY DISCUSSION. Part of Pagel’s Indo-European tree, with Greek near the base, shows how different languages may have diverged from one another. Adapted from Pagel.

The controversial approach treats languages as though they were biological species and applies analytical methods developed by evolutionary biologists. Although linguists have previously created trees of languages, they haven’t used computational methods to rapidly reconstruct relationships between large groups of languages.

Anthropologists and other investigators are using their new, more extensive language trees to trace the historical relationships of different cultural groups, from people conversing in Gujarati and Hindi to those speaking Navajo and Quechua. These researchers claim that, with that information in hand, questions about migration patterns, agriculture, and other society-changing practices become answerable in new ways.

Language trees are useful for depicting relationships of communities in the past 5,000 to 10,000 years or so, a period too short to be resolved by genetics and exactly the time for which anthropologists and archaeologists are seeking new streams of data.

Linguistic species

When biologists build family trees among species, they look for shared characters–such as the vertebrate spine–or specific genetic sequences. Species with the greatest similarities are grouped to create a tree branch with several extensions. Then, those branches that share the most characters are put together into a bough. This tree of hierarchical relationships, known as a phylogeny, traces a path from ancestral species at the trunk to the most recently evolved species out on the twigs.

Charles Darwin alluded to the notion that languages evolve and diverge as species do. Just as genetic systems are made up of nucleotides, genes, and individuals, languages have discrete units of their own: letters, syllables, and words, says Mark Pagel, an evolutionary biologist at the University of Reading in England. Language, like a set of genes, is generally transmitted from parents to offspring. And just as mutations in DNA provide the basic biological variations on which natural selection thrives, changes also occur in languages. Variations in pronunciation or meaning are either rejected or preserved in the transfer of language from parents to their children.

Though natural selection per se doesn’t act on new word variants, a form of cultural selection certainly does, says Pagel. For example, a catchier version of a word, such as aeroplane rather than flying machine, is more likely to persist.

For many decades, linguists used a tree approach, says Pagel. Comparisons, however, had generally been limited to a small number of languages, and the language analysts didn’t take advantage of computer-based quantitative methods.

Russell Gray, an evolutionary biologist at the University of Auckland in New Zealand, notes that for as few as 10 languages, there are an astonishing 34 million possible trees that can be drawn. “For over 100 languages, you’re talking about more possible trees than there are atoms in the universe,” he adds. Now, Gray says, it’s becoming possible to churn out trees for very large data sets.
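Gray's figure can be checked with a little arithmetic. The sketch below assumes he is counting rooted trees in which every split produces exactly two branches; for n labeled tips there are (2n - 3)!! such trees, the product of the odd numbers up to 2n - 3. The function name is illustrative, and 10^80 is used only as a commonly quoted rough estimate of the number of atoms in the observable universe.

```python
def rooted_tree_count(n_languages: int) -> int:
    """Count rooted, strictly bifurcating trees for n labeled tips: (2n - 3)!!"""
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3 (empty product for n <= 2).
    for odd in range(3, 2 * n_languages - 2, 2):
        count *= odd
    return count


print(rooted_tree_count(10))  # 34459425, i.e. about 34 million

# Rough, assumed estimate of atoms in the observable universe.
ATOMS_IN_UNIVERSE = 10 ** 80
print(rooted_tree_count(100) > ATOMS_IN_UNIVERSE)  # True
```

The count grows so quickly that checking every tree is out of the question, which is why tree-building software has to search rather than enumerate.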

“These methods are entirely appropriate,” concurs Colin Renfrew, an archaeologist at the University of Cambridge in England. “Given that historical linguistics uses many discrete pieces of information, quantitative and technical methods of this sort are long overdue.”

To test the mettle of the language-tree approach, researchers have been building hierarchies for Pacific islanders, sub-Saharan Africans, and Eurasians from Iceland to Bangladesh.

Branching Bantu

Clare Janaki Holden, an anthropologist at University College London in England, has used the phylogenetic method to produce a tree of 75 Bantu languages, which are spoken in the southern half of Africa.

Holden set out to examine whether a language tree might reflect the broader cultural history of the region, specifically the spread of farming. This is a good test of the approach, she says, since scientists using archaeological methods have already outlined the diffusion of agriculture in the region.

Holden used a preexisting data set of 92 words of basic vocabulary found in all 75 languages. These are words, such as man or hand, that are essential in all languages. Such words are thought to evolve slowly and be unique to a language.

The data were analyzed with computing software that groups languages so that those sharing the most words are deemed the most closely related. The tree that this effort produced largely agreed with previous linguistic work, says Holden. One difference was that a group of East African languages appeared in her tree closely related to some found in more southerly areas.
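The article doesn't name the software or the algorithm Holden used, so the following is only a toy illustration of the general idea of grouping languages by shared words. It swaps in a simple average-linkage clustering, and the four languages and their word codes are invented: two languages count as sharing a word when they carry the same code for a meaning.

```python
from itertools import combinations

# Hypothetical data: one cognate code per meaning (man, hand, water, fire, sun).
WORDS = {
    "LangA": ["m1", "h1", "w1", "f1", "s1"],
    "LangB": ["m1", "h1", "w1", "f2", "s1"],
    "LangC": ["m2", "h1", "w2", "f3", "s2"],
    "LangD": ["m2", "h2", "w2", "f4", "s2"],
}


def shared_fraction(a, b):
    """Fraction of meanings for which two languages carry the same code."""
    matches = sum(x == y for x, y in zip(WORDS[a], WORDS[b]))
    return matches / len(WORDS[a])


def leaves(cluster):
    """List the languages inside a cluster (a name or a nested pair)."""
    if isinstance(cluster, str):
        return [cluster]
    return leaves(cluster[0]) + leaves(cluster[1])


def similarity(c1, c2):
    """Average shared fraction over all cross-cluster language pairs."""
    pairs = [(a, b) for a in leaves(c1) for b in leaves(c2)]
    return sum(shared_fraction(a, b) for a, b in pairs) / len(pairs)


def build_tree(languages):
    """Repeatedly join the two most similar clusters into a nested pair."""
    clusters = list(languages)
    while len(clusters) > 1:
        best = max(combinations(clusters, 2), key=lambda p: similarity(*p))
        clusters = [c for c in clusters if c not in best]
        clusters.append(best)
    return clusters[0]


tree = build_tree(sorted(WORDS))
print(tree)  # (('LangA', 'LangB'), ('LangC', 'LangD'))
```

Here LangA and LangB share four of five words and are joined first, LangC and LangD share three and are joined next, and the two pairs meet at the root.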

Holden and her colleagues at University College are now using linguistic trees to test theories about cultural traits. By mapping traits, such as farming or marriage practices, onto language trees, these researchers can find out how many times a practice evolved and whether it might be correlated with other genetic or cultural factors.
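One standard way to ask how many times a trait must have changed on a tree is Fitch parsimony, which finds the smallest number of changes that could explain the pattern seen at the tips. The article doesn't say which mapping method the London group uses, so treat this as a sketch of the idea only; the tree, the language names, and the cattle-keeping values are made up.

```python
def fitch(tree, states):
    """Return (candidate ancestral states, minimum changes) for a binary tree.

    `tree` is a language name or a pair of subtrees; `states` maps each
    language name to its observed trait value.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch(tree[0], states)
    right_set, right_changes = fitch(tree[1], states)
    changes = left_changes + right_changes
    common = left_set & right_set
    if common:
        # The two branches can agree, so no extra change is needed here.
        return common, changes
    # The branches disagree: at least one change happened below this node.
    return left_set | right_set, changes + 1


# Hypothetical language tree and trait data (does the community keep cattle?).
TREE = ((("LangA", "LangB"), "LangC"), ("LangD", "LangE"))
CATTLE = {"LangA": "yes", "LangB": "yes", "LangC": "no",
          "LangD": "no", "LangE": "yes"}

root_states, min_changes = fitch(TREE, CATTLE)
print(min_changes)  # 2
```

On this toy tree the pattern needs at least two changes, so cattle keeping cannot be explained by a single origin or a single loss. Running the same count for a second trait would be a first step toward asking whether the two tend to change together.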

In research chronicled in the April 22 Proceedings of the Royal Society of London B, Holden compares the evolutionary scenarios suggested by her language trees with published archaeological scenarios for the spread of farming in Africa from 5,000 to 1,500 years ago. The archaeological record suggests two major African migrations of farmers. The first was a southerly spread of Neolithic crop farmers from western Africa into central-African forests. In the second mass migration, cattle farmers streamed south from Lake Victoria in eastern Africa.

Each of the two major language groupings in Holden’s tree is spoken in areas inhabited by descendants of people who followed one of these two migrations. The languages “mirror closely the spread of farming for both these western and eastern streams,” says Holden.

Pacific diaspora

On the opposite side of the planet, in Auckland, Gray has been using similar methods to produce an Austronesian language tree. This group comprises about 1,000 languages spoken by 270 million people across the Pacific. Gray and his coworker Fiona Jordan, an anthropologist now at University College London, are using their tree to test hypotheses about the timing and sequence of colonization in the Pacific islands.

The researchers created a tree via a process similar to that of Holden. However, the data set–5,185 words from 77 Austronesian languages–was not confined to basic vocabulary.

“What we found was very congruent with how most linguists would group the languages,” says Jordan. A few languages from close-in islands, however, did appear grouped with languages spoken on islands much farther out in the Pacific. This may be due to lingual complexities created by terms absorbed from other languages, says Jordan.

Jordan and Gray have considered a hypothesis, supported by archaeological evidence, regarding the colonization of Pacific islands. Around 6,000 years ago, farmers from Taiwan and southern China may have migrated 10,000 miles over water from Taiwan to western Polynesia in just 2,100 years. Known as “the express train to Polynesia,” this controversial idea was proposed in 1988 by Jared Diamond of the University of California, Los Angeles School of Medicine.

Gray and Jordan tested the scenario. If the theory is correct, says Jordan, languages found nearest to mainland Asia would show up as the lowest branches of the tree. Languages spoken on islands sequentially farther out would appear in correspondingly higher branches.

Jordan and her colleagues used statistical methods to map the proposed migrations onto the language tree. They found that the languages of islands near Asia split off on lower boughs of the tree than did languages spoken on islands farther out. The result was a nearly optimal fit, says Jordan. “It would require a very different tree to disagree with the express train,” she adds.

“The archaeological evidence shows a clear historical pattern” of how Pacific people spread, says Patrick V. Kirch, director of the Hearst Museum of Anthropology at the University of California, Berkeley. “When we get similar results from archaeology, traditional linguistics, and now this, it tells us we’re really onto something.”

Though the number of researchers applying phylogenetic techniques to languages is small, the idea is spreading. At a conference last March, Pagel presented his team’s ongoing study, which is using complex models of evolution to build trees. His preliminary results support many of the existing theories of relationships among the Indo-European languages.

The tree shows some ancient linguistic splits that would be difficult to reconstruct using traditional linguistic-tree building, says Pagel. For example, Greek appears to be one of the first languages to branch off the European bough. “Everything we know about archeology tells us [Greek] is very old,” he says, “combined with the fact that no one else can understand a word of it and that it has a different alphabet.”

Trees or nets?

Despite the apparent success of the method so far, many academics are cautious about examining languages by using methods developed for biological species. They point out an important difference. Biological traits only rarely transfer between individuals of the same generation or unrelated lineages.

Although small amounts of DNA move between species, languages undergo far more mixing. For example, English is Germanic in origin, but the Norman invasion of England in the 11th century resulted in many French terms joining the language. Similarly in recent times, Japanese has acquired many English words, including commercial and technological terms. This is akin to a lineage of bears somehow acquiring the beak of a duck.

Most “species by definition can’t borrow evolutionary features . . . while in linguistic or cultural contexts, such borrowings are perfectly possible,” says Scott MacEachern, an archaeologist at Bowdoin College in Brunswick, Maine. The language-tree researchers assume isolated communities, continues MacEachern. This is not how groups normally behave, he says.

There’s no reason why language, genes, and culture should evolve in the same ways, agrees John E. Terrell, an anthropologist at the Field Museum in Chicago. “There is nothing equivalent to genetic isolation in languages,” he says. Because languages frequently transfer words or phrases between lineages, their relationships might more accurately be depicted as a net than a tree.

However, he concedes, applying the biological phylogenetic approach to languages “can be used to produce a first approximation [of lingual relationships] as long as you never lose sight that it’s a quick-and-dirty technique.”

Language trees may become more quickly accepted for specific sorts of broad-brushstroke studies, such as questions of large-scale colonization over long periods. Although many “anthropologists are horrified at the thought of treating cultural groups as bounded units evolving through time,” says Monique Borgerhoff Mulder, an anthropologist at the University of California, Davis, they’re “missing the scale of the question.” She argues, “Specific mechanisms of social change are not relevant at this scale.”

Even those who advocate phylogenetic methods to build language trees admit that the sharing of words between languages is a problem. However, some words are subject to less exchange than others. Pagel suggests that by avoiding technological terms and other language elements that are frequently transferred, the language-tree method can become more useful.

The technique holds too much promise to dismiss, says Gray. He points to its potential to foster synergism among biology, anthropology, archaeology, and linguistics. “Instead of different disciplines thinking that they have the golden bullet . . . we need to tie everything together,” he says.


John Pickrell is a freelance writer based in Sydney and the author of Flames of Extinction: The Race to Save Australia’s Threatened Wildlife.