The Tangled Roots of English

by Nicholas Wade, The New York Times, February 25, 2015

Tracing a Mother Tongue

A newly revised family tree of Indo-European languages supports the theory that the root language, proto-Indo-European, originated in the steppes of eastern Europe.

The peoples of India, Iran and Europe speak a Babel of tongues, but most — English included — are descended from an ancient language known as proto-Indo-European. Scholars have argued for two centuries about the identity and homeland of those who spoke this parent language, but a surprisingly sudden resolution of this longstanding issue may be at hand.

Many origins have been proposed for the birthplace of the Indo-European languages, but only two serious candidates are now under discussion, one of which assumes they were spread by the sword, the other by the plow.

Historical linguists can reconstruct many words of proto-Indo-European from their descendants. For example, there was probably a word “kwekwlos,” meaning wheel, which is the ancestor of “kuklos” in classical Greek, of “cakra” in Old Indic and (because K shifts to H in Germanic languages) of “hweohl” in Old English, itself the ancestor of “wheel” in modern English.

From the reconstructed vocabulary, the speakers of proto-Indo-European seem to have been pastoralists, familiar with sheep and wheeled vehicles. Archaeologists find that wheeled vehicles emerged around 4000 B.C., suggesting the proto-Indo-European speakers began to flourish some 6,500 years ago on the steppe grasslands north of the Black and Caspian Seas. This steppe theory, favored by many linguists, holds that the proto-Indo-European speakers then spread their language to Europe, India and western China, whether by conquest or through the appeal of their pastoral economy.

This theory was challenged by Colin Renfrew, a Cambridge archaeologist who proposed in 1987 that the languages had been spread by the Neolithic farmers who brought agriculture to Europe. Under this scenario, the homeland of proto-Indo-European was in Anatolia, now Turkey, and its speakers started migrating some 8,000 to 9,500 years ago.

Dr. Renfrew’s proposal carried weight because the expansion of farming peoples is an accepted mechanism of language spread, and the migration of Neolithic farmers into Europe is well documented archaeologically. Linguists objected that proto-Indo-European could not have fragmented so early, because the wheel had not been invented 8,000 years ago, yet many Indo-European languages have related words for wheel that must derive from a common parent. But Dr. Renfrew argued that, long after their dispersal, these languages could all have borrowed the word for wheel along with the invention itself.

The standoff between the steppe and Anatolian theories of Indo-European origin persisted until 2003, when two New Zealand biologists, Russell Gray and Quentin Atkinson of the University of Auckland, entered the fray with an impressive method of constructing datable trees of language descent.

Historical linguists had drawn up trees of how proto-Indo-European had split into its daughter languages, based on sets of related words known as cognates. English “water,” for instance, corresponds to “wasser” in German and “vatten” in Swedish, but to “nero” in modern Greek. The English, German and Swedish words are said to be cognates, derived from an inferred proto-Indo-European word “wodr,” but the “nero” of modern Greek is not.

Linguists had hoped that by comparing languages in terms of how many cognates they shared, the Indo-European tree could be dated. But after discovering that the rates of language change varied widely from one branch to another, they largely gave up.

Dr. Gray and Dr. Atkinson realized that statistical methods developed by biologists for tracking the evolution of genes and proteins addressed many of the problems that exist in reconstructing trees of language descent. They represented each Indo-European language as a string of 1s and 0s, depending on whether it shared cognates for a list of words known to resist change. They then computed the likeliest of the many possible trees that would give rise to the observed data.
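The encoding step can be sketched in a few lines of Python. This is a minimal illustration, not Gray and Atkinson’s code: it assumes one 0/1 column per cognate set, uses only the “water” example given earlier in this article, and the labels “A” and “B” for the two cognate sets are hypothetical. The mismatch count at the end is a crude distance for illustration only; their method instead searched for the likeliest tree given such data, a step not shown here.

```python
# Minimal sketch of presence/absence (1/0) coding of cognate sets.
# Cognate judgments come from the article's "water" example; the class
# labels "A" and "B" are hypothetical names chosen for illustration.
from itertools import combinations

# meaning -> {language: cognate-set label}
cognate_classes = {
    "water": {
        "English": "A",  # water
        "German": "A",   # wasser
        "Swedish": "A",  # vatten
        "Greek": "B",    # nero, not cognate with the other three
    },
}


def binary_matrix(classes):
    """Build one 0/1 column per (meaning, cognate set) and one row per language."""
    columns = sorted(
        {(meaning, label) for meaning, langs in classes.items() for label in langs.values()}
    )
    languages = sorted({lang for langs in classes.values() for lang in langs})
    rows = {
        lang: [1 if classes[meaning].get(lang) == label else 0 for meaning, label in columns]
        for lang in languages
    }
    return columns, rows


def mismatches(a, b):
    """Count positions where two binary strings differ (a crude distance)."""
    return sum(x != y for x, y in zip(a, b))


columns, rows = binary_matrix(cognate_classes)
for lang, bits in rows.items():
    print(lang, "".join(map(str, bits)))
for a, b in combinations(rows, 2):
    print(a, b, mismatches(rows[a], rows[b]))
```

Here English, German and Swedish each come out as “10” and Greek as “01”. Each additional meaning from a change-resistant word list would simply add its own columns to every language’s string.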

Their preferred tree of Indo-European languages had the same shape as that constructed by historical linguists. But its lower branches could be dated from historical events like the split between Latin and Rumanian when Roman troops withdrew south of the Danube in A.D. 270. And with the lower branches anchored in time, they could date the root. Proto-Indo-European, they calculated, was spoken 7,800 to 9,800 years ago.

That conclusion provided striking support for the Anatolian theory. In 2012, Dr. Gray and Dr. Atkinson, with Remco Bouckaert and colleagues, dropped the other shoe: they applied a statistical model developed to track the geographical spread of viruses to the dispersal of proto-Indo-European. It showed “decisive support for an Anatolian origin over a steppe origin,” the authors concluded in an article in Science.

It seemed that with the biologists’ help, the archaeologists’ Anatolian theory had triumphed over the linguists’ steppe hypothesis. But two findings reported this month have abruptly tilted the weight of evidence toward the steppes.

Though some linguists had dismissed the Gray and Atkinson result, others realized their computational approach had much to offer. Andrew Garrett, a linguist at the University of California, Berkeley, has teamed up with Will Chang, a linguist trained in computational techniques. They and colleagues noticed that in the 2012 article by Dr. Bouckaert and others, in eight cases where an ancient language is the widely assumed ancestor of a modern one, the modern language is shown as being descended from a hypothetical cousin of the ancient language.

For example, the Romance languages are assigned to a hypothetical cousin of Latin, not Latin itself, and English to an inferred cousin of Old English.

Dr. Garrett and Mr. Chang thought it would be more realistic for the tree to adopt generally accepted language ancestries, even though this required overruling its probability calculations.

Origins of an Ancient Language

Researchers place the homeland of the proto-Indo-European language, the ancestor of many modern languages spoken across Europe and Asia, either in the steppes north of the Black Sea or in Anatolia, modern Turkey.


When the Bouckaert tree was forced to adopt the eight accepted language ancestries, the whole tree shrank in age and its root dropped to about 6,500 years old, in agreement with the steppe hypothesis of Indo-European origins, Dr. Garrett, Mr. Chang and their colleagues report in the journal Language.

A second boost for the steppe theory has emerged from the largest study of ancient DNA in Europe, based on analysis of 69 people who lived 3,000 to 8,000 years ago. Patterns in the DNA bear evidence of a migration into Germany some 4,500 years ago of people from the Yamnaya culture of the steppes, the first to develop a pastoral economy based on wagons, sheep and horses. So extensive was this migration that three-quarters of the ancient people sampled in Germany bear Yamnaya-type DNA, says a team led by Wolfgang Haak of the University of Adelaide, Australia, and David Reich of Harvard Medical School. Their report was posted this month on bioRxiv.

If so much of the population was replaced, the newcomers’ language probably prevailed, and the migration plausibly represents an expansion of Indo-European speakers from the steppes. “These results provide support for the theory of a steppe origin of at least some of the Indo-European languages of Europe,” the authors say.

The three oldest branchings of the Indo-European tree, according to Don Ringe, a historical linguist at the University of Pennsylvania, are, first, languages such as Hittite once spoken in Anatolia; second, Tocharian, a language group of western China; and third, the Italic and Celtic language groups of Europe. Archaeological evidence attests to migrations out of the steppe in these directions and in the right order, say Dr. Ringe and David Anthony, an archaeologist at Hartwick College, writing in the Annual Review of Linguistics.

They also note that proto-Indo-European borrowed words from proto-Uralic, the inferred ancestor of languages such as Hungarian, Finnish and Estonian, and from languages of the Caucasus. A location in the steppes, but not in Anatolia, would make such borrowings geographically plausible. The evidence for a steppe origin of the Indo-European languages “is so strong that arguments in support of other hypotheses should be re-examined,” Dr. Ringe and Dr. Anthony say.

But the case is not yet closed. The two new pieces of evidence, Dr. Garrett’s correction of the Bouckaert tree and the ancient DNA data, may not be as conclusive as they seem.

Dr. Renfrew, the author of the Anatolian hypothesis, considers it a “strong possibility” that the migration from the steppes to Europe recorded in ancient DNA was a secondary phenomenon. In other words, Indo-European could have spread first from Anatolia to the steppes and from there to Europe.

And the biologists who draw up statistically probable language trees do not believe the Garrett team is justified in making the trees conform to ancestry constraints. “The Garrett and Chang model is overzealous in forcing ancient languages to be directly ancestral – the data don’t support this,” said Dr. Atkinson, referring to new tests he has done.

One reason is that written languages tend to be fossilized, said Paul Heggarty, a linguist at the Max Planck Institute for Evolutionary Anthropology: living languages are likely to be descended from a spoken language that diverged from the written version.

“The seemingly innocent assumptions which Garrett introduces,” Dr. Renfrew said, “turn out not to be so uncomplicated.”