Feb 10, 2011 12:00 GMT

Researchers in the United States are proposing a new explanation for why frequently used words are shorter than words that are used less often. The idea challenges the most prominent theory developed to date on this issue.

For decades, language scientists have believed that often-used words are shorter because people strive for efficiency in communication. Replacing “the” with a longer word, such as “phonetics” or “embezzlement,” would hardly help you get your point across quickly.

In an emergency situation, being able to give short commands, such as “go,” “move,” “stay” or “stop,” could mean the difference between life and death. Until now, this theory has served as the standard explanation.

But experts at the Massachusetts Institute of Technology (MIT) propose a new approach for explaining the length of words. Cognitive scientists at the Institute say that a word's length reflects the amount of information it contains.

“It may seem surprising, but word lengths are better predicted by information content than by frequency,” explains MIT Department of Brain and Cognitive Sciences (BCS) PhD student Steven Piantadosi.

He is also the lead author of a new paper detailing the idea, which was published in last month's issue of the esteemed journal Proceedings of the National Academy of Sciences (PNAS). His paper evaluates words used commonly in 11 languages.

“It makes sense that if you say something over and over again, then you want it to be short. But there is a more refined communications story to be told than that,” the MIT scientist explains.

“Frequency doesn't take into account dependencies between words,” he adds. The original theory about word length was published in the 1930s by Harvard scholar George Zipf.

The approach Piantadosi took is based on the highly influential work of information-theory pioneer Claude Shannon, formerly of MIT. The new study revealed that 10 percent of the variation in word length is attributable to the amount of information contained in those words.

Though small, this percentage is three times the share of variation in word length attributable to frequency, as Zipf described it. The new conclusions are bound to stir considerable controversy in specialist circles within the international scientific community.
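The difference between the two measures can be made concrete with a small sketch. This is only an illustration under simplifying assumptions, not the procedure used in the PNAS paper: it treats a word's "information content" as its average surprisal in bits, given only the single preceding word, and estimates probabilities by simple counting. The function name `word_stats` and the toy token list in the usage note are invented for this example.

```python
import math
from collections import Counter, defaultdict


def word_stats(tokens):
    """Return {word: (count, avg_surprisal_bits)} for a list of tokens.

    count is the plain frequency of the word (the Zipf-style measure).
    avg_surprisal_bits is the mean of -log2 P(word | previous word) over
    every occurrence that has a preceding word.

    Assumption: the context is only the previous word, and probabilities
    are unsmoothed maximum-likelihood estimates from the same tokens.
    """
    unigram = Counter(tokens)
    context_counts = Counter(tokens[:-1])      # times each word serves as a context
    bigram = Counter(zip(tokens, tokens[1:]))  # (previous, current) pair counts

    total_bits = defaultdict(float)
    occurrences = Counter()
    for (prev, word), n in bigram.items():
        p = n / context_counts[prev]           # estimated P(word | prev)
        total_bits[word] += n * -math.log2(p)  # surprisal summed over n occurrences
        occurrences[word] += n

    return {w: (unigram[w], total_bits[w] / occurrences[w]) for w in occurrences}
```

Calling `word_stats("a b a b a c".split())` returns a count and an average surprisal for each word. In that toy sequence "a" always follows "b," so it is fully predictable and carries no information in context even though it is the most frequent token, while the rare "c" is the most surprising. Frequency alone would not capture how predictable a word is from its neighbors, which is the dependency Piantadosi refers to.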

“This is exciting work. The notion of monkeys on a typewriter can’t explain these findings,” adds Roger Levy, an assistant professor in the Department of Linguistics at the University of California, San Diego (UCSD), who was not part of the study.