Researchers have come up with a new method of generating dictionaries between languages

Sep 30, 2013 12:48 GMT

Computers are far better than us mere humans at some things and bad at others that five-year-olds master with ease. Or, at least, that is the common wisdom, and it is starting to change. Translation, though, is still one of those areas where computers can't really compare to humans.

Google Translate is good for understanding the basic idea behind a piece of text, in most cases, but you can't really rely on it for anything important.

That's because computers aren't very good at understanding meaning, the nuance behind an expression, or even which definition of a word is appropriate in any given circumstance.

But Google is working on converting translation into something computers are great at, math, as MIT's Technology Review details. Google Translate is based on machine learning.

It "knows" a full set of curated dictionaries that translate words and common phrases between different languages and it also relies on existing human translations to derive statistical relations between phrases in different languages.

This means that Translate needs already translated documents to improve accuracy; the more data it's fed, the better it behaves. This becomes a big problem for languages which are severely underrepresented on the web.

But Google is now working on a method that doesn't need much human input, and it has published a paper on it. Researchers have devised a way of creating mathematical representations of the relationships between various words in one language, based on statistical analysis of text.
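
To make the idea of a "mathematical representation" built from text statistics a bit more concrete, here is a minimal sketch in Python. The article does not say how the representations are actually computed, so this uses plain co-occurrence counts as a stand-in: each word is described by how often other words appear near it. The tiny corpus, the `window` parameter and the function names are all invented for illustration.

```python
from collections import Counter
import math

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            context = tokens[lo:i] + tokens[i + 1:hi]
            vectors.setdefault(word, Counter()).update(context)
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Toy corpus, made up for this example.
corpus = [
    "the dog is a pet",
    "the cat is a pet",
    "the dog can bark",
    "the car needs fuel",
    "the truck needs fuel",
]
vecs = cooccurrence_vectors(corpus)

# Words used in similar contexts end up with similar vectors.
print(cosine(vecs["dog"], vecs["cat"]))
print(cosine(vecs["dog"], vecs["car"]))
```

On this toy corpus "dog" comes out closer to "cat" than to "car", simply because the two animals appear next to the same words. A real system would use far more text and a more compact representation, but the principle of deriving word relationships from usage statistics alone is the same.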

This representation can then be used to find correspondences between words in two different languages, by matching the relationships already known among words in one language against the relationships among words in the other.

For example, if you map out the relations between "dog" and "animal," "pet" or "bark" in English, and between the corresponding words in Spanish, you'll get similar mathematical representations. So if you know that "dog" is "perro" in Spanish, you can deduce the translation for the other words based on the statistical relationships between the Spanish words.
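
The "dog" and "perro" example can be sketched in code. One way to exploit the similarity between the two sets of representations, assumed here rather than spelled out in the article, is to fit a linear map from a handful of known word pairs by least squares and then use it to project other words into the second language. Everything below is hypothetical toy data: the 3-dimensional vectors are invented, and the "Spanish" vectors are deliberately built as a rotated copy of the English ones, so a perfect linear map exists by construction. Real data would only approximately behave this way.

```python
import numpy as np

# Hypothetical 3-dimensional English word vectors, invented for illustration.
english = {
    "dog":    np.array([0.9, 0.1, 0.3]),
    "animal": np.array([0.7, 0.6, 0.1]),
    "pet":    np.array([0.8, 0.2, 0.6]),
    "bark":   np.array([0.4, 0.0, 0.9]),
    "cat":    np.array([0.8, 0.3, 0.2]),
}

# Toy "Spanish" space: the same geometry, rotated. The relation is exact here
# only because we construct it that way.
theta = np.pi / 3
rotation = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta),  np.cos(theta), 0.0],
    [0.0,            0.0,           1.0],
])
true_pairs = {"dog": "perro", "animal": "animal", "pet": "mascota",
              "bark": "ladrar", "cat": "gato"}
spanish = {es: english[en] @ rotation for en, es in true_pairs.items()}

# Seed dictionary: the few translations we assume are already known.
seed = [("dog", "perro"), ("animal", "animal"), ("pet", "mascota")]
X = np.stack([english[en] for en, _ in seed])
Y = np.stack([spanish[es] for _, es in seed])

# Fit the matrix W that minimizes ||X @ W - Y||^2 over the seed pairs.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def translate(word):
    """Project an English vector into the Spanish space and return the nearest Spanish word."""
    projected = english[word] @ W
    return max(spanish, key=lambda es: cosine(projected, spanish[es]))

# "bark" and "cat" were not in the seed dictionary.
print(translate("bark"))
print(translate("cat"))
```

In this sketch the map is learned from three word pairs only, yet it places the two held-out words next to "ladrar" and "gato". That is the appeal of the approach: a small seed dictionary can be stretched into a much larger one, with the quality depending on how closely the two languages' representations really line up.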

This method can be used to quickly generate translation dictionaries to which more advanced machine learning techniques can be applied, eliminating much of the human work in the process. This works best for languages that have a common heritage, like English and Spanish. But it works well enough for any pair of languages, no matter how different, since there's always going to be a relation between "dog" and "animal," for example, in any language.