New algorithms devised in Germany

Aug 15, 2009 08:56 GMT

A collaboration between German and British experts, from the Max Planck Institute for Human Cognitive and Brain Sciences in Leipzig and the Wellcome Trust Centre for Neuroimaging, respectively, has recently yielded a new class of algorithms that their creators hope will enable computers to recognize spoken language faster and more accurately. They say that the new mathematical models could significantly improve the automatic recognition and processing of speech, and that they could be employed in everything from ATMs to voice-based security systems.

Many factors hinder word recognition in today's speech-recognition systems. If there is background noise, for instance, chances are that an automated telephone system will not be able to understand the words you are saying. Likewise, if the enunciation is a bit off, or if speakers talk too loudly, too softly, or too quickly, most words elude the processors. This is largely because these systems rely on recognizing characteristic features in the frequency spectrum of the voice in order to identify words.
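As a rough, hypothetical illustration of this frequency-matching approach (not the researchers' software, and names like `match_word` and the template set are invented), the sketch below summarizes an audio frame as coarse spectral features and matches it against stored word templates:

```python
import numpy as np

def spectral_features(frame: np.ndarray, n_bands: int = 16) -> np.ndarray:
    """Summarize one audio frame as the average energy in n_bands frequency bands."""
    spectrum = np.abs(np.fft.rfft(frame))        # magnitude spectrum of the frame
    bands = np.array_split(spectrum, n_bands)    # group FFT bins into coarse bands
    return np.array([band.mean() for band in bands])

def match_word(frame: np.ndarray, templates: dict) -> str:
    """Return the stored word whose spectral template is closest to this frame."""
    feats = spectral_features(frame)
    return min(templates, key=lambda word: np.linalg.norm(feats - templates[word]))

# Hypothetical usage: real templates would be learned from clean recordings,
# which is why background noise or unusual speaking rates degrade the match.
rng = np.random.default_rng(0)
templates = {"sun": rng.random(16), "supper": rng.random(16)}
print(match_word(rng.random(512), templates))
```

Because the match depends on the frame's spectrum looking like the training data, any distortion of the frequencies, from noise, volume, or pace, pushes the features away from every template.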

“It is likely that the brain uses a different process. Many perceptual stimuli in our environment could be described as temporal sequences,” explains Stefan Kiebel, an expert at the Leipzig institute. He believes that the analysis of temporal sequences plays a fundamental role in word recognition. The expert classifies speech into “temporal levels”: a conversation can be broken down into levels ranging from basic units, such as letters, up to larger structures, such as the topic itself. “The brain permanently searches for temporal structure in the environment in order to deduce what will happen next,” the team explains.
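A minimal sketch of this hierarchical idea (my illustration, not the published model, which is formulated as Bayesian inference over coupled dynamical systems; all probabilities here are invented): a slowly varying topic level sets the context, and a fast letter level predicts what comes next under that context.

```python
# Two temporal levels, with invented numbers: a slowly varying "topic" state
# biases fast predictions of the next letter, so context shapes expectation.
topic_belief = {"summer": 0.7, "cooking": 0.3}       # slow level: current belief
next_letter_given_topic = {                          # fast level: truncated models
    "summer":  {"s": 0.30, "u": 0.25, "n": 0.20},
    "cooking": {"s": 0.20, "p": 0.25, "u": 0.15},
}

def predict_next_letter() -> dict:
    """Mix each topic's letter model, weighted by how likely that topic is now."""
    prediction = {}
    for topic, p_topic in topic_belief.items():
        for letter, p_letter in next_letter_given_topic[topic].items():
            prediction[letter] = prediction.get(letter, 0.0) + p_topic * p_letter
    return prediction

print(predict_next_letter())  # letters favored by the likelier topic score highest
```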

Kiebel argues that, in a conversation about summer, the letters “s” and “u” are more likely to introduce the word “sun” than the word “supper.” This could help computers anticipate where a conversation is heading, in much the same way the brain does. The new system does have a weakness, however: speakers who change subjects very frequently could confuse the algorithms and cause misunderstandings.
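To make the “sun” versus “supper” point concrete, here is an invented toy example of context-driven prediction (the prior and the update rule are my assumptions, not the researchers' model): a summer context gives “sun” a higher prior, so an ambiguous prefix is resolved toward “sun,” while further evidence can still override the context.

```python
# Invented example: a "summer" context gives "sun" a higher prior than "supper",
# so the ambiguous prefix "su" is resolved toward "sun"; a longer prefix lets
# the acoustic evidence override the contextual prior.
prior = {"sun": 0.8, "supper": 0.2}   # context: a conversation about summer

def posterior(heard: str) -> dict:
    """Bayes-style update: keep only words consistent with the heard prefix."""
    unnorm = {w: p for w, p in prior.items() if w.startswith(heard)}
    total = sum(unnorm.values())
    return {w: p / total for w, p in unnorm.items()}

print(posterior("su"))   # {'sun': 0.8, 'supper': 0.2} -- context decides
print(posterior("sup"))  # {'supper': 1.0} -- evidence overrides the context
```

This also shows why abrupt topic changes are a problem for such a system: a stale prior keeps favoring words from the old topic until enough contradicting evidence accumulates.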

“The crucial point [of the new system], from a neuroscientific perspective, is that the reactions of the model were similar to what would be observed in the human brain,” Kiebel adds. The innovation opens up new approaches to artificial speech recognition, the researchers say, and may be deployed widely within the next few years.