Robots that can see have been around for quite some time. We are not talking about machines such as the rover Spirit, which uses its cameras to show its human operators what it is doing, but about robots that can identify an obstacle and move around it on their own, by recognizing an object and associating it with a threat. European researchers have recently managed to combine such vision capabilities with sound-recording abilities, and have devised a way of integrating the two types of stimuli into a single, purposeful perception, PhysOrg reports.
Computer vision is already used in factories, where machines look at objects and use visual-recognition software to determine what they are. Such systems are used, for example, in the quality-control stage of microchip production: cameras analyze all sides of the finished product and ensure that it matches the factory specifications. Outside these narrow settings, however, robots largely lose the ability to use their vision efficiently. When conversing with a human, for instance, they are unable to keep up with the changes they see.
Experts say that interacting with humans is precisely the kind of task robots will be required to perform in the future, and that endowing them with perception is fundamental to achieving that. “The originality of our project was our attempt to integrate two different sensory modalities, namely sound and vision. This was very difficult to do, because you are integrating two completely different physical phenomena,” Perception-on-Purpose (POP) project coordinator Radu Horaud explains. He adds that disciplines such as cognitive science and neuroscience could also benefit from the innovation.
The experts at POP essentially use light and sound together to determine the direction of a voice or a face. That is to say, the two senses “communicate with each other” and help each other figure out where the source of a stimulus lies. “It is not that easy to decide what is foreground and what is background using sound alone, but by combining the two modalities – sound and vision – it becomes much easier. If you are able to locate ten sound sources in ten different directions, but if in one of these directions you see a face, then you can much more easily concentrate on that sound and throw out the other ones,” Horaud adds.
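Horaud's ten-sources example can be illustrated with a toy sketch. This is not the POP team's method, which the article does not describe in detail. It only assumes that each sound source and each detected face has already been reduced to a bearing in degrees, and that a sound "belongs" to a face when the two bearings fall within a tolerance. The function names, the 10-degree tolerance, and the sample bearings are all hypothetical.

```python
def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two bearings, in degrees."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def select_attended_sources(sound_bearings, face_bearings, tolerance_deg=10.0):
    """Keep only the sound sources whose bearing lies close to a detected face.

    tolerance_deg is an assumed matching threshold, not a value from the project.
    """
    return [
        s for s in sound_bearings
        if any(angular_difference(s, f) <= tolerance_deg for f in face_bearings)
    ]


# Ten hypothetical sound sources, one every 36 degrees, and one face seen at 75 degrees.
sound_bearings = [i * 36.0 for i in range(10)]
face_bearings = [75.0]

# Only the source near the face is kept; the other nine are discarded.
attended = select_attended_sources(sound_bearings, face_bearings)
```

With these made-up numbers, the only source kept is the one at 72 degrees, since it is the only one within the tolerance of the face. A real system would also have to cope with noisy bearings, moving speakers, and faces that are not speaking.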
“Most often, sound research is conducted in specialized labs, with arrays of microphones and a very controlled acoustic environment. But we integrated our two microphones and two cameras onto the head of our Popeye. The idea is to have an agent-centered cognitive system,” he concludes.