The algorithms describe what's happening in the video

Jun 1, 2010 09:04 GMT  ·  By
Video surveillance could become a lot more efficient with the implementation of the new algorithms developed at UCLA and ObjectVideo

A group of experts from the University of California, Los Angeles (UCLA), working in collaboration with scientists from the Reston, Virginia-based company ObjectVideo, has recently finished developing a new computer vision system. The prototype demonstrator features a function that has never before been seen on a similar system. The researchers say it uses algorithms that can identify what is going on in a live video feed, and then generate lines of text describing the events. The team says this could help authorities make sense of the vast amounts of data their cameras collect.

With this system, it may also become much easier to search for specific terms inside video feeds, rather than having to watch all of the footage again. The prototype is not yet ready for commercial applications, the group admits, though it says only minor tweaks remain to be made. “You can see from the existence of YouTube and all the other growing sources of video around us that being able to search video is a major problem,” explains the lead researcher of the new investigation, UCLA professor of statistics and computer science Song-Chun Zhu. “Almost all search for images or video is still done using the surrounding text,” he explains, quoted by Technology Review.

The scientist says that the new I2T (Image to Text) system relies on a series of computer vision algorithms, which take individual images, or frames from video streams, analyze them, and then produce a summary of the features they discovered. Zhu and colleagues Benjamin Yao and Haifeng Gong, both at UCLA, were the main developers of I2T. The text produced by the algorithms can then be indexed, and the full descriptions stored in a database. “That can be searched using simple text search, so it's very human-friendly,” Zhu adds.
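The pipeline described above, generating a text summary per frame, indexing it, and then answering plain-text queries, can be sketched in miniature. This is only an illustrative sketch, not the actual I2T implementation: the `describe_frame` captioner is a hypothetical stand-in for the computer vision algorithms, and frames are represented as simple dictionaries.

```python
from collections import defaultdict

def describe_frame(frame):
    """Hypothetical stand-in for an I2T-style captioner. In the real
    system, computer vision algorithms would generate this text from
    the image; here we just read a prepared caption."""
    return frame["caption"]

class VideoTextIndex:
    """Inverted index mapping each word to the timestamps of frames
    whose generated descriptions contain it, enabling simple
    human-friendly text search over video content."""

    def __init__(self):
        self.index = defaultdict(set)

    def add_frame(self, timestamp, frame):
        # Generate the text summary for this frame and index its words.
        text = describe_frame(frame)
        for word in text.lower().split():
            self.index[word].add(timestamp)

    def search(self, query):
        """Return timestamps of frames whose descriptions contain
        every word in the query (a simple AND search)."""
        words = query.lower().split()
        if not words:
            return set()
        result = set(self.index[words[0]])
        for word in words[1:]:
            result &= self.index[word]
        return result

# Index two simulated frames, then search them by plain text.
idx = VideoTextIndex()
idx.add_frame(0.0, {"caption": "a red car enters the parking lot"})
idx.add_frame(5.0, {"caption": "a person walks past the red car"})
print(sorted(idx.search("red car")))  # -> [0.0, 5.0]
print(sorted(idx.search("person")))   # -> [5.0]
```

A real deployment would of course use a proper full-text search engine over the stored descriptions, but the core idea is the same: once video is reduced to text, ordinary keyword search applies.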

The researchers imagine a future in which systems such as theirs become advanced enough to recognize a vast array of images almost instantly. They also say that feeding the text the algorithms produce into a speech synthesizer could enable a host of other applications. Law enforcement officials could, for example, be told during a pursuit where a suspect is, as CCTV cameras across a town work together to recognize and monitor the threat.