Teaches its Books service to read by itself

Sep 17, 2009 10:02 GMT

Today, September 16th, 2009, could be remembered as the day computers started manipulating humans and taught themselves to read. By acquiring reCAPTCHA, a renowned spam filter used on the Internet for more than two years, Google has breathed new life into its Google Books project.

As the official Google blog states, the reCAPTCHA system will most likely be used in the Google Books project to improve word and character recognition on book scans. This is possible because reCAPTCHA builds its classic CAPTCHA spam filter from real words taken from scans of old newspapers.

While other CAPTCHA systems use a word library to generate their spam filter images, reCAPTCHA uses images of words taken from scanned old newspapers. The idea for the project came from Prof. Luis von Ahn of Carnegie Mellon University, who is also a Google employee.

Mr. von Ahn joined Google in 2006, when the company opened new offices at Carnegie Mellon University to carry out research on computer AI learning. That research enlisted web users to play certain games, which a computer AI monitored in order to learn from their decision-making process.

Started as a side project, reCAPTCHA spread quickly across the web, at first thanks to its easy interface and clean design (see attached screenshot of a reCAPTCHA system). After conquering blogs, forums and contact forms, the system went on to develop its OCR (Optical Character Recognition) technology, which was later also introduced into the Google Books project.

“reCAPTCHA’s unique technology improves the process that converts scanned images into plain text, known as Optical Character Recognition (OCR),” said Luis von Ahn, co-founder of reCAPTCHA, and Will Cathcart, Google product manager. “This technology also powers large scale text scanning projects like Google Books and Google News Archive Search.”

In practice, whether we like the Google Books project or not, from now on we will indirectly help build it: Google's computers will benefit from every reCAPTCHA field we fill in, learning to read and correcting scanned words from its book archive.

reCAPTCHA can also be downloaded from Softpedia via this link.

Photo Gallery (2 Images)

Google acquires reCAPTCHA's spam filter
reCAPTCHA spam filter