Softpedia
 


September 7th, 2006, 10:27 GMT

Google Announces Tesseract OCR



Luc Vincent, Uber Tech Lead, has posted an entry on the Google Code Blog revealing that the Mountain View company plans to release Tesseract, an Optical Character Recognition (OCR) engine initially developed by Hewlett-Packard Laboratories between 1985 and 1995, as open source.

"In 1995 it was one of the top 3 performers at the OCR accuracy contest organized by University of Nevada in Las Vegas. However, shortly thereafter, HP decided to get out of the OCR business and Tesseract has been collecting dust in an HP warehouse ever since," explained Vincent.

Vincent also described Google's collaboration with the Information Science Research Institute at UNLV to revive the engine as open source, along with the debugging work that preceded the re-release of Tesseract OCR. The engine is designed to convert scanned pages of paper documents into text that can be indexed.

"A few things to know about Tesseract OCR: for now it only supports the English language, and does not include a page layout analysis module (yet), so it will perform poorly on multi-column material. It also doesn't do well on grayscale and color documents, and it's not nearly as accurate as some of the best commercial OCR packages out there. Yet, as far as we know, despite its shortcomings, Tesseract is far more accurate than any other Open Source OCR package out there," said Vincent.
