Google Docs Adds Optical Character Recognition Support for 34 Languages

One of the cool features of Google Docs is its built-in optical character recognition (OCR), which lets users upload scanned files or images and convert them into editable, native documents. The feature was introduced last year, and Google is now greatly expanding the number of supported languages, adding 29 new ones for a total of 34.

"Today, we’re happy to announce that we’ve added support for 29 additional character sets, including those used in most European languages, Russian, Chinese Simplified and some other Asian languages. See the upload page for the full list," Jaron Schaeffer, Software Engineer at Google, announced.

"Last June, we introduced the ability to upload documents into Google Docs using Optical Character Recognition (OCR)," he wrote.

"OCR analyzes images and PDF files, typically produced by a scanner (or the camera of a mobile phone), extracts text and some formatting and allows you to edit the document in Google Docs," he explained.

Now, if you select the "Convert text from PDF or image files to Google Docs documents" option, you can pick the file's original language from the 34 supported ones.

Uploaded files are analyzed and the text within them is retrieved. Apart from the standard Latin alphabet, Google's OCR software in Docs now supports Cyrillic and Simplified Chinese scripts as well.

No OCR software is perfect, but things like scanned books or documents should be fairly easy to convert. Of course, the higher the resolution, the better the results.

Along with support for the new languages, the already supported ones (English, French, Italian, German and Spanish) have seen an improvement in OCR quality, Google says.

Additionally, the OCR software now does a better job of recognizing and preserving the original font, and of retaining as much of the formatting as possible.