Google Translator Toolkit Adds 16 Million Words to Wikipedia

Overall, 100 million words have been added using Google's tools

July 15th, 2010 07:56 GMT
Wikipedia is one of the largest sites on the web, yet its latest plans are more ambitious than ever. It wants to double its audience in the next few years, but to do that it has to expand the content already available. One way to do this is to offer more content in less widely used languages, and it’s getting some unexpected help with that from the Google Translate team.

“To help Wikipedia become more helpful to speakers of smaller languages, we’re working with volunteers, translators and Wikipedians across India, the Middle East and Africa to translate more than 16 million words for Wikipedia into Arabic, Gujarati, Hindi, Kannada, Swahili, Tamil and Telugu,” Michael Galvez, Product Manager at Google, wrote.

This isn’t a new endeavour. Google started in 2008 and focused first on Hindi. The Hindi Wikipedia had only 3.4 million words in 21,000 articles at the time, significantly fewer than popular local versions of Wikipedia, not to mention the English-language version.

Using data from Google Search, the team determined which English articles were most popular with Hindi speakers and focused on translating those. Using the Google Translator Toolkit, the team translated more than 100 English articles, adding up to 600,000 words.

“In three months, we used a combination of human and machine translation tools to translate 600,000 words from more than 100 articles in English Wikipedia, growing Hindi Wikipedia by almost 20 percent. We’ve since repeated this process for other languages, to bring our total number of words translated to 16 million,” Galvez said.
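As a quick sanity check on the "almost 20 percent" figure, the numbers quoted in this article can be run through a few lines of Python. This is only a rough sketch: it assumes the 3.4 million words cited for Hindi Wikipedia is the right baseline, and the function and variable names are made up for illustration.

```python
# Figures quoted in the article (the baseline is an assumption:
# Hindi Wikipedia's size at the time the project started).
hindi_words_before = 3_400_000
words_translated = 600_000


def growth_percent(base: int, added: int) -> float:
    """Return the percentage growth from adding `added` words to `base`."""
    return added / base * 100


pilot_growth = growth_percent(hindi_words_before, words_translated)
print(f"Hindi Wikipedia growth: {pilot_growth:.1f}%")  # prints 17.6%
```

That works out to roughly 17.6 percent, which is in line with Galvez's "almost 20 percent".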

And Google has no intention of stopping here. What’s more, people have been using the Translator Toolkit on their own to create more Wikipedia articles for their languages. Google says 100 million words have been added to Wikipedia by volunteers using its tools.
