Google has already announced that it will be adopting the latest version of the standard

Feb 4, 2012 14:51 GMT

Google is celebrating the adoption of Unicode on the web and announcing that it will soon switch to the latest version of the standard, Unicode 6.1, which was released on January 31st.

"We’ve long used Unicode as the internal format for all the text Google searches and process: any other encoding is first converted to Unicode. Version 6.1 just released with over 110,000 characters; soon we’ll be updating to that version and to Unicode’s locale data from CLDR 21 (both via ICU)," Google's Mark Davis announced.
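The "convert everything to Unicode first" step Davis describes can be illustrated with a short sketch. Google does this via ICU; the example below swaps in Python's built-in codecs purely for illustration, and the helper name `to_unicode` and the declared encodings are hypothetical. It assumes the encoding of each input is already known.

```python
def to_unicode(raw: bytes, declared_encoding: str) -> str:
    """Decode bytes in a (hypothetical) declared legacy encoding into a Unicode string.

    errors="strict" makes mislabelled input fail loudly instead of
    silently producing garbage characters.
    """
    return raw.decode(declared_encoding, errors="strict")


# The same kind of text stored in three unrelated legacy encodings.
samples = [
    ("cp1251", "русский".encode("cp1251")),  # Windows Cyrillic
    ("gb2312", "中文".encode("gb2312")),       # Simplified Chinese
    ("latin-1", "café".encode("latin-1")),    # Western European
]

for encoding, raw in samples:
    text = to_unicode(raw, encoding)
    # Once decoded, every string shares one internal representation and can
    # be processed or re-encoded uniformly, here as UTF-8.
    print(f"{encoding:8} {raw!r} -> {text} -> {text.encode('utf-8')!r}")
```

The point of the design is that everything downstream (search, indexing, comparison) only ever has to handle one character set, no matter how the text arrived.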

The new version of the Unicode standard adds 732 characters, along with a number of technical changes.

"Unicode was invented to solve that problem: to encode all human languages, from Chinese (中文) to Russian (русский) to Arabic (العربية), and even emoji symbols like […] or […]. It encodes nearly 75,000 Chinese ideographs alone. In the ASCII encoding, there wasn’t even enough room for all the English punctuation (like curly quotes), while Unicode has room for over a million characters," Davis explained.
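The size gap Davis mentions is simple arithmetic. Seven-bit ASCII has 128 code points, while the Unicode code space runs from U+0000 to U+10FFFF. The sketch below (Python, with illustrative constant names) checks where a few characters fall; the curly quotes are U+201C and U+201D, well outside ASCII's range. Not every Unicode code point can hold a character: the 2,048 surrogate code points are reserved for UTF-16.

```python
import sys

ASCII_SIZE = 2 ** 7                # 7-bit ASCII: code points 0x00-0x7F
UNICODE_SIZE = 0x10FFFF + 1        # Unicode code space: U+0000-U+10FFFF
SURROGATES = 0xDFFF - 0xD800 + 1   # reserved for UTF-16, never characters


def fits_in_ascii(ch: str) -> bool:
    """True if the character has an ASCII code point (below 128)."""
    return ord(ch) < ASCII_SIZE


print(ASCII_SIZE)                  # 128
print(UNICODE_SIZE)                # 1114112
print(UNICODE_SIZE - SURROGATES)   # 1112064 encodable scalar values
print(sys.maxunicode == 0x10FFFF)  # True on Python 3.3+

# A straight quote fits in ASCII; curly quotes, Cyrillic and Han do not.
for ch in ['"', "\u201c", "\u201d", "я", "中"]:
    print(f"{ch!r} U+{ord(ch):04X} ascii={fits_in_ascii(ch)}")

try:
    "\u201cquoted\u201d".encode("ascii")
except UnicodeEncodeError as err:
    print("ASCII cannot encode curly quotes:", err.reason)
```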

Along with announcing that it will be switching over to the latest standard soon, Google also provided an update on how Unicode is doing on the web.

Although Unicode was created at around the same time as the web, it has been a long battle: it only started to see meaningful adoption around 2003 and 2004, and even then it didn't really take off until 2006.

After that, usage shot up, while the other popular encodings (basic, US-only ASCII and the extended Latin encodings) began to lose market share.

It was only in 2008 that Unicode became the most popular encoding on the web, and it has kept growing since: it is now used by more than 60 percent of web pages. If pages written in plain ASCII are counted as well (ASCII is included in most other encodings, including Unicode's UTF-8, so such pages are also valid Unicode), Unicode accounts for almost 80 percent of the pages Google sees online.
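Why pure-ASCII pages can be counted as Unicode is easy to demonstrate: ASCII text produces exactly the same bytes in ASCII, UTF-8 and Latin-1. The sketch below is a rough, hypothetical classifier (the names `classify_page` and `unicode_share` are illustrative, and this is not how Google measured its figures) showing how folding ASCII-only pages into the Unicode count changes the share. The four-page corpus is a toy example, not real web data.

```python
def classify_page(raw: bytes) -> str:
    """Rough, hypothetical encoding bucket for a page's bytes (not Google's method)."""
    if raw.isascii():
        return "ascii"   # pure ASCII bytes are also valid UTF-8
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "other"   # e.g. Latin-1 / Windows-1252 bytes


def unicode_share(pages: list[bytes], count_ascii: bool) -> float:
    """Fraction of pages counted as Unicode, optionally folding ASCII-only pages in."""
    unicode_labels = {"utf-8", "ascii"} if count_ascii else {"utf-8"}
    hits = sum(classify_page(p) in unicode_labels for p in pages)
    return hits / len(pages)


# ASCII text produces identical bytes under ASCII, UTF-8 and Latin-1.
assert "Hello".encode("ascii") == "Hello".encode("utf-8") == "Hello".encode("latin-1")

# Toy corpus for illustration only; these are not real web statistics.
pages = [
    b"Hello, web!",
    "café".encode("utf-8"),
    "café".encode("latin-1"),
    b"plain text",
]
print([classify_page(p) for p in pages])        # ['ascii', 'utf-8', 'other', 'ascii']
print(unicode_share(pages, count_ascii=False))  # 0.25
print(unicode_share(pages, count_ascii=True))   # 0.75
```

A real measurement would rely on declared and detected encodings rather than a simple validity check, since some legacy-encoded bytes happen to be valid UTF-8 as well.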