The eventual goal is to store articles in pure HTML, but it's going to take a while

Mar 4, 2013 20:11 GMT  ·  By

Wikipedia became the powerhouse it is today by enabling anyone to contribute and making it easy for them to do so. When Wikipedia launched in 2001, the key to this was the wiki markup used to write and edit articles.

It made it possible for people with no knowledge of HTML to create web pages with relative ease. These days, when even blogs seem old-fashioned, that may not sound like a big deal, but it was 12 years ago.

But the site has grown considerably more complex since it launched, so much so that its markup, Wikitext, is now more of an obstacle than a useful tool.

Apart from requiring people to learn a markup language, any markup language, just to edit a simple article for the web, Wikitext poses a technical problem: pages have to be converted into HTML, and that conversion gets slower as articles get bigger.

The Wikimedia Foundation, the group behind the popular site, is already working on a visual editor for Wikipedia.

But a bigger problem is converting the existing pages into pure HTML. Storing articles as HTML would speed up editing and simplify storage.

It may seem straightforward, but there are big hurdles. In fact, it is impossible to fully convert the millions of articles hosted by Wikipedia without losing details.

There is hope, though, and it is called Parsoid, a tool developed by the Wikimedia Foundation to convert Wikitext to and from HTML.
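To give a rough idea of what converting Wikitext into HTML involves, here is a minimal, hypothetical sketch in Python. It is not Parsoid, which is a far more complete Node.js tool; the function name `wikitext_to_html` and the `./Target` link format are assumptions made for illustration, and the sketch only handles headings, bold, italics and internal links, in one direction only.

```python
import html
import re

# Hypothetical toy converter, NOT Parsoid: it covers only a tiny subset of Wikitext.
HEADING = re.compile(r"^(={2,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)
BOLD = re.compile(r"'''(.+?)'''")
ITALIC = re.compile(r"''(.+?)''")
LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def _heading(m: re.Match) -> str:
    level = len(m.group(1))  # "==" -> <h2>, "===" -> <h3>, ...
    return f"<h{level}>{m.group(2)}</h{level}>"


def _link(m: re.Match) -> str:
    target = m.group(1).strip()
    label = m.group(2) if m.group(2) is not None else target
    # Assumed URL scheme: spaces become underscores, with a relative "./" prefix.
    href = "./" + target.replace(" ", "_").replace('"', "&quot;")
    return f'<a href="{href}">{label}</a>'


def wikitext_to_html(text: str) -> str:
    # Escape &, < and > first so raw text cannot inject tags; quote=False
    # leaves apostrophes alone because Wikitext uses them for bold and italic.
    out = html.escape(text, quote=False)
    out = HEADING.sub(_heading, out)
    out = BOLD.sub(r"<b>\1</b>", out)  # bold (''') must run before italic ('')
    out = ITALIC.sub(r"<i>\1</i>", out)
    out = LINK.sub(_link, out)
    return out


if __name__ == "__main__":
    print(wikitext_to_html("== History =="))
    print(wikitext_to_html("'''Parsoid''' converts [[Wikitext|markup]] to ''HTML''."))
```

Even this toy breaks on ordinary constructs: nested markup such as `'''''bold italic'''''` comes out mis-nested, and templates, tables and references are not handled at all. Nor can the output be turned back into the exact original Wikitext, which hints at why a lossless conversion of millions of articles is so hard.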

Parsoid, built on Node.js, is currently used mainly by the visual editor that Wikipedia is building. Its immediate goal is to support a parallel Wikitext/HTML storage system.

In the long term, Wikimedia is also looking at completely replacing Wikitext markup for storage in favor of a pure HTML solution.