Providing a huge cache of structured data that is as useful to humans as it is to machines

Mar 31, 2012 14:11 GMT

Wikimedia, the foundation behind Wikipedia, has announced one of its largest undertakings to date and the first new big project since 2006, Wikidata. Wikidata, as the name suggests, aims to be a Wikipedia for data, or, alternatively, a way for information stored by Wikimedia to act more like a database than an encyclopedia.

Wikidata will have a major impact on Wikipedia itself, but because all of its information will be freely available, its effects will be felt in a wide variety of places, from search engines to scientific research.

Wikidata aims to take all of the factual information currently stored by Wikipedia, and eventually a lot more, and store and present it in a structured way that is as easily readable by humans as it is by machines.

For example, Wikipedia already lists birth dates on almost all of its biographical pages. But there's no way of finding out who was born on the same day, which notable painters were born in the same year, and so on.

Wikidata will solve this problem and more by acting more like a relational database. Of course, this being a Wikimedia project, not only will all of the data be available under a Creative Commons license, it will also be editable by anyone.
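To make the birth date example concrete, here is a minimal sketch of the kind of question structured data can answer. It uses an in-memory SQLite table; the `person` table, its columns and the helper functions are hypothetical and chosen only for illustration, not Wikidata's actual data model or interface.

```python
import sqlite3

# Hypothetical schema for illustration only; not Wikidata's actual data model.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, occupation TEXT, birth_date TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [
        ("Charles Darwin", "naturalist", "1809-02-12"),
        ("Abraham Lincoln", "politician", "1809-02-12"),
        ("Pablo Picasso", "painter", "1881-10-25"),
        ("Fernand Léger", "painter", "1881-02-04"),
        ("Vincent van Gogh", "painter", "1853-03-30"),
    ],
)


def born_on(date):
    """Return the names of everyone born on the given ISO date, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM person WHERE birth_date = ? ORDER BY name", (date,)
    )
    return [row[0] for row in rows]


def painters_born_in(year):
    """Return the names of painters born in the given year, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM person "
        "WHERE occupation = 'painter' AND substr(birth_date, 1, 4) = ? "
        "ORDER BY name",
        (str(year),),
    )
    return [row[0] for row in rows]


print(born_on("1809-02-12"))   # people sharing a birthday
print(painters_born_in(1881))  # painters born in the same year
```

Once the dates live in fields rather than in free text, each question becomes a one-line query instead of a manual search across thousands of articles.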

The project, spearheaded by the German "branch" of Wikimedia, is about to get underway and should be ready to go in about a year. In the first phase, until August 2012, all the links and pages containing the same information on different-language Wikipedias will be centralized.

This will ensure that the factual data is the same on all pages and that any update affects all of them, decreasing the amount of work Wikipedia editors have to put in to maintain pages, one of the big goals of Wikidata.

In the second phase, editors will be able to add new data and edit existing data. This phase is set to last until December 2012. After that, the data will be used to generate lists and charts automatically; these are currently created and maintained by hand by editors. At that stage, Wikidata will be complete and should be able to operate as intended.

Wikidata starts with €1.3 million in funding. Half of that comes from the Allen Institute for Artificial Intelligence, created by Microsoft cofounder Paul Allen. A quarter of the funding comes from the Gordon and Betty Moore Foundation, created by Intel cofounder Gordon Moore and his wife, Betty.

Finally, the remaining quarter comes from Google, which has been a frequent contributor to Wikimedia projects. Google, of course, relies on Wikipedia for many of its searches and will rely a lot more on the structured data that Wikidata will provide.