A prototype developed by Microsoft Research

Dec 9, 2009 12:58 GMT

EntityCube, a new project from Microsoft Research, is designed to centralize information on web entities, as the Redmond company explained. Specifically, EntityCube, known as Renlifang in China, is a highly specialized search engine that not only lets end users run queries on people, locations and organizations, but also extracts relevant information on them from the billions of web pages it has indexed and generates summaries, allowing users to explore the relationships between entities. The project is extremely young, having gone live only last week; however, it already shows a great deal of promise, being able to harvest data on web entities and arrange it into comprehensible portions that can be easily served to users.

“EntityCube is a research prototype for exploring object-level search technologies, which automatically summarizes the Web for entities with a modest web presence,” Microsoft revealed. “The need for collecting and understanding Web information about a real-world entity (such as a person or a product) is mostly collated manually through search engines. However, information about a single entity might appear in thousands of Web pages. Even if a search engine could find all the relevant Web pages about an entity, the user would need to sift through all these pages to get a complete view of the entity.”

At this point in its development, EntityCube has indexed information from just three billion pages, Microsoft notes. As a result, the project might not provide all the information on certain web entities, even when that information has been published online, because the pages where it appears haven’t been crawled yet. The software giant also indicates that, because it is still in the prototype phase, EntityCube might contain erroneous information in terms of names and relationships, that name disambiguation continues to be an issue, and that certain features, such as summarization, are limited to searches involving people.

“Please note that we are still working on improving the accuracy of the key machine learning problems including entity extraction, name disambiguation, entity ranking, and relationship extraction, as well as looking at a better way of incorporating user feedback,” Microsoft added.