Twitter Details the Technology Behind Spelling Suggestions

May 13, 2012 12:01 GMT  ·  By

It's easy to overlook, but search is hard. You can find almost anything on Google in less than a second, but behind that stands a decade and a half of work.

Twitter knows this all too well. Granted, the site is a lot smaller than the whole internet that Google, Bing and the others have to scour, but it has some unique challenges of its own: things change very quickly, and new content is being added all the time.

It shares some common challenges as well, such as spelling correction, which ensures that the results match what users wanted to find rather than what they actually typed.

This, too, is harder for Twitter than for most: people care about grammar even less on Twitter than they do on the rest of the web, and that's not a lot to begin with.

The 140-character limit means a lot of people use shorthand, and the community is constantly creating new jargon. Then there are the hashtags, and the fact that Twitter is popular around the world and used in many languages.

"To address all of these issues, on top of our context-based mechanism, we also index dictionaries of trending queries and popular users that are likely to be misspelled, and use Lucene's built-in spelling correction library (tweaked to better serve our needs) to identify misspelling and retrieve corrections for queries," Twitter explained on its engineering blog.
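Twitter's post doesn't include code, and Lucene's spell-checking module has its own, more efficient way of finding candidates. Purely as an illustration of the general idea of dictionary-based correction, here is a minimal Python sketch. It assumes a small in-memory dictionary of popular terms (think trending queries or user names) with hypothetical popularity counts, finds entries within a small edit distance of the query, and ranks them by distance first and popularity second. The class and method names are hypothetical, not Twitter's or Lucene's API.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


class SpellingDictionary:
    """Hypothetical dictionary of popular terms mapped to popularity counts."""

    def __init__(self, counts: dict[str, int]):
        self.counts = {term.lower(): n for term, n in counts.items()}

    def suggest(self, query: str, max_distance: int = 2, limit: int = 3) -> list[str]:
        q = query.lower()
        if q in self.counts:
            return []  # known term: assume it is spelled correctly
        candidates = []
        for term, count in self.counts.items():
            d = edit_distance(q, term)
            if d <= max_distance:
                # Sort key: closer spellings first, then more popular terms.
                candidates.append((d, -count, term))
        candidates.sort()
        return [term for _, _, term in candidates[:limit]]


trending = SpellingDictionary({"olympics": 500, "eurovision": 300,
                               "election": 800, "selection": 40})
print(trending.suggest("olimpics"))  # ['olympics']
print(trending.suggest("elction"))   # ['election', 'selection']
```

Scanning every entry is fine for a toy dictionary; a system at Twitter's scale would need an index to find candidates quickly, and would combine the result with the context-based signals the post mentions.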

Twitter recently added spelling correction, similar to Google's "did you mean," as well as related searches, to its site and its mobile apps. In the two weeks since launch, the new system has already corrected 5 million queries and provided suggestions for 100 million. Twitter offers a more technical description of the system on its engineering blog, if you're interested in the nitty-gritty.