
Microsoft has released updated information on Strider Search Defender, a tool meant to identify search spammers systematically and automatically through non-content analysis.
It was designed to discover Web pages that act as fronts for spam sites. The most common search engine spamming practices are search engine optimization techniques and comment spam, which is posted automatically as random comments.
"To make their URLs look more legitimate so that search users are more likely to click the links, many spammers create doorway pages on reputable domains and use their URLs in comment spamming. When a user clicks on a doorway-page link in search listings, her browser is instructed to either redirect to or fetch ads listing from the actual target page, potentially operated by the spammer," said the Microsoft Research Strider Team.
Strider Search Defender uses two components to dig out spam websites. The first, the Spam Hunter, starts with an initial list of confirmed spam URLs and harvests additional URL information to build a list of potential spam sites. Microsoft has taken into account that false positives may slip through the Spam Hunter's filters, because spam and non-spam content are often interlaced.
"To filter out false positives, we feed the list of potential spam URLs to the Strider URL Tracer (which we have previously released to help trademark owners find typo-squatting domains of their websites). The tracer provides a key functionality called the Top Domain view: given a list of (primary) URLs, the tracer launches an actual browser to visit each URL and records all secondary URLs visited as a result. At the end of the batched scan, the Top Domain view provides the list of third-party domains that received secondary-URL traffic and rank them by the number of primary URLs that generated traffic to them. If the input is a list of potential spam URLs, the Top Domain view essentially highlights those target-page domains that are associated with a large number of doorway-page URLs," stated the Microsoft Research Strider Team.
Strider is on its way to becoming fully automated and ready for integration with the search engine, so that spam can be filtered out of the results returned for user searches.
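
The ranking step of the Top Domain view described by the Strider team can be illustrated with a short sketch. This is not Microsoft's implementation. It assumes the browser scan has already produced a mapping from each primary URL to the secondary URLs it triggered, and it treats any hostname that differs from the primary URL's hostname as a third-party domain, which is a simplification. The function name `top_domain_view`, the helper `host_of`, and the sample URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def host_of(url):
    """Return the lowercase hostname of a URL, or an empty string if none."""
    return (urlsplit(url).hostname or "").lower()


def top_domain_view(traces):
    """Rank third-party hosts by how many primary URLs sent traffic to them.

    traces: dict mapping a primary URL to the secondary URLs recorded
    while visiting it (assumed to come from an earlier browser scan).
    Returns a list of (host, primary_url_count), highest count first.
    """
    primaries_by_host = defaultdict(set)
    for primary, secondaries in traces.items():
        primary_host = host_of(primary)
        for secondary in secondaries:
            host = host_of(secondary)
            # Assumption: a different hostname counts as third-party.
            if host and host != primary_host:
                primaries_by_host[host].add(primary)
    return sorted(
        ((host, len(primaries)) for host, primaries in primaries_by_host.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Hypothetical scan results: two doorway pages lead to the same target host.
traces = {
    "http://blog-a.example.com/doorway1": [
        "http://ads.target.example.net/list",
        "http://blog-a.example.com/style.css",
    ],
    "http://blog-b.example.org/doorway2": [
        "http://ads.target.example.net/list",
    ],
    "http://blog-c.example.org/page": [
        "http://cdn.other.example/lib.js",
    ],
}

for host, count in top_domain_view(traces):
    print(host, count)
```

In this sketch, a host reached from many distinct primary URLs rises to the top of the list, which mirrors the idea that target-page domains behind a large number of doorway pages stand out.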