A tool to discover and block Twitter spam

Aug 17, 2009 12:16 GMT  ·  By

The TwitBlock website was recently launched with an almost impossible goal: to discover, rate and block Twitter spamming accounts. Applying different detection / rating algorithms, the website analyzes account followers and displays suspicious accounts based on the obtained rating.

As can be seen from the attached screenshot, followers are inspected and given a rating based on a complex algorithm. Followers who score lower than 20 are not shown on the main page, while those who score more than 20 points are listed on the screen for further analysis.
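The listing rule can be pictured with a minimal sketch. The function name and the score dictionary are hypothetical, and since the description does not say what happens at exactly 20 points, the sketch assumes only scores strictly above the threshold are listed.

```python
from typing import Dict, List

SPAM_THRESHOLD = 20  # cut-off quoted above


def suspicious_followers(scores: Dict[str, int],
                         threshold: int = SPAM_THRESHOLD) -> List[str]:
    """Return followers scoring above the threshold, highest score first."""
    # Assumption: a score of exactly `threshold` is not listed.
    flagged = [name for name, score in scores.items() if score > threshold]
    return sorted(flagged, key=lambda name: scores[name], reverse=True)


if __name__ == "__main__":
    print(suspicious_followers({"alice": 5, "bot_a": 35, "bot_b": 21}))
```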

Users can then decide whether to mark an account as “not spam” (it will be excluded from analysis in later scans) or to block it. If an account is blocked by a user after a “follower scan,” that account will automatically be taken into consideration in later scans and have its spam rating increased.

The TwitBlock scan method takes into account eight different tests. The first one is called the Ignore Factor, and it determines the ratio between the number of people an account follows and the number of those people who follow it back. For every percent over 50, a point is added to the spam rating.
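One possible reading of the Ignore Factor is sketched below. It assumes the percentage in question is the share of followed accounts that do not follow back; that interpretation and the function name are assumptions, not TwitBlock's published formula.

```python
def ignore_factor_points(following: int, followed_back: int) -> int:
    """One point for every whole percent of 'ignored' follows above 50%."""
    if following <= 0:
        return 0  # nothing followed, so nothing can be ignored
    # Assumption: "ignored" means followed accounts that do not follow back.
    ignored_percent = 100 * (following - followed_back) // following
    return max(0, ignored_percent - 50)


if __name__ == "__main__":
    # 200 accounts followed, only 50 follow back: 75% ignored.
    print(ignore_factor_points(200, 50))
```

Under this reading, an account whose follows are ignored 75% of the time collects 25 points, enough on its own to be listed.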

The Follow Rate examines the number of days a user has been on Twitter and the number of accounts they're following. If the ratio is high, it's a sign of automation. Only a bot could regularly add 100 accounts per day, while a normal person would have a very low ratio. Following more than ten new accounts a day results in extra spam points.
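The Follow Rate check could look roughly like the sketch below. The description only says that exceeding ten accounts a day earns extra points, so awarding one point per account per day above the limit is an assumption, as is the function name.

```python
def follow_rate_points(following: int, days_on_twitter: int,
                       daily_limit: int = 10) -> int:
    """Penalize accounts that follow more than `daily_limit` accounts a day."""
    days = max(1, days_on_twitter)  # treat brand-new accounts as one day old
    per_day = following // days     # whole accounts followed per day
    # Assumption: one point for each account per day above the limit.
    return max(0, per_day - daily_limit)


if __name__ == "__main__":
    # 3,000 accounts followed in 30 days is 100 per day.
    print(follow_rate_points(3000, 30))
```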

If an account is Blocked by Others, a ten-point penalty is automatically added for every user that has blocked it. Identical Profile Pics are also considered a sign of bot accounts: TwitBlock scans account images for identical checksums and adds ten points for any image used more than once.
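Both penalties are simple to sketch. The checksum algorithm is not named in the description, so SHA-256 is used here purely as an assumption, and the function names are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict


def blocked_by_others_points(block_count: int, penalty: int = 10) -> int:
    """Ten points for every user who has blocked the account."""
    return penalty * max(0, block_count)


def duplicate_picture_points(pictures: Dict[str, bytes],
                             penalty: int = 10) -> Dict[str, int]:
    """Give `penalty` points to each account whose image checksum is shared."""
    # Assumption: SHA-256 stands in for whatever checksum TwitBlock uses.
    digests = {name: hashlib.sha256(data).hexdigest()
               for name, data in pictures.items()}
    usage = Counter(digests.values())
    return {name: penalty if usage[digest] > 1 else 0
            for name, digest in digests.items()}


if __name__ == "__main__":
    print(blocked_by_others_points(3))
    print(duplicate_picture_points({"a": b"img1", "b": b"img1", "c": b"img2"}))
```

Comparing checksums only catches byte-for-byte copies; a resized or re-encoded picture would produce a different digest.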

Tweets via API are also a sign of spam, since many bots use automated scripts that connect to the Twitter API and update statuses or interact with other users. A ten-point penalty is added for this too. Using registered applications like TweetDeck will not affect the spam rating. The importance of this test is considered to be very low, since almost every respectable website has a Twitter API connection in place and many spam tweets appear as “from the web” rather than from another source.
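A rough sketch of this test follows. The source labels, the whitelist of registered clients and the choice to apply the penalty once per account rather than per tweet are all assumptions made for illustration.

```python
from typing import FrozenSet, Iterable

# Hypothetical whitelist of registered clients.
REGISTERED_SOURCES: FrozenSet[str] = frozenset({"web", "tweetdeck"})


def api_tweet_points(sources: Iterable[str],
                     registered: FrozenSet[str] = REGISTERED_SOURCES,
                     penalty: int = 10) -> int:
    """Return the penalty if any tweet came from an unregistered source."""
    # Assumption: the penalty is applied once per account, not per tweet.
    unregistered = any(source.strip().lower() not in registered
                       for source in sources)
    return penalty if unregistered else 0


if __name__ == "__main__":
    print(api_tweet_points(["web", "API", "TweetDeck"]))
```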

Spam robots are also known not to have any profile information. If a user or bot fails to fill in the four account information fields (Bio, Location, image and URL), two points will be added to their spam rating. If the bot is configured to fill in the account information fields, a second test checks that information for spammy words.
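The profile check might be sketched as follows. The description does not say whether the two points apply per empty field or once for the whole profile; this sketch assumes two points per empty field, and the field keys are hypothetical.

```python
from typing import Mapping, Optional

PROFILE_FIELDS = ("bio", "location", "image", "url")  # hypothetical keys


def missing_profile_points(profile: Mapping[str, Optional[str]],
                           points_per_field: int = 2) -> int:
    """Add points for each of the four profile fields left empty."""
    # Assumption: the two-point penalty is charged per empty field.
    missing = sum(1 for field in PROFILE_FIELDS
                  if not (profile.get(field) or "").strip())
    return points_per_field * missing


if __name__ == "__main__":
    print(missing_profile_points({"bio": "News and reviews", "location": ""}))
```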

This test checks the bio field and status updates for known spam keywords. These include promotional, marketing, financial and adult-related words.
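A keyword check of this kind can be sketched in a few lines. The word list below is a made-up placeholder, not TwitBlock's actual list, and because no point value is given for this test the sketch only reports which words matched.

```python
import re
from typing import FrozenSet, Iterable, Set

# Hypothetical sample list covering the categories mentioned above.
SPAM_WORDS: FrozenSet[str] = frozenset({"free", "discount", "loan", "xxx"})


def spam_words_found(texts: Iterable[str],
                     spam_words: FrozenSet[str] = SPAM_WORDS) -> Set[str]:
    """Return the spam keywords appearing as whole words in the given texts."""
    found: Set[str] = set()
    for text in texts:
        tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
        found |= tokens & spam_words
    return found


if __name__ == "__main__":
    print(spam_words_found(["Get a FREE loan today!", "Lunch was great"]))
```

Matching whole words avoids flagging harmless text such as "freedom" for containing "free".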

Since AI has not evolved that much, humans are still the only ones that can reliably recognize a randomly generated username. A simple method to detect random usernames consists of checking for name patterns, vowels missing from the username, numbers-only names, etc. If a username fails this test, a ten-point penalty is added.
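The heuristics named above can be approximated with a short sketch. The exact patterns TwitBlock uses are not described, so the three rules below (digits only, no vowels in a name of four or more letters, a trailing run of four or more digits) are assumptions chosen for illustration.

```python
import re


def random_username_points(username: str, penalty: int = 10) -> int:
    """Return the penalty if the username looks machine-generated."""
    name = username.lower()
    letters = re.sub(r"[^a-z]", "", name)
    only_digits = name.isdigit()
    # Assumption: four or more letters with no vowel (or 'y') looks random.
    no_vowels = len(letters) >= 4 and re.search(r"[aeiouy]", letters) is None
    # Assumption: a trailing run of four or more digits is a bot-like pattern.
    long_digit_tail = re.search(r"\d{4,}$", name) is not None
    return penalty if (only_digits or no_vowels or long_digit_tail) else 0


if __name__ == "__main__":
    for candidate in ("softpedia", "xkqzptr", "84731920"):
        print(candidate, random_username_points(candidate))
```

Rules like these produce false positives (a real user may end a name with a birth year), which is one reason such a check is only one input to the overall score.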

There are known issues with business-related or big company accounts that will sometimes score high in TwitBlock because of internal policies regarding social networking updates or account-following rules. A whitelist has been created by TwitBlock admins for verified Twitter accounts that rank high due to their “follow back” policy (e.g. Barack Obama).

TwitBlock is currently in Alpha, and an upgraded version of the website is in the works. It is not known whether Twitter would like to put a partnership in place with TwitBlock, but it might be a good idea after the latest spam and hack attacks against its servers.

Photo Gallery (2 Images)

TwitBlock, an inspired username and logo
Screenshot of a TwitBlock spam scan performed on a Softpedia test Twitter account