The script relies on the HTML 5 Canvas to extract pixel data

Jan 26, 2009 10:51 GMT

A GreaseMonkey userscript is able to solve simple CAPTCHAs by using a neural network implemented purely in JavaScript. The intriguing piece of code runs locally and was written by Shaun Friedle to subvert the CAPTCHA check of the Megaupload service.

An artificial neural network is a computational model used in computational science to analyze behavior. Unlike a linear model, it consists of interconnected artificial “neurons” that process data, establish relationships between input and output, and detect patterns.

GreaseMonkey is a powerful extension for Mozilla Firefox. It allows users to create local scripts written in JavaScript in order to alter the appearance and functionality of the web pages rendered within the browser. Common uses include blocking ads or certain unnecessary page elements from being displayed, thus resulting in a more resource-friendly and clean browsing experience.

Mozilla's JavaScript Evangelist and jQuery creator John Resig has reviewed the JavaScript implementation of a neural network used for OCR (optical character recognition) in this GreaseMonkey script, and has concluded that “It holds a lot of potential.” “I'm absolutely expecting some interesting work to be derived from this project,” Resig says.

The Megaupload CAPTCHA system that the script has been designed to bypass is not particularly strong in the first place. It consists of three letters in the same font, each in a different color. Therefore, the first step, as Mr. Resig explains, is to use the HTML 5 Canvas getImageData API to extract the pixel data from the CAPTCHA image and convert it to greyscale.
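To illustrate this first step, here is a minimal sketch of a greyscale conversion over the flat RGBA array that getImageData returns. It is not taken from the script itself: the function name `toGreyscale` and the luma weights are assumptions chosen for the example, and a plain average of the three channels would serve just as well.

```javascript
// Convert RGBA pixel data to greyscale in place.
// `data` uses the layout of ImageData.data: [r, g, b, a, r, g, b, a, ...].
// Hypothetical helper, not the script's own code.
function toGreyscale(data) {
  for (let i = 0; i < data.length; i += 4) {
    // Rec. 601 luma weights (an assumption; the script may weigh channels differently).
    const grey = Math.round(0.299 * data[i] + 0.587 * data[i + 1] + 0.114 * data[i + 2]);
    data[i] = data[i + 1] = data[i + 2] = grey; // alpha at data[i + 3] is left untouched
  }
  return data;
}

// In a browser, the input would come from a canvas (element id is hypothetical):
//   const img = document.getElementById('captcha');
//   const canvas = document.createElement('canvas');
//   canvas.width = img.width;
//   canvas.height = img.height;
//   const ctx = canvas.getContext('2d');
//   ctx.drawImage(img, 0, 0);
//   const imageData = ctx.getImageData(0, 0, canvas.width, canvas.height);
//   toGreyscale(imageData.data);
```

Keeping the conversion as a pure function over an array means it can be exercised outside the browser, with the canvas calls confined to the few lines shown in the comment.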

Because each letter has a different color, it is also easy to build three separate pixel matrices, one for every character. JavaScript functions then filter these further, detecting and dropping noisy pixels and refining the edges of each character.
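The separation into per-character matrices could look roughly like the sketch below. It assumes, for simplicity, that each letter ends up with one uniform grey level after conversion and that the background value is known; real images with anti-aliasing would need a tolerance. The function name `splitByGreyLevel` is hypothetical.

```javascript
// Split a greyscale matrix (array of rows) into one binary matrix per grey level.
// Assumption: every letter has a single, distinct grey value; `background` is skipped.
function splitByGreyLevel(grey, background) {
  const layers = new Map(); // grey level -> matrix of 0/1 with the same dimensions
  grey.forEach((row, y) => {
    row.forEach((value, x) => {
      if (value === background) return;
      if (!layers.has(value)) {
        // Start each new layer as an all-zero matrix.
        layers.set(value, grey.map(r => r.map(() => 0)));
      }
      layers.get(value)[y][x] = 1;
    });
  });
  return layers;
}
```

For a three-letter CAPTCHA under these assumptions, the returned map would hold three entries, each marking the pixels of one character as 1 and everything else as 0.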

The resulting pixel matrix is then used to create the input for the neural network. “A number of strategically-chosen points are then extracted from the matrix in the form of 'receptors.' For example, a receptor might be to look at the pixel at position 9x6 and see if it's 'on' or not,” John Resig comments. In total, 64 such boolean values are determined and fed to the neural network, which then compares them with pre-defined values that the author derived from earlier tests on every letter.
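Reading receptors from a character matrix can be sketched in a few lines. The receptor coordinates below are made up for the example (only the 9x6 position is mentioned by Resig), and the function name `readReceptors` is hypothetical; the actual script uses its own list of 64 points.

```javascript
// Turn a binary character matrix into a list of booleans, one per receptor.
// Each receptor is an [x, y] position; points outside the matrix count as "off".
function readReceptors(matrix, receptors) {
  return receptors.map(([x, y]) => Boolean(matrix[y] && matrix[y][x]));
}

// Example with illustrative coordinates (the real script defines 64 of them):
const exampleMatrix = [
  [0, 1, 1],
  [1, 0, 0],
];
const exampleReceptors = [[1, 0], [0, 0], [0, 1], [9, 6]];
const exampleInputs = readReceptors(exampleMatrix, exampleReceptors);
// exampleInputs is [true, false, true, false]
```

Sampling a fixed set of points keeps the network input at a constant size, whatever the dimensions of the extracted character.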

Based on the results of the comparison, the network assigns a matching percentage to each letter of the alphabet, and then picks the one with the highest value. The result is not perfect, but it is slightly better than a direct pixel-to-pixel comparison. Mr. Resig notes that it is “pretty amazing considering that it's all happening 100% in the browser using standards-based technology.”
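The scoring and selection step might be sketched as follows. This is a deliberately simplified single-layer network with one sigmoid output per letter, which is an assumption and not necessarily the architecture of Friedle's script; the names `scoreLetters` and `bestLetter` and all weight values are hypothetical.

```javascript
const ALPHABET = 'abcdefghijklmnopqrstuvwxyz';

// One output per letter: sigmoid(bias + sum of the weights of all "on" inputs).
// `weights[k]` holds one weight per receptor for letter k (values are illustrative).
function scoreLetters(inputs, weights, biases) {
  return weights.map((row, k) => {
    const sum = row.reduce((acc, w, i) => acc + (inputs[i] ? w : 0), biases[k]);
    return 1 / (1 + Math.exp(-sum));
  });
}

// Pick the letter whose score is highest.
function bestLetter(scores, alphabet = ALPHABET) {
  let best = 0;
  for (let k = 1; k < scores.length; k++) {
    if (scores[k] > scores[best]) best = k;
  }
  return { letter: alphabet[best], score: scores[best] };
}

// Tiny example with three receptors and three candidate letters:
const demoWeights = [[1, 1, 1], [-1, 2, -1], [0, 0, 0]];
const demoBiases = [0, 0, 0];
const demoGuess = bestLetter(scoreLetters([false, true, false], demoWeights, demoBiases));
// demoGuess.letter is 'b'
```

In a trained network, the weights would come from the author's earlier tests on each letter rather than being written by hand as they are here.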

Unfortunately, this approach doesn't stand a chance against more complex CAPTCHA systems, but it can serve as a starting point for other, more refined implementations. “I consider myself only mediocre in my ability to use javascript and to use neural networks,” Shaun Friedle, the author of the code, writes. “I'm sure it could be improved a lot by someone who really knows what they're doing,” he concludes.