AI Startup Cracks CAPTCHA Codes with Human-Like Vision

Nov 13th, 2017 6:00am by Kimberley Mok

Artificial intelligence is getting more and more adept at tasks previously thought to be quite difficult for machines to master. For the most part, machines still have a long way to go before they achieve human-like intelligence, yet not a week goes by without some relatively significant artificial intelligence breakthrough making the news.

Now, it seems that yet another barrier of sorts has been broken. San Francisco startup Vicarious has developed an AI that’s capable of reliably solving CAPTCHAs, those automated security verification puzzles with squiggly or distorted lettering that you have to solve before being able to post a comment on a website, for instance.

Passing the Turing Test

CAPTCHA, which stands for “Completely Automated Public Turing test to tell Computers and Humans Apart,” has been used to thwart online fraud and spam bots since its invention in the 1990s. It is based on an idea proposed by mathematician Alan Turing in 1950 and is supposed to distinguish human from machine: the puzzles are designed to be relatively easy for humans, but difficult for machines, to solve.

However, as Vicarious’ research shows, it’s now possible to tweak computer vision technology so that it more closely approximates how human vision (and intelligence) works, which enables computers to solve CAPTCHAs much as a human might. As a benchmark, a CAPTCHA scheme is considered broken, and therefore ineffective, if an algorithm can solve it at least 1 percent of the time.

Of course, there are deep learning algorithms out there capable of decoding CAPTCHAs, but these have required a massive amount of training data to do so. These models were trained on millions of images, such as actual strings of letters in specific fonts. They were not able to “generalize” from a few training examples and recognize individual letters under the distortions, as a human would be able to do.

Comparing parse results from a Recursive Cortical Network (RCN), a Convolutional Neural Network (CNN) and two Amazon Mechanical Turk workers.

Recursive Cortical Network

In contrast, Vicarious’ AI was able to solve CAPTCHAs by building upon a small set of training examples. To achieve this, the company developed a type of artificial neural network called a Recursive Cortical Network (RCN), which is capable of making this leap in “generalized” learning.

“Recent AI systems like IBM’s Watson and deep neural networks rely on brute force: connecting massive computing power to massive datasets,” said Vicarious co-founder D. Scott Phoenix in a press release. “This is the first time this distinctively human act of perception has been achieved, and it uses relatively minuscule amounts of data and computing power. The Vicarious algorithms achieve a level of effectiveness and efficiency much closer to actual human brains.”

According to the company, the RCN was able to solve reCAPTCHAs (CAPTCHA tests consisting of scans of words from old books and newspapers, doubling as a way to digitize books) at an accuracy rate of 66.6 percent. On other CAPTCHA styles, it reached 64.4 percent for BotDetect, 57.4 percent for Yahoo and 57.1 percent for PayPal. Compared to other deep learning models for text recognition, the RCN was able to achieve equal or higher accuracy while using only a fraction of the training data, about 300 times less.
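To put these rates in perspective against the 1 percent benchmark mentioned earlier, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that every attempt is independent and succeeds with the quoted accuracy; the helper name `expected_attempts` is hypothetical and does not come from Vicarious.

```python
# Accuracy figures quoted above, keyed by CAPTCHA type.
ACCURACY = {
    "reCAPTCHA": 0.666,
    "BotDetect": 0.644,
    "Yahoo": 0.574,
    "PayPal": 0.571,
}

BROKEN_THRESHOLD = 0.01  # the 1 percent benchmark


def expected_attempts(p: float) -> float:
    """Mean number of independent tries until the first success.

    Assumption: each try succeeds with the same probability p
    (a geometric distribution), which real attacks may not satisfy.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return 1.0 / p


for name, p in ACCURACY.items():
    ratio = p / BROKEN_THRESHOLD
    tries = expected_attempts(p)
    print(f"{name}: {ratio:.1f}x the threshold, about {tries:.2f} tries per solve")
```

Under that assumption, a solver sitting right at the 1 percent threshold would need about 100 tries per solved puzzle, while each of the quoted rates works out to fewer than two.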

The RCN achieves this streamlined efficiency by imitating a “brain-like vision system” like the one found in humans.

“During the training phase, [the RCN] builds internal models of the letters that it is exposed to,” Vicarious co-founder Dileep George explained on NPR. “So if you expose it to As and Bs and different characters, it will build its own internal model of what those characters are supposed to look like. So it would say, these are the contours of the letter, this is the interior of the letter, this is the background, etc. And then, when a new image comes in … it tries to explain that new image, trying to explain all the pixels of that new image in terms of the characters it has seen before. So it will say, this portion of the A is missing because it is behind this B.”
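The idea George describes, explaining every pixel of a new image in terms of characters seen before, can be illustrated with a toy sketch. To be clear, this is not the RCN: it is a brute-force template search over a made-up 3x5 binary image, and the tile shapes, the opaque-tile occlusion rule and all function names are assumptions invented for illustration.

```python
from itertools import product

# Hypothetical 3x3 letter tiles; '#' is ink, '.' is blank.
# These are toy shapes, not models learned by Vicarious' system.
TEMPLATES = {
    "L": ["#..", "#..", "###"],
    "T": ["###", ".#.", ".#."],
    "O": ["###", "#.#", "###"],
}
ROWS, COLS, TILE = 3, 5, 3
OFFSETS = range(COLS - TILE + 1)  # left edge of a tile: 0, 1 or 2


def render(back, back_off, front, front_off):
    """Draw the back tile, then let the opaque front tile overwrite it."""
    canvas = [[0] * COLS for _ in range(ROWS)]
    for name, off in ((back, back_off), (front, front_off)):
        for r, row in enumerate(TEMPLATES[name]):
            for c, ch in enumerate(row):
                canvas[r][off + c] = 1 if ch == "#" else 0
    return canvas


def mismatches(a, b):
    """Count pixels where two images disagree."""
    return sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def explain(image):
    """Try every two-letter scene and keep the one that best accounts for all pixels."""
    candidates = product(TEMPLATES, OFFSETS, TEMPLATES, OFFSETS)
    return min(candidates, key=lambda s: mismatches(render(*s), image))


def hidden_ink(image, back, back_off):
    """Ink pixels of the back letter that are absent from the image."""
    return [
        (r, back_off + c)
        for r, row in enumerate(TEMPLATES[back])
        for c, ch in enumerate(row)
        if ch == "#" and image[r][back_off + c] == 0
    ]


observed = render("O", 0, "T", 2)  # an O partly covered by a T
back, back_off, front, front_off = explain(observed)
print(f"{front} at column {front_off} sits in front of {back} at column {back_off}")
print("back-letter ink hidden by the front tile:", hidden_ink(observed, back, back_off))
```

For this particular image the search recovers the O behind the T and reports which of the O's ink pixels are missing because the T covers them. An exhaustive search like this only works for a handful of tiny templates.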

In addition to applying this model to figure out CAPTCHAs, the team has also used the RCN to identify multiple objects in images with random backgrounds, meaning that the technique could be utilized in other ways beyond the parsing of text.

This is a big step toward an AI that can learn to make broader, human-like generalizations from what it sees, potentially enabling machines to better understand, and therefore better interact with and manipulate, the world around them. But it also means that text-oriented CAPTCHAs will soon become ineffective, and better security verification measures will need to be developed as machines become more human-like in their intelligence.

Images: Vicarious