[Accessibleweb] Internet.com: CAPTCHAs for Social Good?
M. Zoe Holbrooks
zoeholbr at u.washington.edu
Fri Oct 26 15:54:30 PDT 2007
Internet.com - October 22, 2007
CAPTCHAs For Social Good?
By Susan Kuchinskas
Researchers at the University of California at San Diego have a plan
to meld the brains of Internet users into a vast human grid that would
make use of the seconds wasted on solving CAPTCHAs to enact
Likely familiar to any frequent Web user, CAPTCHAs are those
difficult-to-see images composed of squiggly letters and lines
designed to confound blog spam bots and the like. Blogs and online
forums typically use codes hidden in CAPTCHAs to prove that a poster
is a human, rather than an automated program; ideally, a human user
can see and enter a CAPTCHA's hidden code, while an automated program
While finding a hidden CAPTCHA code may take only a couple of seconds,
when multiplied by the millions of other Internet users also
responding to CAPTCHAs, those seconds can add up to hundreds of wasted
The Soylent Grid project wants to apply those wasted seconds to
identifying images for assistive technology applications.
The project, named in reference to the 1973 Charlton Heston film
Soylent Green and its famous phrase, "Soylent Green is people!", is
already well on its way toward developing ways to make use of time
that would normally be spent on CAPTCHAs.
Soylent Grid's first application to harness tiny bits of Internet
users' attention is GroZi Shopping Assistant, a program that helps
visually impaired people with the difficult task of locating objects
A joint mission between the California Institute for Telecommunications
and Information Technology (CalIT2) and UCSD's Computer Science and
Engineering department, GroZi would use the Soylent Grid project to
funnel images taken by visually impaired people to Web users, who can
then identify the objects in those images.
GroZi relies on a wearable system with a camera and tactile/haptic
feedback, a blind-accessible interface and computer vision-based
object recognition software.
GroZi's Human Need
But without Soylent Grid's human factor, GroZi faces difficult
technical hurdles. Recognizing content in digital images has long been
a tough nut for computer science to crack. The human brain, on the
other hand, is superb at recognizing content in images, knowing
immediately which object in a family photo is Uncle Sean and which is
the family dog.
For the GroZi prototype, it took developer Michele Merler, now at
Columbia University, weeks to input the 120 products found in a single
45-minute video. To make the system truly useful, however, GroZi would
need to be able to decipher the staggering array of items available in
modern stores within seconds.
Enter Soylent Grid. Instead of building an image database item by
item, the project could take advantage of time spent identifying
CAPTCHAs. In such a scenario, the system could test a user
trying to post on a blog by asking them to decipher a GroZi photo
instead of a traditional CAPTCHA.
The idea is to do this in real time, so that a visually impaired
person at a grocery store could use GroZi to tell the corn niblets
from the creamed corn.
"The currently used types of CAPTCHAs are a complete waste once they
go stale," said Serge Belongie, the UCSD professor who heads the
project. "They're totally artificial, and when hackers crack them,
their approach is invariably 'hacky' and neither reveals any insight
into human object recognition nor does it do any good for society as a
While efforts in other industries are being made to improve image
recognition -- search engines, for example, are interested in
image-recognition technology to improve their search results -- a
human-powered system like the GroZi-Soylent Grid effort could vastly
improve the lives of the blind and vision-impaired.
Researchers outlined the benefits of Soylent Grid earlier
this week in a paper presented at the Interactive Computer Vision 2007
conference in Rio de Janeiro.
Soylent Grid, GroZi, and CAPTCHAs
For a Soylent Grid/GroZi combination to make an impact, however, the
service would need to partner with one or more online entities that
make heavy use of CAPTCHAs, such as blogging platforms or social media
For example, the Soylent Grid team estimates that Digg users could
identify an image approximately every 17 seconds. That's far from fast
enough for someone hurrying through their shopping. Belongie estimates
that five seconds would be an acceptable turnaround time, so GroZi
would need 25 times the CAPTCHA-producing power of Digg.
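The turnaround arithmetic above can be sketched in a few lines of Python. This is a rough illustration, not the team's own calculation: the function name and the votes_per_image parameter are hypothetical, and only the 17-second and five-second figures come from the article. The ratio of the two intervals alone is 3.4, so the 25-times figure presumably reflects factors not broken out here, such as several users confirming each image.

```python
def required_scale(current_interval: float,
                   target_interval: float,
                   votes_per_image: int = 1) -> float:
    """Return how many times more labeling capacity is needed.

    current_interval: seconds between identified images today.
    target_interval:  seconds between identified images that is acceptable.
    votes_per_image:  hypothetical number of independent human answers
                      collected per image (an assumption, not from the article).
    """
    if current_interval <= 0 or target_interval <= 0:
        raise ValueError("intervals must be positive")
    if votes_per_image < 1:
        raise ValueError("votes_per_image must be at least 1")
    return (current_interval / target_interval) * votes_per_image


if __name__ == "__main__":
    # Figures quoted in the article: one image per 17 s, target of 5 s.
    print(f"{required_scale(17.0, 5.0):.1f}x with one answer per image")
    print(f"{required_scale(17.0, 5.0, votes_per_image=3):.1f}x with three answers per image")
```

Raising votes_per_image shows how quickly the required capacity grows once each image needs more than one human answer.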
Image recognition as part of a live video feed is an even more remote
"The idea of doing real-time object recognition on a live video stream
is at the fantasy end of the Soylent Grid spectrum," Belongie said in
an e-mail interview. "In reality, we expect it will be more likely to
have an increased role for the computational processing component, so
that the available human cycles are employed more opportunistically."
Ideally, every product identified would go into a standalone database,
eventually enabling quicker lookups for GroZi that wouldn't require
the input of Web users. This ultimately could allow the Soylent Grid
project to be harnessed for other endeavors.
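The database idea in this paragraph amounts to a lookup that falls back to human labelers only for products it has not seen before. The Python sketch below is a minimal illustration under stated assumptions: LabelCache, image_key and ask_humans are hypothetical names, and a real system would need computer vision to decide that two photos show the same product, which a simple string key glosses over.

```python
from typing import Callable, Dict


class LabelCache:
    """Store product labels so repeat lookups skip the human grid."""

    def __init__(self, ask_humans: Callable[[str], str]) -> None:
        # ask_humans is a stand-in for sending an image to Web users.
        self._ask_humans = ask_humans
        self._labels: Dict[str, str] = {}
        self.human_queries = 0  # how often the fallback was needed

    def identify(self, image_key: str) -> str:
        """Return the stored label, asking humans only on a miss."""
        label = self._labels.get(image_key)
        if label is None:
            label = self._ask_humans(image_key)
            self.human_queries += 1
            self._labels[image_key] = label
        return label


if __name__ == "__main__":
    cache = LabelCache(lambda key: "creamed corn")
    cache.identify("shelf-photo-1")   # first lookup goes to humans
    cache.identify("shelf-photo-1")   # second lookup is served locally
    print(cache.human_queries)
```

As the stored labels accumulate, fewer lookups reach the human grid, which is the effect the paragraph describes.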
"If the initial GroZi box had some amount of computational power ...
it could be pulled off the grid and run locally on the GroZi box in
the user's hand, or run remotely on a private-GroZi-only computational
system of much smaller scale," said Stephan Steinbach, another Soylent
Grid project member.
Soylent Grid is an example of crowdsourcing, the notion of bringing
together masses of users to accomplish what no individual or company
HumanGrid, launched in December 2005, is another example of applying
crowdsourcing to labor-intensive tasks. The HumanGrid marketplace, in
private beta, aims to introduce businesses and researchers to
individuals willing to perform micro-tasks for micro-payments, such as
data enhancement, text classification, transcription and picture
Amazon's Mechanical Turk service is another example; it's a similar
automated marketplace where businesses can offer to pay humans to do
tasks like tagging objects found in images or selecting the best
photos of a product from a set of images.
The e-tailing giant developed the technology to help sort out the 20
million photos of storefronts to be used in its A9 Yellow Pages local
Belongie said that Soylent Grid has a better chance of succeeding
because its strategy of distributing the work via third-party sites
creates an ecosystem.
"For all three main parties involved -- the researchers, the Web
sites, and the users -- there's something in it for them," he said.
Researchers, whether academic or commercial, "have data they need
labeled, for which one assumes they'd be willing to pay ... for
example, someone wanting to spot pizza storefronts or real estate
posters in Google Street View footage."
"Web site owners want a fresh source of CAPTCHAs, since the ones they
use routinely go stale, meaning they get cracked by hackers in the
Ukraine," he added. "And the users simply want to get to whatever
content lies behind the CAPTCHA."