Have you heard of the “CAPTCHA” tool? Probably not, but I’m sure you’ve seen it and even used it. Secure websites use it to prevent automated registrations. It verifies that a human, and not some sort of “bot,” is submitting information to the website. I know you’ve seen it: the box where you have to retype distorted words to prove you are human.
CAPTCHA stands for “Completely Automated Public Turing Test to Tell Computers and Humans Apart.” It works because humans can read distorted text and current computers can’t. It was developed by four men at Carnegie Mellon University in 2000 for Yahoo. In fact, three of the four creators wrote a fantastic article called “Telling Humans and Computers Apart: How lazy cryptographers do AI,” which is available online for free here: http://www.cs.cmu.edu/~biglou/captcha_cacm.pdf. The authors have a sense of humor too, which I loved. While explaining that a computer is what decides whether the registrant is a human or another computer, they write, “Notice the paradox: a CAPTCHA is a program that can generate and grade tests that it itself cannot pass (much like some professors)” (Von Ahn, Blum & Langford).
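To make the “generate and grade” idea a little more concrete, here is a minimal Python sketch. It is my own illustration, not the authors’ code: the names `ALPHABET`, `generate_challenge` and `grade` are made up for this example, and the hard part (drawing the text as a distorted image that a bot can’t read) is left out entirely.

```python
import secrets
import string

# Hypothetical alphabet for this sketch: easily confused characters
# (0/O and 1/I) are left out so humans are not tripped up.
ALPHABET = "".join(
    c for c in string.ascii_uppercase + string.digits if c not in "0O1I"
)


def generate_challenge(length=6):
    """Return a random challenge string.

    A real CAPTCHA would render this as a distorted image; that step
    is not shown here.
    """
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def grade(expected, response):
    """Pass only if the typed response matches the challenge,
    ignoring letter case and surrounding whitespace."""
    return response.strip().upper() == expected.upper()
```

The website would keep the generated string on its side, show the visitor only the distorted picture, and call `grade` on whatever they type back.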
There are several practical uses for the tool, including preventing comment spam on blogs, verifying online poll respondents, preventing dictionary attacks, and thwarting spam and worms by ensuring that whoever sends you an email is a real person.
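As one example, the dictionary-attack case can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the threshold `MAX_FREE_ATTEMPTS` and the helpers `captcha_required` and `record_login` are hypothetical names, and a real site would store the counts somewhere more durable than a dictionary in memory. The idea is simply that after a few wrong passwords, the site asks for a CAPTCHA before accepting another guess, so a bot can’t run through a whole list of passwords.

```python
# Hypothetical threshold: how many wrong passwords are allowed
# before a CAPTCHA is demanded.
MAX_FREE_ATTEMPTS = 3

# username -> number of consecutive failed logins (in memory only)
failed_logins = {}


def captcha_required(username):
    """True once the account has used up its free wrong guesses."""
    return failed_logins.get(username, 0) >= MAX_FREE_ATTEMPTS


def record_login(username, success):
    """Reset the counter on a successful login, otherwise add one."""
    if success:
        failed_logins.pop(username, None)
    else:
        failed_logins[username] = failed_logins.get(username, 0) + 1
```

A login page would call `captcha_required` before checking the password and `record_login` afterward.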
If your website needs protection, you too can get the CAPTCHA tool for free from the reCAPTCHA project here: http://www.google.com/recaptcha.
There’s also a little-known real-world application from the reCAPTCHA project: helping to digitize text. According to reCAPTCHA, the tool is used to “Stop spam and help digitize books at the same time! The words shown come directly from old books that are being digitized.” This is done through a “sophisticated combination of multiple OCR programs.” It has allowed programmers to “achieve 99.5% transcription accuracy” from the millions of answers people have typed into the challenges. At the reCAPTCHA link above, you can see a comparison of how the same text is transcribed two different ways (OCR vs. reCAPTCHA). It’s pretty incredible. I’ve run across digitized text while working on genealogy, and I can tell you that the transcription leaves a lot to be desired.
Some historical books have been digitized and posted online as PDFs, and you can readily see the problems with the text. Some of it comes out as stray characters or symbols instead of words, which makes reading somewhat difficult.
Who knew that by using a useful tool like CAPTCHA, we would be helping to digitize old documents?
“CAPTCHA: Telling Humans and Computers Apart Automatically.” CAPTCHA.net. 2012. Web. 01 Nov. 2012.
“reCAPTCHA: Digitizing Books One Word at a Time.” Google.com/recaptcha. 2012. Web. 01 Nov. 2012.
Von Ahn, Luis, Manuel Blum, and John Langford. “Telling Humans and Computers Apart.” Communications of the ACM. February 2004: Vol. 47, No. 2. Web. 01 Nov. 2012.