Arabic reCAPTCHA Service for Enhancing Digitization of Arabic Manuscripts

Hanin Abubaker, Khaled Salah, Hassan Al-Muhairi, Ahmed Bentiba

Research output: Contribution to journal › Article › peer-review

6 Scopus citations


reCAPTCHA is a security measure that guards web applications against automated bot abuse by presenting a randomly generated challenge for users to solve. These challenges must be hard for computers yet easy for humans. In this paper, we present a cloud-based Arabic reCAPTCHA service that protects Arabic websites against automated abuse. In addition, the proposed service is designed to improve the accuracy of digitizing printed Arabic manuscripts compared with traditional digitization using optical character recognition software. The architectural design, algorithms, implementation, and deployment guidelines presented in this paper are not limited to the Arabic language and can serve as the basis for developing a reCAPTCHA service for any other language. The paper discusses the need for an Arabic reCAPTCHA service and then presents an original system architecture, design, and implementation. We also propose solutions and algorithms for a number of design and implementation challenges. First, we devise a scheme to properly extract word images from scanned pages to form reCAPTCHA challenges. Second, we propose a mechanism for classifying the extracted word images into known and unknown word sets. Third, we explore and propose two algorithms for processing user input to a reCAPTCHA challenge, which prepare the service response for user verification and, at the same time, store the user's guess for the digitization process. Fourth, we present a solution for maintaining data integrity while handling multiple user requests for reCAPTCHA challenges. Moreover, we show how the different components and subservices of our proposed Arabic reCAPTCHA system can be deployed on a public cloud such as Amazon Web Services. Finally, we conduct an experimental study to validate the efficacy of the service. The study shows that overall digitization accuracies of 97.67% and 96.73% were attained in two experimental setups, and that 72.2% of the audience preferred solving Arabic reCAPTCHA challenges over English reCAPTCHA challenges on Arabic websites.
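
The abstract does not spell out the two input-processing algorithms, so the following is only a minimal sketch of the general reCAPTCHA idea it describes: a challenge pairs a known word with an unknown word, the user is verified against the known word, and the guess for the unknown word is stored as a vote for digitization. All names here (`UnknownWord`, `normalize`, `process_answer`, `votes_needed`) are hypothetical, and both the diacritic-stripping rule and the vote threshold are assumptions rather than the paper's method. Concurrency control is not covered.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class UnknownWord:
    """A word image whose transcription is not yet trusted (hypothetical type)."""
    image_id: str
    guesses: Counter = field(default_factory=Counter)


def normalize(text: str) -> str:
    """Assumed normalization: trim whitespace, drop Arabic diacritics
    (U+064B..U+0652) and the tatweel character (U+0640)."""
    return "".join(
        ch for ch in text.strip()
        if not ("\u064b" <= ch <= "\u0652") and ch != "\u0640"
    )


def process_answer(
    known_answer: str,
    known_guess: str,
    unknown: UnknownWord,
    unknown_guess: str,
    votes_needed: int = 3,  # assumed threshold, not taken from the paper
) -> Tuple[bool, Optional[str]]:
    """Return (verified, promoted_text).

    verified is True when the known word was typed correctly.
    promoted_text is the agreed transcription of the unknown word once
    enough verified users have submitted the same guess, else None.
    """
    if normalize(known_guess) != normalize(known_answer):
        # Failed verification: the guess for the unknown word is not recorded.
        return False, None

    guess = normalize(unknown_guess)
    if not guess:
        return True, None

    unknown.guesses[guess] += 1
    top_guess, count = unknown.guesses.most_common(1)[0]
    if count >= votes_needed:
        return True, top_guess
    return True, None
```

In this sketch, a word image would move from the unknown set to the known set once `process_answer` returns a non-`None` transcription, at which point it could serve as the control word in later challenges.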

Original language: British English
Pages (from-to): 3391-3408
Number of pages: 18
Journal: Arabian Journal for Science and Engineering
Issue number: 8
State: Published - 1 Aug 2017


  • Arabic reCAPTCHA
  • Cloud-based services
  • Crowdsourcing
  • Digital Arabic content
  • Digitization
  • Optical character recognition

