ARABIC HANDWRITING RECOGNITION USING LOCAL APPROACH

Benhamadi, Abdelmadjid

URI: https://dspace.univ-ghardaia.edu.dz/xmlui/handle/123456789/4952

Date: 2017

Abstract:

One goal of computing research is to extend the boundaries of what can be automated. Repetitive, tedious tasks involving large volumes of data are good candidates: processing bank checks, sorting postal mail, indexing national archives (military archives, census forms, library collections, etc.), indexing private archives, processing incoming business mail, and so on. Although a machine can perform complex calculations, often exceeding human capabilities, it remains limited: we must still use a keyboard to communicate with it, a task that is painstaking for some, or at least unnatural. Although research in this field has continued for more than thirty years, a general solution to the problem of automatically reading cursive writing remains unknown. Cursive handwriting recognition nonetheless appears to have an important role to play in future recognition systems, so this field of research is still very much relevant.

Automating this task requires the machine to be able to read handwriting. However, writing styles vary considerably from one writer to another. While reading is a relatively trivial act for a human, it nevertheless involves complex processes, and properly recognizing isolated symbols is not enough. Studying how a human successfully performs this complex task could prove useful in teaching machines to read handwritten texts. Which primitives are detected during reading? How do we access the information that allows us to understand the meaning of a word? Is the perception of a word constructed from the perception of its letters or from the perception of its general form? For several years, researchers in biology, neurophysiology, cognitive psychology and linguistics have studied these questions, and reading models have resulted from these investigations.

Although these models can still be improved, and although several of the theories put forward different points of view, we believe their observations can be used to develop a robust system for recognizing images of isolated cursive words. The primitives detected during reading can guide an informed choice of the primitives to look for first during the recognition process, and the problem of lexical access can influence the choice of architecture for the method developed. However, most of these reading models were developed from printed texts, and few studies have examined the mechanisms involved in reading cursive writing. The authors of those studies conclude that, even if reading cursive words differs at first sight from reading printed words, once what they call the “normalization of the cursive” is completed, printed and handwritten words seem to be subject to similar processing.

Recognition is called “on-line” when dynamic data is acquired during writing, for example on a tablet or electronic paper on which the user writes with a pen. It is called “off-line” when the task is to recognize the image of a word obtained with a scanner. The objective of this work is to propose a system for the off-line recognition of Arabic handwriting. The system is based on a structural segmentation method and uses Support Vector Machines (SVM) in the classification phase.

The first chapter recalls some general concepts of Optical Character Recognition (OCR) and the steps needed to build a recognition system. It then examines OCR in relation to the Arabic language, recalling certain aspects of Arabic calligraphy before introducing notions of OCR for Arabic writing. The second chapter reviews the state of the art in text segmentation in the general case. To that end, we describe the process of detecting objects in a page and segmenting text blocks into lines, then words, then characters, focusing on the methods used for this type of segmentation. The third chapter is devoted to the classification method and studies Support Vector Machines (SVM). The fourth chapter constitutes our contribution: an algorithm for segmenting Arabic handwritten texts, followed by the tests and the results obtained. We close with a conclusion on the results obtained with our method and the prospects for future work.
