91 |
Embedded Arabic text detection and recognition in videos / Détection et reconnaissance du texte arabe incrusté dans les vidéosYousfi, Sonia 06 July 2016 (has links)
Cette thèse s'intéresse à la détection et à la reconnaissance du texte arabe incrusté dans les vidéos. Dans ce contexte, nous proposons différents prototypes de détection et d'OCR vidéo (Optical Character Recognition) qui sont robustes à la complexité du texte arabe (différentes échelles, tailles, polices, etc.) ainsi qu'aux différents défis liés à l'environnement vidéo et aux conditions d'acquisition (variabilité du fond, luminosité, contraste, faible résolution, etc.). Nous introduisons différents détecteurs de texte arabe qui se basent sur l'apprentissage artificiel sans aucun prétraitement. Les détecteurs se basent sur des Réseaux de Neurones à Convolution (ConvNet) ainsi que sur des schémas de boosting pour apprendre la sélection de caractéristiques textuelles manuellement conçues. Quant à notre méthodologie d'OCR, elle se passe de la segmentation en traitant chaque image de texte en tant que séquence de caractéristiques grâce à un processus de scanning. Contrairement aux méthodes existantes qui se basent sur des caractéristiques manuellement conçues, nous proposons des représentations pertinentes apprises automatiquement à partir des données. Nous utilisons différents modèles d'apprentissage profond, regroupant des Auto-Encodeurs, des ConvNets et un modèle d'apprentissage non supervisé, qui génèrent automatiquement ces caractéristiques. Chaque modèle résulte en un système d'OCR bien spécifique. Le processus de reconnaissance se base sur une approche connexionniste récurrente pour l'apprentissage de l'étiquetage des séquences de caractéristiques sans aucune segmentation préalable. Nos modèles d'OCR proposés sont comparés à d'autres modèles qui se basent sur des caractéristiques manuellement conçues. Nous proposons, en outre, d'intégrer des modèles de langage (LM) arabes afin d'améliorer les résultats de reconnaissance. Nous introduisons différents LM à base de Réseaux de Neurones Récurrents capables d'apprendre de longues interdépendances linguistiques.
Nous proposons un schéma de décodage conjoint qui intègre les inférences du LM en parallèle avec celles de l'OCR tout en introduisant un ensemble d'hyper-paramètres afin d'améliorer la reconnaissance et de réduire le temps de réponse. Afin de pallier le manque de corpus textuels arabes issus de contenus multimédia, nous mettons au point de nouveaux corpus manuellement annotés à partir de flux TV arabes. Le corpus conçu pour l'OCR, nommé ALIF et composé de 6 532 images de texte annotées, a été publié à des fins de recherche. Nos systèmes ont été développés et évalués sur ces corpus. L'étude des résultats a permis de valider nos approches et de montrer leur efficacité et leur généricité, avec plus de 97 % de taux de détection et 88,63 % de taux de reconnaissance de mots sur le corpus ALIF, dépassant ainsi de 36 points l'un des systèmes d'OCR commerciaux les plus connus. / This thesis focuses on Arabic embedded text detection and recognition in videos. Different approaches robust to Arabic text variability (fonts, scales, sizes, etc.) as well as to environmental and acquisition-condition challenges (contrast, degradation, complex backgrounds, etc.) are proposed. We introduce different machine-learning-based solutions for robust text detection without relying on any pre-processing. The first method is based on Convolutional Neural Networks (ConvNets) while the others use a specific boosting cascade to select relevant hand-crafted text features. For text recognition, our methodology is segmentation-free: text images are transformed into sequences of features using a multi-scale scanning scheme. Departing from the dominant methodology of hand-crafted features, we propose to learn relevant text representations from data using different deep learning methods, namely Deep Auto-Encoders, ConvNets and unsupervised learning models. Each one leads to a specific OCR (Optical Character Recognition) solution.
Sequence labeling is performed without any prior segmentation using a recurrent connectionist learning model. The proposed solutions are compared to other methods based on non-connectionist, hand-crafted features. In addition, we propose to enhance the recognition results using Recurrent Neural Network-based language models that are able to capture long-range linguistic dependencies. Both OCR and language-model probabilities are incorporated in a joint decoding scheme in which additional hyper-parameters are introduced to boost recognition results and reduce the response time. Given the lack of public multimedia Arabic datasets, we propose novel annotated datasets built from Arabic videos. The OCR dataset, called ALIF, is publicly available for research purposes. To the best of our knowledge, it is the first public dataset dedicated to Arabic video OCR. Our proposed solutions were extensively evaluated. The results obtained highlight the genericity and efficiency of our approaches, reaching a word recognition rate of 88.63% on the ALIF dataset and outperforming a well-known commercial OCR engine by more than 36 points.
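The joint decoding scheme described above can be illustrated with a minimal sketch. Everything here is hypothetical: the candidate structure and the `lm_weight` and `word_penalty` hyper-parameters are illustrative stand-ins for the thesis's actual decoding hyper-parameters, chosen only to show how OCR and language-model log-probabilities might be combined into a single score.

```python
def joint_score(ocr_logprobs, lm_logprob, lm_weight, word_penalty, n_words):
    """Joint score for one candidate transcription (hypothetical form).

    Combines the summed OCR log-probabilities with a weighted LM
    log-probability and a per-word penalty, a common shape for
    OCR/LM joint decoding schemes.
    """
    return sum(ocr_logprobs) + lm_weight * lm_logprob + word_penalty * n_words


def best_hypothesis(candidates, lm_weight=0.8, word_penalty=-0.5):
    """Pick the candidate transcription with the highest joint score."""
    return max(
        candidates,
        key=lambda c: joint_score(c["ocr"], c["lm"], lm_weight,
                                  word_penalty, c["n_words"]),
    )


# Two toy candidates: the OCR slightly prefers the first,
# the language model strongly prefers the second.
cands = [
    {"text": "hypothesis-a", "ocr": [-0.2, -0.3, -0.4], "lm": -5.0, "n_words": 1},
    {"text": "hypothesis-b", "ocr": [-0.3, -0.3, -0.4, -0.5], "lm": -2.0, "n_words": 1},
]
print(best_hypothesis(cands)["text"])  # → hypothesis-b
```

With `lm_weight=0.8`, the second candidate wins (-3.6 vs. -5.4) even though its raw OCR score is worse, which is exactly the effect a language model is meant to have on ambiguous transcriptions.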
|
92 |
Handwriting Chinese character recognition based on quantum particle swarm optimization support vector machinePang, Bo January 2018 (has links)
University of Macau / Faculty of Science and Technology. / Department of Computer and Information Science
|
93 |
A Book Reader Design for Persons with Visual Impairment and BlindnessGalarza, Luis E. 16 November 2017 (has links)
The objective of this dissertation is to provide a new design approach to a fully automated book reader for individuals with visual impairment and blindness that is portable and cost effective. This approach relies on the geometry of the design setup and provides the mathematical foundation for integrating, in a unique way, a 3-D space surface map from a low-resolution time-of-flight (ToF) device with a high-resolution image as a means to enhance the reading accuracy of images warped by the page curvature of bound books and magazines. The merits of this low-cost but effective automated book reader design include: (1) a seamless registration process of the two imaging modalities, so that the low-resolution (160 x 120 pixels) height map, acquired by an Argos3D-P100 camera, accurately covers the entire book spread as captured by the high-resolution image (3072 x 2304 pixels) of a Canon G6 camera; (2) a mathematical framework for overcoming the difficulties associated with the curvature of open bound books, a process referred to as the dewarping of the book spread images; and (3) an image correction performance comparison between uniform and full height maps to determine which map provides the highest Optical Character Recognition (OCR) reading accuracy. The design concept could also be applied to address the challenging process of book digitization. This method depends on the geometry of the book reader setup for acquiring a 3-D map that yields high reading accuracy once appropriately fused with the high-resolution image. The experiments were performed on a dataset consisting of 200 pages with their corresponding computed and co-registered height maps, which are made available to the research community (cate-book3dmaps.fiu.edu). Improvements to the character reading accuracy due to the correction steps were quantified by introducing the corrected images to an OCR engine and tabulating the number of misrecognized characters.
Furthermore, the resilience of the book reader was tested by introducing a rotational misalignment to the book spreads and comparing the OCR accuracy to that obtained with the standard alignment. The standard alignment yielded an average reading accuracy of 95.55% with the uniform height map (i.e., the height values of the central row of the 3-D map are replicated to approximate all other rows), and 96.11% with the full height maps (i.e., each row has its own height values as obtained from the 3-D camera). When the rotational misalignments were taken into account, the results obtained produced average accuracies of 90.63% and 94.75% for the same respective height maps, demonstrating the added resilience of the full-height-map method to potential misalignments.
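The uniform height map described above (the central row of the 3-D map replicated to approximate all other rows) is easy to sketch with NumPy. This is an illustrative reconstruction of that one step, not the dissertation's actual code:

```python
import numpy as np

def uniform_height_map(full_map):
    """Build a uniform height map from a full ToF height map.

    Keeps only the central row of the map and copies it to every
    other row, approximating the page curvature as constant along
    the vertical axis of the book spread.
    """
    central = full_map[full_map.shape[0] // 2]       # central scan line
    return np.tile(central, (full_map.shape[0], 1))  # replicate it per row

# Toy 4x3 "height map": every row differs, so the uniform map
# collapses them all onto the central row (row index 2 here).
full = np.arange(12, dtype=float).reshape(4, 3)
uni = uniform_height_map(full)
print(uni[0].tolist())  # → [6.0, 7.0, 8.0]
```

The full-height-map variant simply skips this replication and uses each row's own measured values, which is why it tolerates rotational misalignment better in the reported results.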
|
94 |
Hybrid segmentation on slant & skewed deformation text in natural scene images / Hybrid segmentation on slant and skewed deformation text in natural scene imagesFei, Xiao Lei January 2010 (has links)
University of Macau / Faculty of Science and Technology / Department of Computer and Information Science
|
95 |
Gabor filter parameter optimization for multi-textured images : a case study on water body extraction from satellite imagery.Pillay, Maldean. January 2012 (has links)
The analysis and identification of texture is a key area in image processing and computer vision. One of the most prominent texture analysis algorithms is the Gabor filter. These filters are used by convolving an image with a family of self-similar filters, or wavelets, through the selection of a suitable number of scales and orientations, which aid in the identification of textures of differing coarseness and direction respectively.

While extensively used in a variety of applications, including biometrics such as iris and facial recognition, their effectiveness depends largely on the manual selection of different parameter values, i.e. the centre frequency, the number of scales and orientations, and the standard deviations. Previous studies have been conducted on how to determine optimal values; however, the results are sometimes inconsistent and even contradictory. Furthermore, the selection of the mask size and tile size used in the convolution process has received little attention, presumably since they are image-set dependent.

This research attempts to verify specific claims made in previous studies about the influence of the number of scales and orientations, but also to investigate the variation of the filter mask size and tile size for water body extraction from satellite imagery. Optical satellite imagery may contain texture samples that are conceptually the same (belong to the same class) but are structurally different, or that differ due to changes in illumination, i.e. a texture may appear completely different when the intensity or position of a light source changes.

A systematic testing of the effects of varying the parameter values on optical satellite imagery is conducted. Experiments are designed to verify claims made about the influence of varying the scales and orientations within predetermined ranges, but also to show the considerable changes in classification accuracy when varying the filter mask and tile size. Heuristic techniques such as Genetic Algorithms (GA) can be used to find optimum solutions in application domains where an enumeration approach is not feasible. Hence, the effectiveness of a GA in automating the process of determining optimum Gabor filter parameter values for a given image dataset is also investigated.

The results of the research can be used to facilitate the selection of Gabor filter parameters for applications that involve multi-textured image segmentation or classification, and specifically to guide the selection of appropriate filter mask and tile sizes for automated analysis of satellite imagery. / Thesis (M.Sc.)--University of KwaZulu-Natal, Durban, 2012.
|
96 |
Voice input for the disabled /Holmes, William Paul. January 1987 (has links) (PDF)
Thesis (M. Eng. Sc.)--University of Adelaide, 1987. / Typescript. Includes a copy of a paper presented at TADSEM '85 --Australian Seminar on Devices for Expressive Communication and Environmental Control, co-authored by the author. Includes bibliographical references (leaves [115-121]).
|
97 |
A new class of convolutional neural networks based on shunting inhibition with applications to visual pattern recognitionTivive, Fok Hing Chi. January 2006 (has links)
Thesis (Ph.D.)--University of Wollongong, 2006. / Typescript. Includes bibliographical references (leaves 208-226).
|
98 |
API för att tolka och ta fram information från kvittonSanfer, Jonathan January 2018 (has links)
Denna rapport redogör för skapandet av ett API som kan extrahera information från bilder på kvitton. Informationen som API:et skulle kunna ta fram var organisationsnummer, datum, tid, summa och moms. Här ingår även en fördjupning om tekniken OCR (optical character recognition) som omvandlar bilder och dokument till text. Examensarbetet utfördes åt Flex Applications AB. / This report describes the creation of an API that can extract information from pictures of receipts. The information the API was to deliver comprises the organisation number, date, time, total and VAT. The thesis also includes an in-depth look at OCR (optical character recognition), the technology that transforms pictures and documents into text. The thesis was carried out for Flex Applications AB.
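A minimal sketch of the kind of field extraction this abstract describes might look as follows. The regular expressions and field names are assumptions (Swedish-style organisation numbers, ISO-like dates, "Totalt"/"Moms" labels), not the actual patterns used by the thesis API:

```python
import re

def extract_receipt_fields(text):
    """Pull organisation number, date, time, total and VAT from OCR'd text.

    Illustrative only: real receipts vary wildly, so a production API
    would need far looser patterns and per-merchant fallbacks.
    """
    patterns = {
        "org_number": r"\b(\d{6}-\d{4})\b",            # e.g. 556677-1234
        "date":       r"\b(\d{4}-\d{2}-\d{2})\b",      # ISO-style date
        "time":       r"\b(\d{2}:\d{2})\b",
        "total":      r"(?i)totalt?\s*:?\s*(\d+[.,]\d{2})",
        "vat":        r"(?i)moms\s*:?\s*(\d+[.,]\d{2})",
    }
    fields = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        fields[name] = match.group(1) if match else None
    return fields

sample = "Org.nr 556677-1234\n2018-05-04 12:31\nTotalt: 249,00\nMoms: 49,80"
print(extract_receipt_fields(sample)["total"])  # → 249,00
```

In the full pipeline, `text` would be the output of an OCR engine run on the receipt photograph; the extraction layer then only has to deal with text, not pixels.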
|
99 |
A Possibilistic Approach To Handwritten Script Identification Via Morphological Methods For Pattern RepresentationGhosh, Debashis 04 1900 (has links) (PDF)
No description available.
|
100 |
Detekce objektu ve videosekvencích / Object Detection in Video SequencesŠebela, Miroslav January 2010 (has links)
The thesis consists of three parts: a theoretical description of digital image processing, an overview of optical character recognition, and the design of a system for car licence plate recognition (LPR) in images and video sequences. The theoretical part describes image representation, smoothing and methods used for blob segmentation, and proposes two methods for optical character recognition (OCR). The practical part designs a complete LPR procedure including OCR: image pre-processing, blob segmentation, object detection based on object properties, and character recognition. The proposed solution uses grayscale transformation, histogram processing, thresholding, connected-component analysis and region recognition based on pattern and properties. An optical recognition method for licence plates is also implemented, in which the recognised values are compared against a database used to manage the entry of vehicles into a site.
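The thresholding and connected-component steps of the pipeline described above can be sketched in a few lines. This is an illustrative reconstruction in pure NumPy, not the thesis implementation:

```python
import numpy as np
from collections import deque

def threshold(gray, t=128):
    """Global thresholding: bright pixels become foreground (1)."""
    return (gray > t).astype(np.uint8)

def connected_components(binary):
    """4-connected blob labelling via breadth-first search.

    A minimal stand-in for the connected-component step: each
    foreground blob gets a distinct positive label.
    """
    labels = np.zeros(binary.shape, dtype=int)
    count = 0
    h, w = binary.shape
    for i in range(h):
        for j in range(w):
            if binary[i, j] and not labels[i, j]:
                count += 1
                queue = deque([(i, j)])
                labels[i, j] = count
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny, nx] and not labels[ny, nx]):
                            labels[ny, nx] = count
                            queue.append((ny, nx))
    return labels, count

# Toy "plate": two bright character-like blobs on a dark background.
img = np.zeros((5, 8), dtype=np.uint8)
img[1:4, 1:3] = 200   # blob 1
img[1:4, 5:7] = 220   # blob 2
labels, n = connected_components(threshold(img))
print(n)  # → 2
```

In an LPR system, each labelled blob would then be filtered by size and aspect ratio to reject non-character regions before being passed to the OCR stage.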
|