11.
Recognition of off-line printed Arabic text using Hidden Markov Models. Al-Muhtaseb, Husni A., Mahmoud, Sabri A., Qahwaji, Rami S.R. January 2008 (has links)
This paper describes a technique for automatic recognition of off-line printed Arabic text using hidden Markov models. In this work, different sizes of overlapping and non-overlapping hierarchical windows are used to generate 16 features from each vertical sliding strip. Eight Arabic fonts were used for testing (viz. Arial, Tahoma, Akhbar, Thuluth, Naskh, Simplified Arabic, Andalus, and Traditional Arabic). Experiments showed that different fonts achieve their highest recognition rates at different numbers of HMM states (5 or 7) and codebook sizes (128 or 256).
Arabic text is cursive, and each character may take up to four different shapes depending on its position in a word. This work treated each shape as a separate class, giving a total of 126 classes (compared with the 28 Arabic letters). Average recognition rates between 98.08% and 99.89% were achieved across the eight experimental fonts.
The main contributions of this work are the novel hierarchical sliding-window technique, which uses only 16 features per window; the treatment of each Arabic character shape as a separate class; the elimination of any need to segment the Arabic text; and the technique's applicability to other languages.
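To make the feature-extraction idea concrete, here is a minimal Python sketch of how hierarchical windows over one vertical strip could yield a 16-dimensional feature vector. The specific window layout (eight non-overlapping horizontal bands plus eight bands shifted by half a band) is an assumption for illustration, not the exact scheme of the paper.

```python
import numpy as np

def strip_features(strip, n_bands=8):
    """Extract 16 features from one vertical sliding strip of a binary
    text image (ink = 1, background = 0).

    Hypothetical reading of the hierarchical-window idea: 8 ink densities
    from non-overlapping horizontal bands plus 8 from bands shifted by
    half a band (overlapping), giving 16 features per strip.
    """
    h = strip.shape[0]
    band = h // n_bands
    feats = []
    for i in range(n_bands):                  # non-overlapping bands
        feats.append(strip[i * band:(i + 1) * band].mean())
    for i in range(n_bands):                  # overlapping, half-band shift
        start = i * band + band // 2
        feats.append(strip[start:start + band].mean())
    return np.array(feats)
```

Each strip then becomes one 16-dimensional observation in the HMM's sliding-window sequence.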
12.
Layout Detection and Table Recognition: Recent Challenges in Digitizing Historical Documents and Handwritten Tabular Data. Lehenmeier, Constantin, Burghardt, Manuel, Mischka, Bernadette 11 June 2024 (has links)
In this paper, we discuss the computer-aided processing of handwritten tabular records of historical weather data. The observationes meteorologicae, held by the Regensburg University Library, are one of the oldest collections of weather data in Europe. Starting in 1771, meteorological data were documented consistently in a standardized form over almost 60 years by several writers. The tabular structure, the unconstrained textual layout of comments, and the use of historical characters pose various challenges for layout analysis and text recognition. We present a customized strategy for digitizing tabular handwritten data that combines various state-of-the-art OCR methods to fit the collection. Since the recognition of historical documents still poses major challenges, we share lessons learned from experimental testing during the first project stages. Our results show that deep learning methods can be used for text recognition and layout detection; however, they are less effective at recognizing tabular structures. Furthermore, a tailored approach had to be developed for the historical meteorological characters during the manual creation of ground-truth data. The customized system achieved an accuracy of 82% for text recognition of the heterogeneous handwriting and 87% for layout recognition of the tables.
13.
Multimodal Interactive Transcription of Handwritten Text Images. Romero Gómez, Verónica 20 September 2010 (has links)
This thesis presents a new interactive and multimodal framework for the transcription of handwritten documents. Rather than producing the complete transcription automatically, this approach aims to assist the expert in the demanding task of transcribing.
To date, the available handwritten text recognition systems do not provide transcriptions acceptable to users, and human intervention is generally required to correct the transcriptions obtained. These systems have proven genuinely useful in constrained applications with limited vocabularies (such as the recognition of postal addresses or of numerical amounts on bank cheques), achieving acceptable results in such tasks. However, when working with unconstrained handwritten documents (such as historical manuscripts or spontaneous text), current technology yields only unacceptable results.
The interactive scenario studied in this thesis allows a more effective solution. In this scenario, the recognition system and the user cooperate to generate the final transcription of the text image. The system uses the text image and a previously validated part of the transcription (the prefix) to propose a possible continuation. The user then finds and corrects the next error produced by the system, thereby generating a new, longer prefix, which the system uses to suggest a new hypothesis. The underlying technology is based on hidden Markov models and n-grams, used here in the same way as in automatic speech recognition. Some modifications to the conventional definition of n-grams were necessary to take the user's feedback into account in this system. / Romero Gómez, V. (2010). Multimodal Interactive Transcription of Handwritten Text Images [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8541
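The prefix-based cooperation described in this abstract can be simulated in a few lines of Python. The recogniser below is a stand-in function, not the HMM/n-gram system of the thesis; the sketch only illustrates how each user correction extends the validated prefix until the hypothesis is accepted.

```python
def interactive_transcribe(recognise, reference):
    """Toy simulation of the prefix-based interactive loop.

    `recognise(prefix)` stands in for the recogniser: it must return a
    full hypothesis that starts with the validated prefix. The simulated
    user corrects the first wrong character, which extends the validated
    prefix; the loop repeats until the hypothesis matches the reference.
    Returns the final transcript and the number of user corrections.
    """
    prefix, corrections = "", 0
    while True:
        hyp = recognise(prefix)
        if hyp == reference:
            return hyp, corrections
        # locate the first error after the validated prefix
        i = len(prefix)
        while i < len(reference) and i < len(hyp) and hyp[i] == reference[i]:
            i += 1
        # the user accepts everything up to the error and fixes one character
        prefix = reference[:i + 1]
        corrections += 1
```

The count of corrections is exactly the user effort the interactive framework aims to minimize.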
14.
Deep learning for text spotting. Jaderberg, Maxwell January 2015 (has links)
This thesis addresses the problem of text spotting: automatically detecting and recognising text in natural images. Developing text spotting systems, capable of reading and therefore better interpreting the visual world, is a challenging but wildly useful task to solve. We approach this problem by drawing on successful developments in machine learning, in particular deep learning and neural networks, to present advancements using these data-driven methods.
Deep learning based models, consisting of millions of trainable parameters, require a lot of data to train effectively. To meet the requirements of these data-hungry algorithms, we present two methods of automatically generating extra training data without any additional human interaction. The first crawls a photo-sharing website and uses a weakly supervised existing text spotting system to harvest new data. The second is a synthetic data generation engine, capable of generating unlimited amounts of realistic-looking text images, that can be relied upon solely for training text recognition models. While we define these new datasets, all our methods are also evaluated on standard public benchmark datasets.
We develop two approaches to text spotting: character-centric and word-centric. In the character-centric approach, multiple character classifier models are developed, reinforcing each other through a feature-sharing framework. These character models are used to generate text saliency maps to drive detection, and are convolved with detection regions to enable text recognition, producing an end-to-end system with state-of-the-art performance. For the second, higher-level, word-centric approach, weak detection models are constructed to find potential instances of words in images, which are subsequently refined and adjusted with a classifier and a deep coordinate regressor.
A whole-word image recognition model recognises words from a huge dictionary of 90k words using classification, resulting in previously unattainable levels of accuracy. The resulting end-to-end text spotting pipeline advances the state of the art significantly and is applied to large-scale video search. While dictionary-based text recognition is useful and powerful, the need for unconstrained text recognition remains. We develop a two-part model for text recognition, with the complementary parts combined in a graphical model and trained using a structured output learning framework adapted to deep learning. The trained recognition model is capable of accurately recognising unseen and completely random text. Finally, we make a general contribution to improving the efficiency of convolutional neural networks: our low-rank approximation schemes can greatly reduce the number of computations required for inference. Applied to various existing models, they yield real-world speedups with negligible loss in predictive power.
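The low-rank approximation idea mentioned at the end can be sketched for a single dense layer. The truncated-SVD factorisation below is an illustrative assumption of the general principle; the thesis develops its own schemes for convolutional filter banks.

```python
import numpy as np

def low_rank_factor(W, rank):
    """Approximate one weight matrix W (m x n) by two factors A (m x r)
    and B (r x n) via truncated SVD, so that y = W @ x becomes
    y = A @ (B @ x). This saves computation whenever r*(m+n) < m*n.
    A sketch of the general low-rank idea only.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # fold the singular values into A
    B = Vt[:rank]
    return A, B
```

For an m x n layer factored at rank r, inference cost drops from m*n to r*(m + n) multiply-adds, which is where the reported speedups come from.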
15.
Video content analysis for intelligent forensics. Fraz, Muhammad January 2014 (has links)
The networks of surveillance cameras installed in public places and private territories continuously record video data with the aim of detecting and preventing unlawful activities. This enhances the importance of video content analysis applications, whether for real-time (i.e. analytic) or post-event (i.e. forensic) analysis. In this thesis, the primary focus is on four key aspects of video content analysis, namely: (1) moving object detection and recognition; (2) correction of colours in video frames and recognition of the colours of moving objects; (3) make and model recognition of vehicles and identification of their type; and (4) detection and recognition of text information in outdoor scenes.
To address the first issue, the first part of the thesis presents a framework that efficiently detects and recognizes moving objects in videos, targeting the problem of object detection in the presence of complex background. The object detection part of the framework relies on a background modelling technique and a novel post-processing step in which the contours of the foreground regions (i.e. moving objects) are refined by classifying edge segments as belonging either to the background or to the foreground. Further, a novel feature descriptor is devised for classifying moving objects into humans, vehicles and background; it captures the texture information present in the silhouette of foreground objects. To address the second issue, a framework for the correction and recognition of the true colours of objects in videos is presented, with novel noise reduction, colour enhancement and colour recognition stages. The colour recognition stage makes use of temporal information to reliably recognize the true colours of moving objects across multiple frames.
The proposed framework is specifically designed to perform robustly on videos of poor quality caused by surrounding illumination, camera sensor imperfections and artefacts due to high compression. In the third part of the thesis, a framework for vehicle make and model recognition and type identification is presented. As part of this work, a novel feature representation technique for the distinctive representation of vehicle images was developed. It uses dense feature description and a mid-level feature encoding scheme to capture the texture in the frontal view of vehicles, and is insensitive to minor in-plane rotation and skew within the image. The proposed framework can be extended to any number of vehicle classes without retraining. Another important contribution of this work is the publication of a comprehensive, up-to-date dataset of vehicle images to support future research in this domain. The problem of text detection and recognition in images is addressed in the last part of the thesis. A novel technique is proposed that exploits the colour information in the image to identify text regions. Apart from detection, the colour information is also used to segment characters from words. The identified characters are recognized using shape features and supervised learning. Finally, a lexicon-based alignment procedure is adopted to finalize the recognition of the strings present in word images. Extensive experiments have been conducted on benchmark datasets to analyse the performance of the proposed algorithms. The results show that the proposed moving object detection and recognition technique outperformed well-known baseline techniques. The proposed framework for the correction and recognition of object colours in video frames achieved all the aforementioned goals.
The performance analysis of the vehicle make and model recognition framework on multiple datasets has shown the strength and reliability of the technique when used within various scenarios. Finally, the experimental results for the text detection and recognition framework on benchmark datasets have revealed the potential of the proposed scheme for accurate detection and recognition of text in the wild.
16.
Détection, localisation et typage de texte dans des images de documents hétérogènes par Réseaux de Neurones Profonds / Detection, localization and typing of text in heterogeneous document images with Deep Neural Networks. Moysset, Bastien 28 May 2018 (has links)
Being able to automatically read the text written in documents, both printed and handwritten, makes the information they convey accessible. For full-page transcription, detecting and localizing the text lines is a crucial step. Traditional line-detection methods, based on image-processing approaches, struggle to generalize to heterogeneous datasets. In this thesis, we therefore propose a deep neural network based approach. We first propose a one-dimensional segmentation of text paragraphs into lines that uses a technique inspired by text recognition models, where connectionist temporal classification (CTC) is used to implicitly align the sequences. We then propose a neural network that directly predicts the coordinates of the boxes bounding the text lines. Adding a confidence term to these hypothesis boxes makes it possible to localize a varying number of objects. We predict the objects locally in order to share the network parameters between locations and thus increase the number of object examples each single box predictor sees during training, which compensates for the rather small size of the available datasets. To recover the contextual information that carries knowledge of the document layout, we add multi-dimensional LSTM recurrent layers between the convolutional layers of our networks. We propose three full-page text recognition strategies that address the need for highly precise text-line positions, and we show on the heterogeneous Maurdor dataset how our methods perform on documents that can be printed or handwritten, in French, English or Arabic, comparing favourably with other state-of-the-art methods.
Visualizing the concepts learned by our neurons highlights the ability of the recurrent layers to convey contextual information.
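The role of the confidence term in localizing a variable number of text lines can be illustrated with a small Python sketch. The (x, y, w, h, confidence) tuple layout and the threshold value are assumptions for illustration, not details taken from the thesis.

```python
def select_boxes(predictions, threshold=0.5):
    """Keep only hypothesis boxes whose confidence term exceeds a
    threshold, so a fixed bank of local box predictors can return a
    varying number of text-line locations. Each prediction is an
    illustrative (x, y, w, h, confidence) tuple.
    """
    kept = [p for p in predictions if p[4] >= threshold]
    kept.sort(key=lambda p: p[4], reverse=True)  # most confident first
    return [p[:4] for p in kept]
```

With this filtering, the same fixed-size network output can describe a page with two lines or with fifty.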
17.
Modul do serverové aplikace pro rozpoznávání identifikačních údajů z osobních dokladů / A Server Application Module for Recognizing Identification Data from Personal Identity Documents. BARTYZAL, Miroslav January 2018 (has links)
This Master's thesis deals with the creation of a server-side system for the automated reading of personal information from photographed identity documents. It focuses on processing photographs taken with camera phones, with respect to the varying quality of their images. Text localization in images and text recognition by means of a neural network are the subjects of this thesis. The final system is tested with a client application created for the Android operating system.
18.
Využití neanotovaných dat pro trénování OCR / OCR Trained with Unannotated Data. Buchal, Petr January 2021 (has links)
The creation of a high-quality optical character recognition (OCR) system requires a large amount of labeled data. Obtaining, or in other words creating, such a quantity of labeled data is a costly process. This thesis focuses on several methods that efficiently use unlabeled data to train an OCR neural network. The proposed methods fall into the category of self-training algorithms, and their general approach can be summarized as follows. First, a seed model is trained on a limited amount of labeled data. Then, the seed model, in combination with a language model, is used to produce pseudo-labels for the unlabeled data. The machine-labeled data are combined with the training data used to create the seed model and used again to train the target model. The success of the individual methods is measured on the handwritten ICFHR 2014 Bentham dataset. Experiments were conducted on two datasets representing different degrees of labeled-data availability. The best model trained on the smaller dataset achieved a CER of 3.70%, a relative improvement of 42% over the seed model, and the best model trained on the bigger dataset achieved a CER of 1.90%, a relative improvement of 26% over the seed model. This thesis shows that the proposed methods can efficiently use unlabeled data to improve the OCR error rate.
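The self-training loop summarized in this abstract can be sketched generically in Python. The `train`, `predict` and `confidence` callables below are placeholders for the OCR network and language-model scoring used in the thesis; the confidence threshold is an illustrative assumption.

```python
def self_train(train, predict, confidence, labelled, unlabelled, threshold=0.5):
    """Generic sketch of the self-training scheme: train a seed model on
    labelled data, pseudo-label the unlabelled pool with it, keep only
    confident machine-labelled samples, and retrain on the merged set to
    obtain the target model.
    """
    seed = train(labelled)
    pseudo = [(x, predict(seed, x)) for x in unlabelled]
    confident = [(x, y) for x, y in pseudo if confidence(seed, x, y) >= threshold]
    target = train(labelled + confident)
    return seed, target
```

Any model with train/predict/confidence operations can be plugged in; in the thesis these are the OCR network and its language-model rescored outputs.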
19.
Adaptivní rozpoznávání ručně psaného textu / Adaptive Handwritten Text Recognition. Procházka, Štěpán January 2021 (has links)
The need to preserve and exchange written information is central to human society, and handwriting has satisfied that need for several millennia. Unlike optical character recognition of typeset fonts, which has been thoroughly studied in the last few decades, the considerably harder task of handwritten text recognition lacks such attention. In this work, we study the capabilities of deep convolutional and recurrent neural networks for handwritten text extraction. To mitigate the need for a large quantity of real ground-truth data, we propose a suitable synthetic data generator for model pre-training, and we carry out an extensive set of experiments to devise a self-training strategy that adapts the model to unannotated real handwritten letterings. The proposed approach is compared with supervised approaches and state-of-the-art results on both established and novel datasets, achieving satisfactory performance.
20.
Vad Innebär Det Att Skriva I Skolan? : Diktera – en digital möjlighet i en lärmiljö för alla / What Does It Mean to Write in School? Dictation – a Digital Opportunity in a Learning Environment for All. Toresson, Anna-Karin January 2021 (has links)
This study uses quantitative and qualitative methods and aims to gain increased knowledge about primary school students and what it means to write. It examines whether dictation provides a digital opportunity in a learning environment for everyone. The study is a case study with an explanatory sequential mixed-methods design, based on two quantitative and two qualitative empirical methods. The quantitative methods are measurement of the LIX value of student texts and of the students' grades. The qualitative methods are a questionnaire given to seven students in eighth grade and a semi-structured interview with a teacher. The study's theoretical framework rests on a socio-cultural perspective, drawing on Vygotsky's theories about language and communication and Säljö's thoughts about artefacts and dictation as a writing tool. A hermeneutic perspective is used to describe the qualitative parts of the study, capturing an interaction between theory and method analysis that provides an opportunity for deeper understanding. The results show that students find dictation a functional writing tool. The questionnaire results show that students think it is important to plan their writing before dictating, and students discover that they must adapt their voice to the dictation program. By learning the software, the students develop their writing ability. Finally, students note that processing a dictated text is different and requires different correction strategies than traditional writing does. Perhaps the biggest obstacle is that the writer needs access to a quiet place. The knowledge contribution added to the problem area and previous research is a deeper understanding of the factors that affect students' writing through dictation.
The study is important and relevant to the teaching profession and shows that dictation can be a viable way of writing for students. The experiences from this study can support teachers in developing their schools' learning environments. Coupled with teachers' broad repertoire in writing and writing development, this will give more students the opportunity to reach the approved knowledge requirements in Swedish compulsory school, as Nilholm asserts. / Digital presentation
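Since the study measures the LIX value of student texts, a minimal Python implementation of the standard LIX readability formula may be useful: words per sentence plus the percentage of words longer than six letters. The sentence-splitting heuristic below is a simplification.

```python
def lix(text):
    """LIX readability score: average sentence length (words per
    sentence) plus the percentage of long words (more than six
    letters). Sentence boundaries are approximated by '.', '!', '?'.
    """
    words = text.split()
    sentences = sum(text.count(c) for c in ".!?") or 1
    long_words = sum(1 for w in words if len(w.strip(".,;:!?")) > 6)
    return len(words) / sentences + 100.0 * long_words / len(words)
```

Higher scores indicate harder texts; comparing scores before and after the dictation intervention is one way such a measure can be used.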