• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 26
  • 7
  • 4
  • 3
  • 1
  • 1
  • 1
  • 1
  • 1
  • Tagged with
  • 52
  • 20
  • 19
  • 15
  • 8
  • 7
  • 7
  • 6
  • 6
  • 6
  • 6
  • 6
  • 6
  • 6
  • 6
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Exploring sparsity, self-similarity, and low rank approximation in action recognition, motion retrieval, and action spotting

Sun, Chuan 01 January 2014 (has links)
This thesis consists of 4 major parts. In the first part (Chapters 1-2), we introduce the overview, motivation, and contribution of our works, and extensively survey the current literature for 6 related topics. In the second part (Chapters 3-7), we explore the concept of "Self-Similarity" in two challenging scenarios, namely, the Action Recognition and the Motion Retrieval. We build three-dimensional volume representations for both scenarios, and devise effective techniques that can produce compact representations encoding the internal dynamics of data. In the third part (Chapter 8), we explore the challenging action spotting problem, and propose a feature-independent unsupervised framework that is effective in spotting action under various real situations, even under heavily perturbed conditions. The final part (Chapters 9) is dedicated to conclusions and future works. For action recognition, we introduce a generic method that does not depend on one particular type of input feature vector. We make three main contributions: (i) We introduce the concept of Joint Self-Similarity Volume (Joint SSV) for modeling dynamical systems, and show that by using a new optimized rank-1 tensor approximation of Joint SSV one can obtain compact low-dimensional descriptors that very accurately preserve the dynamics of the original system, e.g. an action video sequence; (ii) The descriptor vectors derived from the optimized rank-1 approximation make it possible to recognize actions without explicitly aligning the action sequences of varying speed of execution or difference frame rates; (iii) The method is generic and can be applied using different low-level features such as silhouettes, histogram of oriented gradients (HOG), etc. Hence, it does not necessarily require explicit tracking of features in the space-time volume. Our experimental results on five public datasets demonstrate that our method produces very good results and outperforms many baseline methods. For action recognition for incomplete videos, we determine whether incomplete videos that are often discarded carry useful information for action recognition, and if so, how one can represent such mixed collection of video data (complete versus incomplete, and labeled versus unlabeled) in a unified manner. We propose a novel framework to handle incomplete videos in action classification, and make three main contributions: (i) We cast the action classification problem for a mixture of complete and incomplete data as a semi-supervised learning problem of labeled and unlabeled data. (ii) We introduce a two-step approach to convert the input mixed data into a uniform compact representation. (iii) Exhaustively scrutinizing 280 configurations, we experimentally show on our two created benchmarks that, even when the videos are extremely sparse and incomplete, it is still possible to recover useful information from them, and classify unknown actions by a graph based semi-supervised learning framework. For motion retrieval, we present a framework that allows for a flexible and an efficient retrieval of motion capture data in huge databases. The method first converts an action sequence into a self-similarity matrix (SSM), which is based on the notion of self-similarity. This conversion of the motion sequences into compact and low-rank subspace representations greatly reduces the spatiotemporal dimensionality of the sequences. The SSMs are then used to construct order-3 tensors, and we propose a low-rank decomposition scheme that allows for converting the motion sequence volumes into compact lower dimensional representations, without losing the nonlinear dynamics of the motion manifold. Thus, unlike existing linear dimensionality reduction methods that distort the motion manifold and lose very critical and discriminative components, the proposed method performs well, even when inter-class differences are small or intra-class differences are large. In addition, the method allows for an efficient retrieval and does not require the time-alignment of the motion sequences. We evaluate the performance of our retrieval framework on the CMU mocap dataset under two experimental settings, both demonstrating very good retrieval rates. For action spotting, our framework does not depend on any specific feature (e.g. HOG/HOF, STIP, silhouette, bag-of-words, etc.), and requires no human localization, segmentation, or framewise tracking. This is achieved by treating the problem holistically as that of extracting the internal dynamics of video cuboids by modeling them in their natural form as multilinear tensors. To extract their internal dynamics, we devised a novel Two-Phase Decomposition (TP-Decomp) of a tensor that generates very compact and discriminative representations that are robust to even heavily perturbed data. Technically, a Rank-based Tensor Core Pyramid (Rank-TCP) descriptor is generated by combining multiple tensor cores under multiple ranks, allowing to represent video cuboids in a hierarchical tensor pyramid. The problem then reduces to a template matching problem, which is solved efficiently by using two boosting strategies: (i) to reduce the search space, we filter the dense trajectory cloud extracted from the target video; (ii) to boost the matching speed, we perform matching in an iterative coarse-to-fine manner. Experiments on 5 benchmarks show that our method outperforms current state-of-the-art under various challenging conditions. We also created a challenging dataset called Heavily Perturbed Video Arrays (HPVA) to validate the robustness of our framework under heavily perturbed situations.
22

BELONG : A conceptual study of the future office environment

Hellström, Linnea January 2022 (has links)
Today, there are several arguments about the future of the workplace and how each firm should tackle these questions. The development has made significant progress in the workplace, but there is still room for improvement. This industrial design project aims to address issues concerning the future workplace,with the outcome being furniture. The project will be carried out using design-related processes and methods, where future forecasting and trend analyses will play a major partin the project’s conclusion. The project is in collaboration with EFG, the leading European provider of interior design solutions for commercial businesses and public events. Their focus is on sustainable, Scandinavian-inspired items. Products that have undergone rigorous testingto ensure quality. Furniture that encourages wiser methods of working, meeting, andsocializing. Every business day
23

Simplified three-dimensional finite element hot-spotting modelling of a pin-mounted vented brake disc: an investigation of hot-spotting determinants

Tang, Jinghan, Bryant, David, Qi, Hong Sheng, Whiteside, Benjamin R., Babenko, Maksims 29 June 2017 (has links)
Yes / Hot spotting is a thermal localisation phenomenon in which multiple hot regions form on a brake disc surface during high energy and/or high speed braking events. As an undesired problem, hot spots can result in high order brake judder, audible drone and thermal cracking. This paper presents a finite element model for hot spot modelling which introduces the classical axisymmetric assumptions to the brake pad in 3D by scaling the material properties combined with a subroutine to simulate the heat generation instead of modelling the rotation of the brake pad. The results from the initial feasibility models showed significant improvement in computing efficiency with acceptable accuracy when compared to a traditional FE model without such simplifications. This method was then applied to the 3D simulation of hot spotting on a realistic ventilated brake disc/pad pair and the results showed good correlation with experiments. In order to improve the understanding of the hot spotting mechanism, parametric studies were performed including the effects of solid and ventilated disc geometry, rotational speed and energy, pins, disc run-out, and brake pad length. Based on the analysis of the results, it was identified that the vents and pins affected the hot spot distribution. Speed was shown to be more important on the hot spot generation time and distribution than either the pressure or total energy input. Brake disc run-out was shown to affect the magnitude of both hot spot temperature and height due to the non-linear relationship between local deformation, contact pressure and heat generation. Finally, increasing the brake pad length generated fewer hot spots but the temperature of each hot spot increased.
24

Design of Keyword Spotting System Based on Segmental Time Warping of Quantized Features

Karmacharya, Piush January 2012 (has links)
Keyword Spotting in general means identifying a keyword in a verbal or written document. In this research a novel approach in designing a simple spoken Keyword Spotting/Recognition system based on Template Matching is proposed, which is different from the Hidden Markov Model based systems that are most widely used today. The system can be used equally efficiently on any language as it does not rely on an underlying language model or grammatical constraints. The proposed method for keyword spotting is based on a modified version of classical Dynamic Time Warping which has been a primary method for measuring the similarity between two sequences varying in time. For processing, a speech signal is divided into small stationary frames. Each frame is represented in terms of a quantized feature vector. Both the keyword and the  speech  utterance  are  represented  in  terms  of  1‐dimensional  codebook  indices.  The  utterance is divided into segments and the warped distance is computed for each segment and compared against the test keyword. A distortion score for each segment is computed as likelihood measure of the keyword. The proposed algorithm is designed to take advantage of multiple instances of test keyword (if available) by merging the score for all keywords used.   The training method for the proposed system is completely unsupervised, i.e., it requires neither a language model nor phoneme model for keyword spotting. Prior unsupervised training algorithms were based on computing Gaussian Posteriorgrams making the training process complex but the proposed algorithm requires minimal training data and the system can also be trained to perform on a different environment (language, noise level, recording medium etc.) by  re‐training the original cluster on additional data.  Techniques for designing a model keyword from multiple instances of the test keyword are discussed. System performance over variations of different parameters like number of clusters, number of instance of keyword available, etc were studied in order to optimize the speed and accuracy of the system. The system performance was evaluated for fourteen different keywords from the Call - Home and the Switchboard speech corpus. Results varied for different keywords and a maximum accuracy of 90% was obtained which is comparable to other methods using the same time warping algorithms on Gaussian Posteriorgrams. Results are compared for different parameters variation with suggestion of possible improvements. / Electrical and Computer Engineering
25

A 3D Finite Element Simulation of Ventilated Brake Disc Hot Spotting

Tang, Jinghan, Bryant, David, Qi, Hong Sheng 15 June 2016 (has links)
No / Hot spots are high temperature thermal gradients and localisations that are circumferentially distributed on a disc surface which can occur during heavy duty braking. Vibrations and noise can be triggered by hot spotting as well as damage to the disc surface. The experimental investigations suggest that the trigger condition and distribution of hot spots are related to the disc geometry, especially for ventilated discs. To investigate the effects of geometry and structure of a ventilated disc on hot spotting, a 3D finite element model was established. A fast simulation method of hot spotting in 3D was implemented in the model to enable a parametric analysis to be performed more efficiently. The results were validated using experimental data from a laboratory dynamometer.
26

Information spotting in huge repositories of scanned document images / Localisation d'information dans des très grands corpus de documents numérisés

Dang, Quoc Bao 06 April 2018 (has links)
Ce travail vise à développer un cadre générique qui est capable de produire des applications de localisation d'informations à partir d’une caméra (webcam, smartphone) dans des très grands dépôts d'images de documents numérisés et hétérogènes via des descripteurs locaux. Ainsi, dans cette thèse, nous proposons d'abord un ensemble de descripteurs qui puissent être appliqués sur des contenus aux caractéristiques génériques (composés de textes et d’images) dédié aux systèmes de recherche et de localisation d'images de documents. Nos descripteurs proposés comprennent SRIF, PSRIF, DELTRIF et SSKSRIF qui sont construits à partir de l’organisation spatiale des points d’intérêts les plus proches autour d'un point-clé pivot. Tous ces points sont extraits à partir des centres de gravité des composantes connexes de l‘image. A partir de ces points d’intérêts, des caractéristiques géométriques invariantes aux dégradations sont considérées pour construire nos descripteurs. SRIF et PSRIF sont calculés à partir d'un ensemble local des m points d’intérêts les plus proches autour d'un point d’intérêt pivot. Quant aux descripteurs DELTRIF et SSKSRIF, cette organisation spatiale est calculée via une triangulation de Delaunay formée à partir d'un ensemble de points d’intérêts extraits dans les images. Cette seconde version des descripteurs permet d’obtenir une description de forme locale sans paramètres. En outre, nous avons également étendu notre travail afin de le rendre compatible avec les descripteurs classiques de la littérature qui reposent sur l’utilisation de points d’intérêts dédiés de sorte qu'ils puissent traiter la recherche et la localisation d'images de documents à contenu hétérogène. La seconde contribution de cette thèse porte sur un système d'indexation de très grands volumes de données à partir d’un descripteur volumineux. Ces deux contraintes viennent peser lourd sur la mémoire du système d’indexation. En outre, la très grande dimensionnalité des descripteurs peut amener à une réduction de la précision de l'indexation, réduction liée au problème de dimensionnalité. Nous proposons donc trois techniques d'indexation robustes, qui peuvent toutes être employées sans avoir besoin de stocker les descripteurs locaux dans la mémoire du système. Cela permet, in fine, d’économiser la mémoire et d’accélérer le temps de recherche de l’information, tout en s’abstrayant d’une validation de type distance. Pour cela, nous avons proposé trois méthodes s’appuyant sur des arbres de décisions : « randomized clustering tree indexing” qui hérite des propriétés des kd-tree, « kmean-tree » et les « random forest » afin de sélectionner de manière aléatoire les K dimensions qui permettent de combiner la plus grande variance expliquée pour chaque nœud de l’arbre. Nous avons également proposé une fonction de hachage étendue pour l'indexation de contenus hétérogènes provenant de plusieurs couches de l'image. Comme troisième contribution de cette thèse, nous avons proposé une méthode simple et robuste pour calculer l'orientation des régions obtenues par le détecteur MSER, afin que celui-ci puisse être combiné avec des descripteurs dédiés. Comme la plupart de ces descripteurs visent à capturer des informations de voisinage autour d’une région donnée, nous avons proposé un moyen d'étendre les régions MSER en augmentant le rayon de chaque région. Cette stratégie peut également être appliquée à d'autres régions détectées afin de rendre les descripteurs plus distinctifs. Enfin, afin d'évaluer les performances de nos contributions, et en nous fondant sur l'absence d'ensemble de données publiquement disponibles pour la localisation d’information hétérogène dans des images capturées par une caméra, nous avons construit trois jeux de données qui sont disponibles pour la communauté scientifique. / This work aims at developing a generic framework which is able to produce camera-based applications of information spotting in huge repositories of heterogeneous content document images via local descriptors. The targeted systems may take as input a portion of an image acquired as a query and the system is capable of returning focused portion of database image that match the query best. We firstly propose a set of generic feature descriptors for camera-based document images retrieval and spotting systems. Our proposed descriptors comprise SRIF, PSRIF, DELTRIF and SSKSRIF that are built from spatial space information of nearest keypoints around a keypoints which are extracted from centroids of connected components. From these keypoints, the invariant geometrical features are considered to be taken into account for the descriptor. SRIF and PSRIF are computed from a local set of m nearest keypoints around a keypoint. While DELTRIF and SSKSRIF can fix the way to combine local shape description without using parameter via Delaunay triangulation formed from a set of keypoints extracted from a document image. Furthermore, we propose a framework to compute the descriptors based on spatial space of dedicated keypoints e.g SURF or SIFT or ORB so that they can deal with heterogeneous-content camera-based document image retrieval and spotting. In practice, a large-scale indexing system with an enormous of descriptors put the burdens for memory when they are stored. In addition, high dimension of descriptors can make the accuracy of indexing reduce. We propose three robust indexing frameworks that can be employed without storing local descriptors in the memory for saving memory and speeding up retrieval time by discarding distance validating. The randomized clustering tree indexing inherits kd-tree, kmean-tree and random forest from the way to select K dimensions randomly combined with the highest variance dimension from each node of the tree. We also proposed the weighted Euclidean distance between two data points that is computed and oriented the highest variance dimension. The secondly proposed hashing relies on an indexing system that employs one simple hash table for indexing and retrieving without storing database descriptors. Besides, we propose an extended hashing based method for indexing multi-kinds of features coming from multi-layer of the image. Along with proposed descriptors as well indexing frameworks, we proposed a simple robust way to compute shape orientation of MSER regions so that they can combine with dedicated descriptors (e.g SIFT, SURF, ORB and etc.) rotation invariantly. In the case that descriptors are able to capture neighborhood information around MSER regions, we propose a way to extend MSER regions by increasing the radius of each region. This strategy can be also applied for other detected regions in order to make descriptors be more distinctive. Moreover, we employed the extended hashing based method for indexing multi-kinds of features from multi-layer of images. This system are not only applied for uniform feature type but also multiple feature types from multi-layers separated. Finally, in order to assess the performances of our contributions, and based on the assessment that no public dataset exists for camera-based document image retrieval and spotting systems, we built a new dataset which has been made freely and publicly available for the scientific community. This dataset contains portions of document images acquired via a camera as a query. It is composed of three kinds of information: textual content, graphical content and heterogeneous content.
27

Deep learning for text spotting

Jaderberg, Maxwell January 2015 (has links)
This thesis addresses the problem of text spotting - being able to automatically detect and recognise text in natural images. Developing text spotting systems, systems capable of reading and therefore better interpreting the visual world, is a challenging but wildly useful task to solve. We approach this problem by drawing on the successful developments in machine learning, in particular deep learning and neural networks, to present advancements using these data-driven methods. Deep learning based models, consisting of millions of trainable parameters, require a lot of data to train effectively. To meet the requirements of these data hungry algorithms, we present two methods of automatically generating extra training data without any additional human interaction. The first crawls a photo sharing website and uses a weakly-supervised existing text spotting system to harvest new data. The second is a synthetic data generation engine, capable of generating unlimited amounts of realistic looking text images, that can be solely relied upon for training text recognition models. While we define these new datasets, all our methods are also evaluated on standard public benchmark datasets. We develop two approaches to text spotting: character-centric and word-centric. In the character-centric approach, multiple character classifier models are developed, reinforcing each other through a feature sharing framework. These character models are used to generate text saliency maps to drive detection, and convolved with detection regions to enable text recognition, producing an end-to-end system with state-of-the-art performance. For the second, higher-level, word-centric approach to text spotting, weak detection models are constructed to find potential instances of words in images, which are subsequently refined and adjusted with a classifier and deep coordinate regressor. A whole word image recognition model recognises words from a huge dictionary of 90k words using classification, resulting in previously unattainable levels of accuracy. The resulting end-to-end text spotting pipeline advances the state of the art significantly and is applied to large scale video search. While dictionary based text recognition is useful and powerful, the need for unconstrained text recognition still prevails. We develop a two-part model for text recognition, with the complementary parts combined in a graphical model and trained using a structured output learning framework adapted to deep learning. The trained recognition model is capable of accurately recognising unseen and completely random text. Finally, we make a general contribution to improve the efficiency of convolutional neural networks. Our low-rank approximation schemes can be utilised to greatly reduce the number of computations required for inference. These are applied to various existing models, resulting in real-world speedups with negligible loss in predictive power.
28

Reconnaissance et localisation de symboles dans les documents graphiques : approches basées sur le treillis de concepts / Graphics Recognition and Spotting in Graphical Documents : Approaches Based On the Galois Lattice Structure

Boumaiza, Ameni 20 May 2013 (has links)
Omniprésents, la relation homme-machine est encore une définition ardue à cerner. Les ordinateurs réalisent dans le temps des tâches récurrentes. Ils aident ainsi l'homme à manipuler d'énormes quantités de données, souvent même plus rapidement et plus précisément que lui. Malgré cela, la capacité des ordinateurs demeure limitée lorsqu'il s'agit d'extraire automatiquement des informations d'images ou de vidéos, qui représentent pourtant des volumes de données extrêmement importants. La vision par ordinateur est un domaine qui inclut des méthodes d'acquisition, de traitement, d'analyse et de compréhension des images afin de produire de l'information numérique ou symbolique. Un axe de recherche contribuant au développement de ce domaine consiste à reproduire les capacités de la vision humaine par voie électronique afin de percevoir et de comprendre une image. Il s'agit de développer des algorithmes qui reproduisent une des capacités les plus étonnantes du cerveau humain à savoir la déduction des propriétés du monde purement externe au moyen de la lumière qui nous revient des divers objets qui nous entourent. Nos travaux de thèse s'inscrivent dans cet axe de recherche. Nous proposons plusieurs contributions originales s'inscrivant dans le cadre de résolution des problèmes de la reconnaissance et de la localisation des symboles graphiques en contexte. L'originalité des approches proposées réside dans la proposition d'une alliance intéressante entre l'Analyse Formelle de Concepts et la vision par ordinateur. Pour ce faire, nous nous sommes confrontés à l'étude du domaine de l'AFC et plus précisément l'adaptation de la structure du treillis de concepts et son utilisation comme étant l'outil majeur de nos travaux. La principale particularité de notre travail réside dans son aspect générique vu que les méthodes proposées peuvent être alliées à divers outils autre que le treillis de concepts en gardant les mêmes stratégies adoptées et en suivant une procédure semblable. Notre incursion dans le domaine de l'Analyse Formelle de Concepts et plus précisément notre choix de la structure du treillis de Galois appelé aussi treillis de concepts est motivé par les nombreux avantages présentés par cet outil. Le principal avantage du treillis de concepts est l'aspect symbolique qu'il offre. Il présente un espace de recherche concis, précis et souple facilitant ainsi la prise de décision. Nos contributions sont inscrites dans le cadre de la reconnaissance et de localisation de symboles dans les documents graphiques. Nous proposons des chaînes de traitement s'inscrivant dans le domaine de la vision par ordinateur / Computer vision is a field that includes methods for the acquisition, processing, analysis and understanding of images to produce numerical or symbolic information. A research contributing to the development of this area is to replicate the capabilities of human vision to perceive and understand images. Our thesis is part of this research axis. We propose several original contributions belonging to the context of graphics recognition and spotting context. The originality of the proposed approaches is the proposal of an interesting alliance between the Formal Concept Analysis and the Computer Vision fields. We face the study of the FCA field and more precisely the adaptation of the structure of concept lattice and its use as the main tool of our work. The main feature of our work lies in its generic aspect because the proposed methods can be combined with various other tools keeping the same strategies and following a similar procedure. Our foray into the area of the Formal Concept Analysis and more precisely our choice of the structure of the Galois lattice, also called concept lattice is motivated by the many advantages offered by this tool. The main advantage of concept lattice is the symbolic aspect. It is a concise, accurate and flexible search space thus facilitating decision making. Our contributions are recorded as part of the recognition and localization of symbols in graphic documents. We propose to recognize and spot symbols in graphical documents (technical drawings for example) using the alliance between the bag of words representation and the Galois lattice formalism. We opt for various methods belonging to the computer vision field
29

A novel approach for continuous speech tracking and dynamic time warping : adaptive framing based continuous speech similarity measure and dynamic time warping using Kalman filter and dynamic state model

Khan, Wasiq January 2014 (has links)
Dynamic speech properties such as time warping, silence removal and background noise interference are the most challenging issues in continuous speech signal matching. Among all of them, the time warped speech signal matching is of great interest and has been a tough challenge for the researchers. An adaptive framing based continuous speech tracking and similarity measurement approach is introduced in this work following a comprehensive research conducted in the diverse areas of speech processing. A dynamic state model is introduced based on system of linear motion equations which models the input (test) speech signal frame as a unidirectional moving object along the template speech signal. The most similar corresponding frame position in the template speech is estimated which is fused with a feature based similarity observation and the noise variances using a Kalman filter. The Kalman filter provides the final estimated frame position in the template speech at current time which is further used for prediction of a new frame size for the next step. In addition, a keyword spotting approach is proposed by introducing wavelet decomposition based dynamic noise filter and combination of beliefs. The Dempster’s theory of belief combination is deployed for the first time in relation to keyword spotting task. Performances for both; speech tracking and keyword spotting approaches are evaluated using the statistical metrics and gold standards for the binary classification. Experimental results proved the superiority of the proposed approaches over the existing methods.
30

Historical handwriting representation model dedicated to word spotting application / Modèle de représentation des écritures pour la recherche de mots par similarité dans les documents manuscrits du patrimoine

Wang, Peng 18 November 2014 (has links)
L’objectif du travail de thèse est de proposer un modèle de représentation des écritures dans les images de documents du patrimoine sans recourir à une transcription des textes. Ce modèle, issu d’une étude très complète des méthodes actuelles de caractérisation des écritures, est à la base d’une proposition de scénario de recherche par similarité de mots, indépendante du scripteur et ne nécessitant pas d’apprentissage. La recherche par similarité proposée repose sur une structure de graphes intégrant des informations sur la topologie, la morphologie locale des mots et sur le contexte extrait du voisinage de chaque point d’intérêt. Un graphe est construit à partir du squelette décrit en chaque point sommet par le contexte de formes, descripteur riche et compact. L’extraction de mots est assurée par une première étape de localisation grossière de régions candidates, décrites par une séquence déduite d’une représentation par graphes liée à des critères topologiques de voisinage. L’appariement entre mots repose ensuite sur une distance dynamique et un usage adapté du coût d’édition approximé entre graphes rendant compte de la nature bi-dimensionnelle de l’écriture. L’approche a été conçue pour être robuste aux distorsions de l’écriture et aux changements de scripteurs. Les expérimentations sont réalisées sur des bases de documents manuscrits patrimoniaux exploitées dans les compétitions de word-spotting. Les performances illustrent la pertinence de la proposition et ouvrent des voies nouvelles d’investigation dans des domaines d’applications autour de la reconnaissance de symboles et d’écritures iconographiques / As more and more documents, especially historical handwritten documents, are converted into digitized version for long-term preservation, the demands for efficient information retrieval techniques in such document images are increasing. The objective of this research is to establish an effective representation model for handwriting, especially historical manuscripts. The proposed model is supposed to help the navigation in historical document collections. Specifically speaking, we developed our handwriting representation model with regards to word spotting application. As a specific pattern recognition task, handwritten word spotting faces many challenges such as the high intra-writer and inter-writer variability. Nowadays, it has been admitted that OCR techniques are unsuccessful in handwritten offline documents, especially historical ones. Therefore, the particular characterization and comparison methods dedicated to handwritten word spotting are strongly required. In this work, we explore several techniques that allow the retrieval of singlestyle handwritten document images with query image. The proposed representation model contains two facets of handwriting, morphology and topology. Based on the skeleton of handwriting, graphs are constructed with the structural points as the vertexes and the strokes as the edges. By signing the Shape Context descriptor as the label of vertex, the contextual information of handwriting is also integrated. Moreover, we develop a coarse-to-fine system for the large-scale handwritten word spotting using our representation model. In the coarse selection, graph embedding is adapted with consideration of simple and fast computation. With selected regions of interest, in the fine selection, a specific similarity measure based on graph edit distance is designed. Regarding the importance of the order of handwriting, dynamic time warping assignment with block merging is added. The experimental results using benchmark handwriting datasets demonstrate the power of the proposed representation model and the efficiency of the developed word spotting approach. The main contribution of this work is the proposed graph-based representation model, which realizes a comprehensive description of handwriting, especially historical script. Our structure-based model captures the essential characteristics of handwriting without redundancy, and meanwhile is robust to the intra-variation of handwriting and specific noises. With additional experiments, we have also proved the potential of the proposed representation model in other symbol recognition applications, such as handwritten musical and architectural classification

Page generated in 0.0782 seconds