About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
41

Controlled English Commenting System

Victor, Pradeep 07 February 2001 (has links)
This thesis describes the implementation of a Controlled English Commenting (CEC) system that aids a VHDL modeler in entering controlled English comments. The CEC system developed includes a graphical user interface (GUI). The interface permits a modeler to submit comments for insertion at user-selected points in a text file containing the model. A submitted comment is analyzed for vocabulary and syntax, and is then inserted if it is controlled English. If it is not, the CEC system extracts all possible controlled English comments that can be formed from the original comment and presents them to the user for selection and entry into the model. The interface then queries the user to complete any residual portions of the original comment until the user is satisfied. Until the user becomes familiar with the constraints of the controlled language, significant interaction is needed, particularly for complex comments. Preliminary experiments indicate that users rapidly learn the language's constraints and that the need for interactive help declines. / Master of Science
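As an illustration of the check-and-suggest loop described above, the following Python sketch validates a comment against a toy controlled-English vocabulary and grammar and, on rejection, extracts the controlled fragments it contains. The vocabulary, the regular-expression grammar, and the function names are stand-ins invented for this example, not the CEC system's actual definitions.

```python
# Hypothetical sketch of the CEC check-and-suggest loop described above.
# The real system's controlled-English vocabulary and grammar are not
# reproduced here; VOCABULARY and PATTERN are illustrative stand-ins.
import re

VOCABULARY = {"the", "signal", "is", "asserted", "when", "clock", "rises"}
PATTERN = re.compile(r"^(the \w+ is \w+( when the \w+ \w+)?)$")  # toy grammar

def is_controlled(comment: str) -> bool:
    words = comment.lower().rstrip(".").split()
    return all(w in VOCABULARY for w in words) and bool(PATTERN.match(" ".join(words)))

def suggest_fragments(comment: str) -> list[str]:
    """Return controlled-English fragments that can be formed from the words
    of a rejected comment (a stand-in for the real extraction step)."""
    words = comment.lower().rstrip(".").split()
    fragments = []
    for i in range(len(words)):
        for j in range(len(words), i, -1):
            candidate = " ".join(words[i:j])
            if is_controlled(candidate):
                fragments.append(candidate)
    return fragments

comment = "The signal is asserted when the clock rises."
if is_controlled(comment):
    print("insert:", comment)
else:
    print("suggestions:", suggest_fragments(comment))
```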
42

The Effect of Icing on the Dispatch Reliability of Small Aircraft

Gates, Melinda M. 08 December 2004 (has links)
In 2000, the National Aeronautics and Space Administration (NASA) initiated a program to promote the use of small aircraft as an additional option for national public transportation. The Small Aircraft Transportation System (SATS) advanced the idea of everyday individuals piloting themselves on trips, within a specified distance range, using a small (four-person), piston-powered, unpressurized aircraft and small airports in close proximity to their origin and destination. This thesis investigates how one weather phenomenon, in-flight icing, affects the dispatch reliability of this transportation system. Specifically, this research presumes that a route is a "no-go" for low-time pilots in a small, piston-powered aircraft if any icing conditions are forecast along the route at the altitude of the flight during the time the traveler desires to make the trip. This thesis evaluates direct flights between Cleveland and Boston; Boston and Washington, D.C.; and Washington, D.C. and Cleveland during the months of November through May for the years 2001 to 2003 at maximum cruising altitudes of 6,000 feet, 8,000 feet, 10,000 feet, and 12,000 feet above mean sea level (MSL). It was found that the overall probability of a "no-go" for all three flight paths at the normal cruising altitude of 12,000 feet is 56.8%. When the cruising altitude is reduced to 10,000 feet, 8,000 feet, and 6,000 feet, the probability of a "no-go" for all three flight paths drops to 54.6%, 48.5%, and 43.7%, respectively. / Master of Science
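The "no-go" rule above lends itself to a simple frequency calculation: count the days on which icing is forecast along a route at or below the planned cruising altitude, and divide by the number of days examined. The sketch below illustrates that calculation with invented forecast data; the route names and numbers are placeholders, not the thesis's data.

```python
# Illustrative calculation of the "no-go" rule described above: a flight is
# a no-go if icing is forecast along the route at or below the planned
# cruising altitude. Forecast data below are made up for the example.
# Each forecast entry: (route, lowest forecast icing altitude in feet MSL, or None).
forecasts = [
    ("CLE-BOS", 9000), ("CLE-BOS", None), ("CLE-BOS", 11000),
    ("BOS-DCA", 7000), ("BOS-DCA", None), ("BOS-DCA", None),
]

def no_go_probability(forecasts, route, cruise_alt_ft):
    days = [alt for r, alt in forecasts if r == route]
    no_go = sum(1 for alt in days if alt is not None and alt <= cruise_alt_ft)
    return no_go / len(days)

for alt in (6000, 8000, 10000, 12000):
    print(alt, f"{no_go_probability(forecasts, 'CLE-BOS', alt):.1%}")
```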
43

Optimal Parsing for dictionary text compression / Parsing optimal pour la compression du texte par dictionnaire

Langiu, Alessio 03 April 2012 (has links)
Dictionary-based compression algorithms include a parsing strategy to transform the input text into a sequence of dictionary phrases. For a given text, this process is usually not unique and, for compression purposes, it makes sense to find, among the possible parsings, one that minimizes the final compression ratio. This is the parsing problem. An optimal parsing is a parsing strategy, or a parsing algorithm, that solves the parsing problem while taking into account all the constraints of a compression algorithm or of a class of homogeneous compression algorithms. Such constraints include, for instance, the dictionary itself, i.e. the dynamic set of available phrases, and how much a phrase weighs on the compressed text, i.e. the length of the codeword that represents the phrase, also called the cost of encoding a dictionary pointer. In more than thirty years of dictionary-based text compression, plenty of algorithms, variants and extensions have appeared, and the approach has become one of the most appreciated and widely used in almost all storage and communication processes; yet only a few optimal parsing algorithms have been presented. Many compression algorithms still lack an optimal parsing or, at least, a proof of optimality. This is because there is no general model of the parsing problem that covers all dictionary-based algorithms, and because the existing optimal parsings work under overly restrictive hypotheses. This work focuses on the parsing problem and presents both a general model for dictionary-based text compression, called the Dictionary-Symbolwise theory, and a general parsing algorithm that is proved to be optimal under some realistic hypotheses. This algorithm, called Dictionary-Symbolwise Flexible Parsing, covers almost all dictionary-based text compression algorithms together with the large class of their variants in which the text is decomposed into a sequence of symbols and dictionary phrases. In this work we further consider the case of a free mixture of a dictionary compressor and a symbolwise compressor; the Dictionary-Symbolwise Flexible Parsing covers this case as well. We thus obtain an optimal parsing algorithm for dictionary-symbolwise compression where the dictionary is prefix-closed and the cost of encoding a dictionary pointer is variable. The symbolwise compressor can be any classical one that works in linear time, as many common variable-length encoders do. Our algorithm works under the assumption that a special graph, described in the following, is well defined; even when this condition is not satisfied, the same method can be used to obtain almost optimal parses. In detail, when the dictionary is LZ78-like, we show how to implement our algorithm in linear time; when the dictionary is LZ77-like, it can be implemented in time O(n log n), where n is the length of the text. Both cases have O(n) space complexity. Although the main aim of this work is theoretical, some experimental results are included to illustrate the practical effect of parsing optimality on compression performance, and more detailed experiments are reported in a dedicated appendix.
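A common way to realize this kind of optimal parsing, and one consistent with the Dictionary-Symbolwise description above, is to view positions in the text as graph nodes, dictionary phrases and single symbols as weighted edges, and an optimal parse as a shortest path. The sketch below follows that formulation with an invented dictionary and invented codeword costs; it is an illustration of the idea, not the thesis's algorithm or its linear-time implementation.

```python
# Minimal sketch of optimal parsing as a shortest-path problem, in the spirit
# of the Dictionary-Symbolwise model: nodes are text positions, edges are
# dictionary phrases or single symbols, weights are (assumed) codeword costs.
# The dictionary and costs below are illustrative, not the thesis's own.
import heapq

def optimal_parse(text, dictionary, phrase_cost, symbol_cost):
    n = len(text)
    dist = [float("inf")] * (n + 1)
    back = [None] * (n + 1)          # (previous position, emitted token)
    dist[0] = 0
    heap = [(0, 0)]
    while heap:
        d, i = heapq.heappop(heap)
        if d > dist[i] or i == n:
            continue
        # symbolwise edge: emit one raw symbol
        edges = [(text[i], symbol_cost)]
        # dictionary edges: every dictionary phrase starting at position i
        edges += [(p, phrase_cost) for p in dictionary if text.startswith(p, i)]
        for token, cost in edges:
            j = i + len(token)
            if dist[i] + cost < dist[j]:
                dist[j] = dist[i] + cost
                back[j] = (i, token)
                heapq.heappush(heap, (dist[j], j))
    tokens, i = [], n
    while i > 0:
        i, token = back[i]
        tokens.append(token)
    return list(reversed(tokens)), dist[n]

dictionary = {"ab", "abc", "bca", "ca"}
print(optimal_parse("abcabca", dictionary, phrase_cost=3, symbol_cost=2))
```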
44

Recognition of online handwritten mathematical expressions using contextual information / Reconhecimento online de expressões matemáticas manuscritas usando informação contextual

Aguilar, Frank Dennis Julca 29 April 2016 (has links)
Online handwritten mathematical expressions consist of sequences of strokes, usually collected through a touch-screen device. Automatic recognition of online handwritten mathematical expressions requires solving three subproblems: symbol segmentation, symbol classification, and structural analysis (that is, the identification of spatial relations, such as subscript or superscript, between symbols). A main issue in the recognition process is ambiguity at the symbol or relation level, which often leads to several likely interpretations of an expression. Some methods treat the recognition problem as a pipeline, in which symbol segmentation and classification are followed by structural analysis. A main drawback of such methods is that they compute symbol-level interpretations without considering structural information, which is essential to resolve ambiguities. To cope with this drawback, more recent methods adapt string parsing techniques to drive the recognition process. As string grammars were originally designed to model linear arrangements of objects (as in text, where symbols are arranged only through left-to-right relations), non-linear arrangements of mathematical symbols (given by the multiple relation types of mathematics) are modeled as compositions of production rules for linear structures. Parsing an expression then involves searching for linear structures in the expression that are consistent with the structure of the production rules. This last step requires the introduction of constraints or assumptions, such as stroke input order or vertical and horizontal alignments, to linearize the expression components. These requirements not only limit the effectiveness of the methods, but also make it difficult to extend them to new expression structures. In this thesis, we model the recognition problem as a graph parsing problem. The graph-based description of relations in the production rules allows direct modeling of non-linear mathematical structures. Our parsing algorithm determines recursive partitions of the input strokes that induce graphs matching the production rule graphs. To mitigate the computational cost, we constrain the possible partitions to graphs derived from sets of symbol and relation hypotheses, calculated using previously trained classifiers. A set of labels indicating likely interpretations is associated with each symbol and relation hypothesis, and the treatment of ambiguity at the symbol and relation levels is left to the parsing process. The parsing algorithm builds a forest in which each tree corresponds to an interpretation coherent with the grammar. We define a score function, optimized on training data, that associates a cost to each tree, and we select a tree with minimum cost as the result. Experimental evaluation shows that the proposed method is more accurate than several state-of-the-art methods. Even though graph parsing is computationally expensive, the use of symbol and relation hypotheses to constrain the search space effectively reduces the complexity, allowing practical application of the process. Furthermore, since the proposed parsing algorithm does not make direct use of structural particularities of mathematical expressions, it has the potential to be adapted to other two-dimensional object recognition problems. As a secondary contribution of this thesis, we have proposed a framework to automate the process of building handwritten mathematical expression datasets. The framework has been implemented in a computer system and used to generate part of the samples used in the experimental part of this thesis.
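To give a concrete flavour of the hypothesis-generation stage described above, the sketch below enumerates small stroke groups as candidate symbol hypotheses and scores them with a stubbed classifier; a graph parser (not shown) would then search this restricted space for a minimum-cost interpretation tree. The stroke data, the classifier, and the limits on group size are invented for the example.

```python
# Hedged sketch of the hypothesis-generation stage described above: candidate
# symbol hypotheses are small stroke groups scored by a (stubbed) classifier,
# and only the best few labels per group are kept; the graph parser (not
# shown) would then search this constrained space for a minimum-cost tree.
from itertools import combinations

def symbol_hypotheses(strokes, classify, max_group=3, keep_labels=2):
    """Return (stroke_group, [(label, cost), ...]) pairs for small groups."""
    hyps = []
    for size in range(1, max_group + 1):
        for group in combinations(range(len(strokes)), size):
            scores = classify([strokes[i] for i in group])   # label -> cost
            best = sorted(scores.items(), key=lambda kv: kv[1])[:keep_labels]
            hyps.append((group, best))
    return hyps

# Stub classifier standing in for the trained model; costs are invented.
def toy_classifier(stroke_group):
    return {"x": 1.0 * len(stroke_group), "2": 2.5, "+": 3.0}

strokes = ["stroke0", "stroke1", "stroke2"]          # placeholder stroke data
for group, labels in symbol_hypotheses(strokes, toy_classifier):
    print(group, labels)
```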
45

Neural-Symbolic Learning for Semantic Parsing / Analyse sémantique avec apprentissage neuro-symbolique

Xiao, Chunyang 14 December 2017 (has links)
Our goal in this thesis is to build a system that answers a natural language question (NL) by representing its semantics as a logical form (LF) and then computing the answer by executing the LF over a knowledge base. The core part of such a system is the semantic parser that maps questions to logical forms. Our focus is on how to build high-performance semantic parsers by learning from (NL, LF) pairs. We propose to combine recurrent neural networks (RNNs) with symbolic prior knowledge expressed through context-free grammars (CFGs) and automata. By integrating CFGs over LFs into the RNN training and inference processes, we guarantee that the generated logical forms are well-formed; by integrating, through weighted automata, prior knowledge about the presence of certain entities in the LF, we further enhance the performance of our models. Experimentally, we show that our approach achieves better performance than both previous semantic parsers that do not use neural networks and RNNs not informed by such prior knowledge.
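A simplified way to picture the grammar-constrained decoding described above is to mask the model's next-token scores at each step so that only grammar-permitted continuations remain. The sketch below does this with a toy token set, a hand-written well-formedness check standing in for a real CFG, and random scores standing in for a trained RNN; all names and tokens are invented for the illustration.

```python
# Simplified sketch of grammar-constrained decoding in the spirit described
# above: at each step the model's next-token scores are masked so that only
# continuations allowed by the grammar survive. The tiny "grammar" and the
# random scores below are stand-ins for a real CFG and a trained RNN.
import random

TOKENS = ["answer(", "city(", "state(", "all", ")", "<eos>"]

def allowed_next(prefix):
    """Toy well-formedness constraint: every opened function must be closed."""
    open_parens = sum(t.endswith("(") for t in prefix) - prefix.count(")")
    if not prefix:
        return {"answer("}
    if open_parens == 0:
        return {"<eos>"}
    return {"city(", "state(", "all", ")"} if prefix[-1].endswith("(") else {")"}

def decode(max_len=10):
    prefix = []
    while len(prefix) < max_len:
        scores = {t: random.random() for t in TOKENS}      # stand-in for RNN
        legal = allowed_next(prefix)
        token = max(legal, key=lambda t: scores[t])        # masked argmax
        if token == "<eos>":
            break
        prefix.append(token)
    return prefix

random.seed(0)
print(decode())   # prints a well-formed sequence, e.g. ['answer(', 'city(', 'all', ')', ')']
```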
46

Transition-Based Natural Language Parsing with Dependency and Constituency Representations

Hall, Johan January 2008 (has links)
Hall, Johan, 2008. Transition-Based Natural Language Parsing with Dependency and Constituency Representations, Acta Wexionensia No 152/2008. ISSN: 1404-4307, ISBN: 978-91-7636-625-7. Written in English. This thesis investigates different aspects of transition-based syntactic parsing of natural language text, where we view syntactic parsing as the process of mapping sentences in unrestricted text to their syntactic representations. Our parsing approach is data-driven, which means that it relies on machine learning from annotated linguistic corpora. Our parsing approach is also dependency-based, which means that the parsing process builds a dependency graph for each sentence consisting of lexical nodes linked by binary relations called dependencies. However, the output of the parsing process is not restricted to dependency-based representations, and the thesis presents a new method for encoding phrase structure representations as dependency representations that enables an inverse transformation without loss of information. The thesis is based on five papers, where three papers explore different ways of using machine learning to guide a transition-based dependency parser and two papers investigate the method for dependency-based phrase structure parsing. The first paper presents our first large-scale empirical study of parsing a natural language (in this case Swedish) with labeled dependency representations using a transition-based deterministic parsing algorithm, where the dependency graph for each sentence is constructed by a sequence of transitions and memory-based learning (MBL) is used to predict the transition sequence. The second paper further investigates how machine learning can be used for guiding a transition-based dependency parser. The empirical study compares two machine learning methods with five feature models for three languages (Chinese, English and Swedish), and shows that support vector machines (SVM) with lexicalized feature models are better suited than MBL for guiding a transition-based dependency parser. The third paper summarizes our experience of optimizing and tuning MaltParser, our implementation of transition-based parsing, for a wide range of languages. MaltParser has been applied to over twenty languages and was one of the top-performing systems in the CoNLL shared tasks of 2006 and 2007. The fourth paper is our first investigation of dependency-based phrase structure parsing, with competitive results for parsing German. The fifth paper presents an improved encoding method for transforming phrase structure representations into dependency graphs and back; with this method it is possible to parse continuous and discontinuous phrase structures extended with grammatical functions.
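The transition-based approach summarized above can be illustrated with a minimal arc-standard parser: a dependency graph is built through SHIFT, LEFT-ARC and RIGHT-ARC transitions, and a guide, learned with MBL or SVMs in the papers, chooses the next transition. In the sketch below a trivial rule-based guide stands in for the learned one; the sentence and attachment rules are invented.

```python
# Minimal sketch of transition-based dependency parsing as described above:
# an arc-standard system builds a dependency graph through SHIFT, LEFT-ARC
# and RIGHT-ARC transitions. A trained classifier (MBL or SVM in the papers)
# would predict each transition; here a trivial rule-based guide stands in.
def parse(words, guide):
    stack, buffer, arcs = [], list(range(len(words))), []
    while buffer or len(stack) > 1:
        action = guide(stack, buffer, words)
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC" and len(stack) >= 2:
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))        # (head, dependent)
        elif action == "RIGHT-ARC" and len(stack) >= 2:
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

def toy_guide(stack, buffer, words):
    # Stand-in for the learned guide: attach determiners leftward, else shift,
    # and attach everything rightward once the buffer is exhausted.
    if len(stack) >= 2 and words[stack[-2]] in {"the", "a"}:
        return "LEFT-ARC"
    if buffer:
        return "SHIFT"
    return "RIGHT-ARC"

print(parse(["the", "parser", "builds", "a", "graph"], toy_guide))
```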
47

Incremental Parsing with Adjoining Operation

MATSUBARA, Shigeki, KATO, Yoshihide 01 December 2009 (has links)
No description available.
48

Robust Dependency Parsing of Spontaneous Japanese Spoken Language

Ohno, Tomohiro, Matsubara, Shigeki, Kawaguchi, Nobuo, Inagaki, Yasuyoshi 03 1900 (has links)
No description available.
49

Syntactic Analysis of Code-Switched Texts / Syntaktická analýza textů se střídáním kódů

Ravishankar, Vinit January 2018 (has links)
The aim of this thesis is twofold. First, we attempt to dependency parse existing code-switched corpora, solely by training on monolingual dependency treebanks. To do so, we design a dependency parser and experiment with a variety of methods to improve upon the baseline established by raw training on monolingual treebanks; these methods range from treebank modification to network modification. On this task, we obtain state-of-the-art results for most evaluation criteria on our evaluation language pairs, Hindi/English and Komi/Russian. We beat our own baselines by a significant margin, whilst simultaneously beating most scores on similar tasks in the literature. The second part of the thesis introduces the relatively understudied task of predicting code-switching points in a monolingual utterance; we provide several architectures that attempt to do so, and put one of them forward as a baseline, in the hope that it will stand as a state of the art for future work on this task.
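The switch-point prediction task mentioned above can be framed as binary labelling of token boundaries: given an utterance, decide at each boundary whether a language switch occurs. The sketch below only illustrates how gold boundary labels could be derived from language-tagged tokens; it is not one of the architectures the thesis proposes, and the toy Hindi/English data are invented.

```python
# Illustration of the switch-point prediction task mentioned above, framed as
# binary labelling of token boundaries; this is only a task sketch, not one
# of the thesis's architectures. Data and labels below are invented.
def switch_points(tokens, langs):
    """Return boundary indices i where langs[i] != langs[i + 1]."""
    return [i for i in range(len(tokens) - 1) if langs[i] != langs[i + 1]]

tokens = ["main", "office", "mein", "hoon"]      # toy Hindi/English utterance
langs  = ["hi",   "en",     "hi",   "hi"]
print(switch_points(tokens, langs))              # [0, 1]
```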
