31

Ambiguity Detection for Programming Language Grammars

Basten, Bas 15 December 2011 (has links) (PDF)
Context-free grammars are the most suitable and most widely used method for describing the syntax of programming languages. They can be used to generate parsers, which transform a piece of source code into a tree-shaped representation of the code's syntactic structure. These parse trees can then be used for further processing or analysis of the source text. In this sense, grammars form the basis of many engineering and reverse engineering applications, like compilers, interpreters and tools for software analysis and transformation. Unfortunately, context-free grammars have the undesirable property that they can be ambiguous, which can seriously hamper their applicability. A grammar is ambiguous if at least one sentence in its language has more than one valid parse tree. Since the parse tree of a sentence is often used to infer its semantics, an ambiguous sentence can have multiple meanings. For programming languages this is almost always unintended. Ambiguity can therefore be seen as a grammar bug.

A category of context-free grammars that is particularly sensitive to ambiguity is that of character-level grammars, which are used to generate scannerless parsers. Unlike traditional token-based grammars, character-level grammars include the full lexical definition of their language. This has the advantage that a language can be specified in a single formalism, and that no separate lexer or scanner phase is necessary in the parser. However, the absence of a scanner does require some additional lexical disambiguation. Character-level grammars can therefore be annotated with special disambiguation declarations to specify which parse trees to discard in case of ambiguity. Unfortunately, it is very hard to determine whether all ambiguities have been covered.

The task of searching for ambiguities in a grammar is very complex and time consuming, and is therefore best automated. Since the invention of context-free grammars, several ambiguity detection methods have been developed to this end. However, the ambiguity problem for context-free grammars is undecidable in general, so the perfect detection method cannot exist. This implies a trade-off between accuracy and termination. Methods that apply exhaustive searching are able to correctly find all ambiguities, but they might never terminate. On the other hand, approximative search techniques always produce an ambiguity report, but it might contain false positives or false negatives. Nevertheless, the fact that every method has flaws does not mean that ambiguity detection cannot be useful in practice. This thesis investigates ambiguity detection with the aim of checking grammars for programming languages. The challenge is to improve upon the state of the art by finding methods that are accurate enough and that scale to realistic grammars. First we evaluate existing methods with a set of criteria for practical usability. Then we present various improvements to ambiguity detection in the areas of accuracy, performance and report quality.

The main contributions of this thesis are two novel techniques. The first is an ambiguity detection method that applies both exhaustive and approximative searching, called AMBIDEXTER. The key ingredient of AMBIDEXTER is a grammar filtering technique that can remove harmless production rules from a grammar. A production rule is harmless if it does not contribute to any ambiguity in the grammar. Any harmless rules that are found can therefore safely be removed. This results in a smaller grammar that still contains the same ambiguities as the original one, but that can now be searched with exhaustive techniques in less time. The grammar filtering technique is formally proven correct, and experimentally validated. A prototype implementation is applied to a series of programming language grammars, and the performance of exhaustive detection methods is measured before and after filtering. The results show that a small investment in filtering time can substantially reduce the run-time of exhaustive searching, sometimes by several orders of magnitude. After this evaluation on token-based grammars, the grammar filtering technique is extended for use with character-level grammars. The extensions deal with the increased complexity of these grammars, as well as with their disambiguation declarations. This enables the detection of productions that are harmless due to disambiguation. The extensions are experimentally validated on another set of programming language grammars from practice, with similar results as before. Measurements show that, even though character-level grammars are more expensive to filter, the investment is still very worthwhile: exhaustive search times were again reduced substantially.

The second main contribution of this thesis is DR. AMBIGUITY, an expert system that helps grammar developers understand and resolve the ambiguities that are found. When applied to an ambiguous sentence, DR. AMBIGUITY analyzes the causes of the ambiguity and proposes a number of applicable solutions. A prototype implementation is presented and evaluated with a mature Java grammar. After removing disambiguation declarations from the grammar, we analyze sentences that have become ambiguous by this removal. The results show that in all cases the removed filter is proposed by DR. AMBIGUITY as a possible cure for the ambiguity.

Concluding, this thesis improves ambiguity detection with two novel methods. The first is the ambiguity detection method AMBIDEXTER, which applies grammar filtering to substantially speed up exhaustive searching. The second is the expert system DR. AMBIGUITY, which automatically analyzes found ambiguities and proposes applicable cures. The results obtained with both methods show that automatic ambiguity detection is now ready for realistic programming language grammars.
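To make the notion of ambiguity concrete, here is a toy sketch (not taken from the thesis): the classic expression grammar E → E '+' E | id assigns two parse trees to "id + id + id", and a CYK-style dynamic program over a binarized form of the grammar can count them.

```python
# Toy illustration of grammar ambiguity (not from the thesis): count the parse
# trees of a sentence with a CYK-style dynamic program.  The grammar below is
# a binarized form of  E -> E '+' E | id , the textbook ambiguous grammar:
#   E -> E P | 'id'      P -> O E      O -> '+'
BINARY = {("E", ("E", "P")), ("P", ("O", "E"))}   # A -> B C rules
TERMINAL = {("E", "id"), ("O", "+")}              # A -> token rules

def count_trees(tokens, start="E"):
    n = len(tokens)
    table = {}  # table[(i, j, A)] = number of trees for tokens[i:j] rooted at A
    for i, tok in enumerate(tokens):
        for a, t in TERMINAL:
            if t == tok:
                table[(i, i + 1, a)] = table.get((i, i + 1, a), 0) + 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for a, (b, c) in BINARY:
                total = sum(table.get((i, k, b), 0) * table.get((k, j, c), 0)
                            for k in range(i + 1, j))
                if total:
                    table[(i, j, a)] = table.get((i, j, a), 0) + total
    return table.get((0, n, start), 0)

# Two trees -- (id + id) + id  and  id + (id + id) -- so the grammar is ambiguous.
print(count_trees(["id", "+", "id", "+", "id"]))  # -> 2
```

Exhaustive detectors essentially search for such sentences, which is why filtering out harmless rules before the search pays off.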
32

Providing Mainstream Parser Generators with Modular Language Definition Support

Karol, Sven, Zschaler, Steffen 17 January 2012 (has links) (PDF)
The composition and reuse of existing textual languages is a frequently recurring problem. One possibility for composing textual languages lies at the level of parser specifications, which are mainly based on context-free grammars and regular expressions. Unfortunately, most mainstream parser generators use proprietary specification languages and usually do not provide strong abstractions for reuse. Newer forms of parser generators do support modular language development, but they often cannot be easily integrated with existing legacy applications. To support modular language development based on mainstream parser generators, in this paper we apply the Invasive Software Composition (ISC) paradigm to parser specification languages by using our Reuseware framework. Our approach is grounded in a platform-independent metamodel and thus does not rely on a specific parser generator.
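As a rough sketch of the composition idea (my own illustration; it does not reflect the Reuseware API or ISC's actual mechanics), a parser specification can be treated as a fragment with named extension points that a composition step fills in:

```python
# Minimal sketch of composing grammar fragments at the specification level.
# Illustration of the general idea only, not the Reuseware/ISC implementation.
# A module is a set of productions; a production may reference a named
# slot ("<<Slot>>") that a composition step binds to another module's rules.

base_expressions = {
    "Expr": ["Expr '+' Term", "Term"],
    "Term": ["Term '*' Factor", "Factor"],
    "Factor": ["NUMBER", "'(' Expr ')'", "<<ExtraFactor>>"],  # extension point
}

lambda_extension = {
    "Lambda": ["'fun' IDENT '->' Expr"],
}

def compose(base, extension, slot, entry_rule):
    """Return a new grammar in which `slot` is replaced by `entry_rule`
    and the extension's productions are merged in."""
    composed = {nt: list(alts) for nt, alts in base.items()}
    for nt, alts in composed.items():
        composed[nt] = [entry_rule if alt == f"<<{slot}>>" else alt for alt in alts]
    for nt, alts in extension.items():
        composed.setdefault(nt, []).extend(alts)
    return composed

grammar = compose(base_expressions, lambda_extension, "ExtraFactor", "Lambda")
for nt, alts in grammar.items():
    print(f"{nt} : {' | '.join(alts)} ;")
```

The composed dictionary could then be serialized into whatever specification syntax a concrete parser generator expects, which is the role the platform-independent metamodel plays in the paper.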
33

Apprentissage non supervisé de dépendances à partir de textes / Unsupervised dependency parsing from texts

Arcadias, Marie 02 October 2015 (has links)
Dependency grammars make it possible to build a hierarchical syntactic organization of the words of a sentence. Since the manual construction of dependency trees is a task that demands time and expertise, many works have sought to automate it. Aiming for a lightweight and easily adaptable process, we turned to unsupervised dependency learning, thereby avoiding costly expert annotation. The state of the art in unsupervised dependency learning (DMV) consists of very complex methods that are extremely sensitive to their initial parameters, and training a DMV model is also long and heavy. This thesis presents a new model that solves the dependency parsing problem in a simpler, faster and more adaptable way. We learn a family of grammars (PCFGs) reduced to fewer than 6 nonterminals and 15 combination rules over the nonterminals, from part-of-speech tags. The PCFGs of this family, which we call DGdg (for DROITE GAUCHE droite gauche, i.e. right-left right-left), require very light tuning and therefore adapt without effort to the 12 languages tested. Training and parsing are performed at least twice as fast as DMV on the same data, and for some languages the quality of the DGdg analyses is close to that of DMV. We also propose a first application of our dependency parsing method to information extraction: using features extracted from the constructed trees, we train CRFs to label words with the roles "subject", "object" and "predicate".
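A rough reconstruction of one ingredient of such approaches (not the DGdg model itself): once a binary parse over part-of-speech tags is available, dependency arcs can be read off by deciding, for each rule, whether the left or the right child supplies the head.

```python
# Sketch: deriving dependency arcs from a binary parse over POS tags.
# Illustration of the general idea only; the DGdg grammars and their
# left/right head-direction rules in the thesis are more refined.
# A tree node is (direction, left_subtree, right_subtree); a leaf is a word
# position.  Direction "L" means the left child's head governs the right
# child's head, "R" the opposite.

def head_and_arcs(node, arcs):
    if isinstance(node, int):          # leaf: a word position
        return node
    direction, left, right = node
    h_left = head_and_arcs(left, arcs)
    h_right = head_and_arcs(right, arcs)
    if direction == "L":               # left head governs right head
        arcs.append((h_left, h_right))
        return h_left
    arcs.append((h_right, h_left))     # right head governs left head
    return h_right

# "the(0) cat(1) sleeps(2)":  det <- noun <- verb
tree = ("R", ("R", 0, 1), 2)
arcs = []
root = head_and_arcs(tree, arcs)
print(root, arcs)   # 2 [(1, 0), (2, 1)]
```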
34

Métodos de pontos interiores como alternativa para estimar os parâmetros de uma gramática probabilística livre do contexto / Interior point methods as an alternative for estimating parameters of a stochastic context-free grammar

Mamián López, Esther Sofía, 1985- 10 July 2013 (has links)
Advisors: Aurelio Ribeiro Leite de Oliveira, Fredy Angel Amaya Robayo / Dissertation (Master's) - Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica / Abstract: In a probabilistic language model (PLM), a probability function is defined to calculate the probability that a particular string occurs within a language. These probabilities are the PLM parameters and are learned from a corpus (string samples) belonging to the language. Once the probabilities have been obtained, i.e. a model of the language, a measure is available to evaluate how well the model represents the language being studied; this measure is called perplexity per word. The language model we propose to estimate is based on probabilistic context-free grammars. The classic method for estimating the parameters of a PLM (Inside-Outside) demands a large amount of time, making it unviable for complex applications. This dissertation approaches the problem of estimating the parameters of a PLM using interior point methods, obtaining good results in terms of processing time, number of iterations until convergence, and perplexity per word. / Master's degree in Applied Mathematics
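The evaluation measure mentioned in the abstract, perplexity per word, is easy to state; the sketch below (an illustration, not the dissertation's code) computes it from per-sentence probabilities produced by any language model.

```python
import math

def perplexity_per_word(sentence_probs, sentence_lengths):
    """Perplexity per word of a corpus:  2 ** (-(1/N) * sum(log2 p(s))),
    where N is the total number of words.  Illustrative sketch only."""
    total_words = sum(sentence_lengths)
    total_log2 = sum(math.log2(p) for p in sentence_probs)
    return 2 ** (-total_log2 / total_words)

# Example: three sentences with their model probabilities and word counts.
print(perplexity_per_word([1e-4, 2e-3, 5e-5], [5, 4, 6]))  # ~5.4
```

Lower values mean the model finds the corpus less surprising, which is why the measure can be used to compare the interior point estimates against the classic Inside-Outside estimates.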
35

Understanding Context-free Grammars through Data Visualization

Hultin, Felix January 2016 (has links)
Ever since the late 1950s, context-free grammars have played an important role within the field of linguistics, have been part of introductory courses, and have spread into other fields of study. Meanwhile, data visualization in modern web development has made it possible to do feature-rich visualization in the browser. In this thesis, these two developments are united by developing a browser-based app for writing context-free grammars, parsing sentences and visualizing the output. A user experience study with usability tests and user interviews is conducted in order to investigate the possible benefits and disadvantages of such visualization when writing context-free grammars. The results show that the data visualization was used by participants in a limited way: it helped them to see whether sentences were parsed and, if a sentence was not parsed, at which position parsing went wrong. Future improvements to the software and follow-up studies are proposed, as well as an expansion of data visualization within linguistics.
36

Contributions à la vérification et à la validation efficaces fondées sur des modèles / contributions to efficient model-based verificarion and validation

Dreyfus, Alois 22 October 2014 (has links)
This thesis contributes to the development of automatic methods for model-based verification and validation of computer systems. It is divided into two parts: verification and test generation. In the verification part, for the regular model checking problem, which is undecidable in general, two new approximation techniques are defined in order to provide efficient (semi-)algorithms. Over-approximations of the set of reachable states are computed, with the objective of ensuring the termination of the exploration of the state space. Reachable states (or over-approximations of this set of states) are represented by regular languages or, equivalently, by finite-state automata. The first technique consists in over-approximating the set of reachable states by merging states of the automata, based on simple syntactic criteria or on a combination of these criteria. The second approximation technique also merges automata states, but by using transducers. For the second technique, we develop a new approach to refine the approximations, inspired by the CEGAR paradigm (CounterExample-Guided Abstraction Refinement). These proposals have been tested on examples of mutual exclusion protocols. In the test generation part, a technique that combines random generation with coverage criteria, starting from context-free models (context-free grammars, pushdown automata), is defined. Generating tests from these models (instead of from graphs) reduces the level of abstraction of the model and therefore allows more of the generated tests to be executable in the real system. These proposals have been tested on the JSON grammar (JavaScript Object Notation), as well as on pushdown automata corresponding to mutually recursive function calls, an XPath query, and the Shunting-Yard algorithm.
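A minimal sketch of the test-generation side (my own illustration; the thesis combines such generation with coverage criteria over the grammar or pushdown automaton): drawing random sentences from a context-free grammar, here a heavily simplified JSON-like grammar.

```python
import random

# Heavily simplified JSON-like grammar; uppercase symbols are nonterminals,
# quoted symbols are terminals.  Plain random generation only.
GRAMMAR = {
    "VALUE": [["OBJECT"], ["ARRAY"], ["'42'"], ["'\"s\"'"], ["'true'"]],
    "OBJECT": [["'{'", "'}'"], ["'{'", "MEMBERS", "'}'"]],
    "MEMBERS": [["PAIR"], ["PAIR", "','", "MEMBERS"]],
    "PAIR": [["'\"k\"'", "':'", "VALUE"]],
    "ARRAY": [["'['", "']'"], ["'['", "ELEMENTS", "']'"]],
    "ELEMENTS": [["VALUE"], ["VALUE", "','", "ELEMENTS"]],
}

def generate(symbol, depth=0, max_depth=8):
    if symbol.startswith("'"):                 # terminal: strip the quotes
        return symbol.strip("'")
    rules = GRAMMAR[symbol]
    if depth >= max_depth:                     # bias towards short rules deep down
        rules = [min(rules, key=len)]
    rule = random.choice(rules)
    return "".join(generate(s, depth + 1, max_depth) for s in rule)

random.seed(7)
for _ in range(3):
    print(generate("VALUE"))
```

A coverage-guided generator, as studied in the thesis, would additionally track which productions (or automaton transitions) have been exercised and steer the random choices towards the ones not yet covered.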
37

Classification et caractérisation de familles enzymatiques à l'aide de méthodes formelles / Classification and characterization of enzymatic families with formal methods

Garet, Gaëlle 16 December 2014 (has links)
This thesis proposes a new approach for discovering signatures of enzyme families (and superfamilies). First, given an aligned sample of sequences belonging to the same family, the approach infers context-free grammars characterizing that family. To do so, new generalization principles and new classes of languages, based on local substitutability, are introduced. An algorithm has also been developed for this purpose; it produces a reduced grammar of a substitutable language that preserves the structure of the examples. Second, this manuscript presents a method for classifying the sequences of a superfamily into families using formal concept analysis based on sequence alignment, which allows the detection of new families and the discovery of functional motifs that improve the previous signatures.
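A toy illustration of the local-substitutability idea (my reconstruction, not the thesis' inference algorithm): segments observed in the same left and right context in the sample can be grouped under the same nonterminal.

```python
from collections import defaultdict

# Sketch: find locally substitutable symbols in a small sequence sample.
# Two symbols are grouped if they occur with an identical
# (left neighbour, right neighbour) context somewhere in the sample.
# The real inference algorithm in the thesis is considerably more careful.

samples = [
    ["M", "A", "L", "G", "K"],
    ["M", "V", "L", "G", "K"],
    ["M", "A", "L", "S", "K"],
]

contexts = defaultdict(set)
for seq in samples:
    for i in range(1, len(seq) - 1):
        contexts[(seq[i - 1], seq[i + 1])].add(seq[i])

for ctx, symbols in contexts.items():
    if len(symbols) > 1:
        print(f"context {ctx}: substitutable symbols {sorted(symbols)}")
# context ('M', 'L'): ['A', 'V']   and   context ('L', 'K'): ['G', 'S']
```

Each group of substitutable symbols would become a candidate nonterminal in the inferred grammar, which is how the generalization from a finite sample to a family signature happens.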
38

[en] CORRESPONDENCE BETWEEN PEGS AND CLASSES OF CONTEXT-FREE GRAMMARS / [pt] CORRESPONDÊNCIA ENTRE PEGS E CLASSES DE GRAMÁTICAS LIVRES DE CONTEXTO

SERGIO QUEIROZ DE MEDEIROS 31 January 2011 (has links)
Parsing Expression Grammars (PEGs) are a formalism for describing languages whose distinguishing feature is the use of an ordered choice operator. The class of languages described by PEGs properly contains all deterministic context-free languages. In this thesis we discuss the correspondence between PEGs and two other formalisms used to describe languages: regular expressions and Context-Free Grammars (CFGs). We present a new formalization of regular expressions that uses natural semantics, and we show a transformation to convert a regular expression into a PEG that describes the same language; this transformation can be easily adapted to accommodate several extensions used by regular expression libraries (e.g., lazy repetition and independent subpatterns). We also present a new formalization of CFGs that uses natural semantics, and we show the correspondence between right-linear CFGs and equivalent PEGs. Moreover, we show that LL(1) grammars with a minor restriction define the same language when interpreted as a CFG and when interpreted as a PEG. Finally, we show how to transform strong-LL(k) CFGs into equivalent PEGs.
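A small sketch of what makes the ordered choice operator distinctive (an illustration, not the thesis' natural-semantics formalization): in a PEG, A / B commits to the first alternative that succeeds, whereas the CFG alternation A | B is symmetric.

```python
# Toy PEG-style combinators (illustration only).
# A parser takes (text, pos) and returns the new position, or None on failure.

def lit(s):
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def seq(*parsers):
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def ordered_choice(*parsers):
    # PEG choice: the first alternative that succeeds wins, unconditionally.
    def parse(text, pos):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse

# PEG:  S <- 'a' / 'a' 'b'   (must consume the whole input to match)
s = ordered_choice(lit("a"), seq(lit("a"), lit("b")))

def matches(parser, text):
    end = parser(text, 0)
    return end is not None and end == len(text)

print(matches(s, "a"))    # True
print(matches(s, "ab"))   # False: 'a' already succeeded, so 'a' 'b' is never
                          # tried, unlike the CFG  S -> 'a' | 'a' 'b',
                          # which accepts "ab".
```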
39

Grammar-Based Translation Framework

Vít, Radek January 2019 (has links)
In this thesis we examine existing algorithms for accepting languages defined by context-free grammars. Based on this knowledge, we propose a new model for representing LR automata and use it to define a new algorithm, LSCELR. We modify the language-accepting algorithms to obtain algorithms for translation based on translation grammars. We define attributed translation grammars as an extension of translation grammars for specifying relationships between the input and output symbols of a translation. We implement ctf, a grammar-based translation framework that performs translation using LSCELR. We define a language for describing attributed translation grammars and implement a compiler that translates this representation into source code for the implemented framework.
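As a hedged illustration of what a translation grammar does (not the ctf framework or the LSCELR algorithm), the rules below pair input symbols with output symbols, turning infix expressions into postfix during the parse.

```python
# Sketch of a translation grammar in action (illustration only).
# Each rule pairs input symbols with output symbols:
#   E -> T ('+' T {emit '+'})*
#   T -> NUMBER {emit NUMBER}

def translate(tokens):
    out, pos = [], 0

    def expect_number():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        out.append(tok)          # output symbol attached to T -> NUMBER

    expect_number()
    while pos < len(tokens) and tokens[pos] == "+":
        pos += 1
        expect_number()
        out.append("+")          # output symbol attached to E -> E '+' T
    return out

print(translate(["1", "+", "2", "+", "3"]))   # ['1', '2', '+', '3', '+']
```

An attributed translation grammar additionally lets the output symbols carry attributes computed from the matched input, which is the relationship the thesis' description language is designed to express.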
40

Použití strukturální metody pro rozpoznávání objektů / Using structural method for objects recognition

Valsa, Vít January 2015 (has links)
This diploma thesis deals with the possibilities of using structural methods for recognizing objects in an image. The first part of the thesis describes methods for preparing the image before processing. The core of the whole thesis is chapter 3, where the construction of deformation grammars for parsing, and their use, is analyzed in detail. The next part is devoted to a syntactic parser based on the deformation grammar. The conclusion focuses on testing the suggested methods and their results.
