Global ETD Search

101	Generalized ID/LP grammar: a formalism for parsing linearization-based HPSG grammars Daniels, Michael W. 13 July 2005 (has links) No description available. Language, Linguistics parsing backbone linearization HPSG free word order
102	Parallel Parsing in a Multiprocessor Environment Sarkar, Dilip 01 January 1988 (has links) (PDF) Parsing in a multiprocessor environment is considered. Two models for asynchronous bottom-up parallel parsing are presented. A method for estimating speedup in asynchronous bottom-up parallel parsing is developed, and it is used to estimate speedup obtainable by bottom-up parallel parsing of Pascal-like languages. It is found that bottom-up parallel parsing algorithms can attain a maximum speedup of O(L^(1/2)) with L^(1/2) processors, where L is the number of tokens in the string being parsed. Hence, bottom-up parallel parsing technique does not yield good speedup. A new parsing technique is proposed for parsing a class of block-structured languages. The novelty of the technique is that it is inherently parallel. By applying this new technique, a string of L tokens can be parsed in O(log L) time with L/log L processors. The parsing algorithm uses a parenthesis-matching algorithm developed here. The parenthesis-matching algorithm can find matching of a sequence of parentheses in O(log L) time with L/log L processors. Thus, the new parsing algorithm is cost optimal.
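As a point of reference for the parenthesis-matching step mentioned in entry 102, the sketch below shows the ordinary sequential, stack-based matcher in Python. It is not the parallel algorithm from the thesis; the function name `match_parentheses` and the dictionary output are illustrative assumptions. The sequential version does O(L) work, which is the baseline the cost-optimality claim refers to: O(log L) time on L/log L processors is also O(L) total work.

```python
def match_parentheses(tokens):
    """Map the index of each '(' to the index of its matching ')'.

    Sequential baseline (hypothetical helper, not the thesis's parallel
    algorithm): one left-to-right pass with a stack, O(L) time.
    """
    stack = []   # indices of '(' still waiting for a partner
    pairs = {}   # opening index -> closing index
    for index, token in enumerate(tokens):
        if token == "(":
            stack.append(index)
        elif token == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {index}")
            pairs[stack.pop()] = index
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


if __name__ == "__main__":
    print(match_parentheses("(()())"))
```

Tokens other than the two parenthesis characters are ignored, so the same helper can be run over a token stream in which block delimiters play the role of parentheses.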
103	On The Practice of B-ing Earley Zingaro, Daniel C. 08 1900 (has links) Earley's parsing algorithm is an O(n^3) algorithm for parsing according to any context-free grammar. Its theoretical importance stems from the fact that it was one of the first algorithms to achieve this time bound, but it has also seen success in compiler-compilers, theorem provers and natural language processing. It has an elegant structure, and its time complexity on restricted classes of grammars is often as good as that of specialized algorithms. Grammars with ϵ-productions, however, require special consideration, and have historically led to inefficient and inelegant implementations. In this thesis, we develop the algorithm from specification using the B-Method. Through refinement steps, we arrive at a list-processing formulation, in which the problems with ϵ-productions emerge and can be understood. The development highlights the essential properties of the algorithm, and has also led to the discovery of an implementation optimization. We end by giving a concept-test of the algorithm as a literate Pascal program. / Thesis / Master of Computer Science (MCS)
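To make the ϵ-production issue in entry 103 concrete, here is a minimal Earley recognizer sketched in Python. It is a generic textbook-style formulation, not the B-Method development or the literate Pascal program from the thesis. The grammar encoding (a dict from nonterminal to a list of right-hand-side tuples, with `()` for ϵ), the names `recognize` and `nullable_symbols`, and the example grammar are all assumptions made for illustration. Nullable nonterminals are handled in the predictor by also advancing the dot, the fix described by Aycock and Horspool.

```python
from collections import namedtuple

# Earley item: production lhs -> rhs, dot position, and origin set index.
Item = namedtuple("Item", "lhs rhs dot origin")


def nullable_symbols(grammar):
    """Return the nonterminals that can derive the empty string."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, productions in grammar.items():
            if lhs not in nullable and any(
                all(symbol in nullable for symbol in rhs) for rhs in productions
            ):
                nullable.add(lhs)
                changed = True
    return nullable


def recognize(grammar, start, tokens):
    """Return True if `tokens` can be derived from `start`."""
    nullable = nullable_symbols(grammar)
    sets = [[] for _ in range(len(tokens) + 1)]

    def add(index, item):
        if item not in sets[index]:
            sets[index].append(item)

    for rhs in grammar[start]:
        add(0, Item(start, rhs, 0, 0))

    for i in range(len(tokens) + 1):
        j = 0
        while j < len(sets[i]):          # the list grows while we scan it
            item = sets[i][j]
            j += 1
            if item.dot < len(item.rhs):
                symbol = item.rhs[item.dot]
                if symbol in grammar:    # predictor
                    for rhs in grammar[symbol]:
                        add(i, Item(symbol, rhs, 0, i))
                    if symbol in nullable:
                        # Skip over a nullable nonterminal immediately.
                        add(i, item._replace(dot=item.dot + 1))
                elif i < len(tokens) and tokens[i] == symbol:  # scanner
                    add(i + 1, item._replace(dot=item.dot + 1))
            else:                        # completer
                for parent in list(sets[item.origin]):
                    if (parent.dot < len(parent.rhs)
                            and parent.rhs[parent.dot] == item.lhs):
                        add(i, parent._replace(dot=parent.dot + 1))

    return any(
        item.lhs == start and item.dot == len(item.rhs) and item.origin == 0
        for item in sets[len(tokens)]
    )


# Example grammar (assumed): balanced parentheses, S -> ( S ) S | ϵ
GRAMMAR = {"S": [("(", "S", ")", "S"), ()]}

if __name__ == "__main__":
    print(recognize(GRAMMAR, "S", list("(())()")))
```

Without the nullable check in the predictor, an item completed within its own origin set can miss parents that are added to that set later, which is one common form of the ϵ-production problem.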
104	Contributions to the syntactical analysis beyond context-freeness Bordihn, Henning January 2011 (has links) Approaches to parsing several grammar formalisms that can also generate non-context-free languages are explored. Chomsky grammars, Lindenmayer systems, grammars with controlled derivations, and grammar systems are treated. Formal properties of these mechanisms are investigated when they are used as language acceptors. Furthermore, cooperating distributed (CD) grammar systems are restricted so that efficient deterministic parsing without backtracking becomes possible. For this class of grammar systems, the parsing algorithm is presented and the role of leftmost derivations is investigated in detail. Parsing Accepting Grammars Controlled Derivations Grammar Systems Leftmost Derivations Data processing Computer science
105	Unsupervised Natural Language Processing for Knowledge Extraction from Domain-specific Textual Resources Hänig, Christian 25 April 2013 (has links) (PDF) This thesis aims to develop a Relation Extraction algorithm to extract knowledge from automotive data. While most approaches to Relation Extraction are evaluated only on newspaper data dealing with general relations from the business world, their applicability to other data sets is not well studied. Part I of this thesis deals with the theoretical foundations of Information Extraction algorithms. Text mining cannot be seen as the simple application of data mining methods to textual data. Instead, sophisticated methods have to be employed to accurately extract knowledge from text, which can then be mined using statistical methods from the field of data mining. Information Extraction itself can be divided into two subtasks: Entity Detection and Relation Extraction. The detection of entities is very domain-dependent due to terminology, abbreviations and general language use within the given domain. Thus, this task has to be solved for each domain by employing thesauri or another type of lexicon. Supervised approaches to Named Entity Recognition will not achieve reasonable results unless they have been trained for the given type of data. Relation Extraction can essentially be approached with pattern-based and kernel-based algorithms. The latter achieve state-of-the-art results on newspaper data and point out the importance of linguistic features. In order to analyze relations contained in textual data, syntactic features like part-of-speech tags and syntactic parses are essential. Chapter 4 presents the machine learning approaches and linguistic foundations that are essential for syntactic annotation of textual data and Relation Extraction. Chapter 6 analyzes the performance of state-of-the-art algorithms for POS tagging, syntactic parsing and Relation Extraction on automotive data.
The findings are that supervised methods trained on newspaper corpora do not achieve accurate results when applied to automotive data. There are various reasons for this. Besides low-quality text, the nature of automotive relations poses the main challenge. Automotive relation types of interest (e.g. component – symptom) are rather arbitrary compared to well-studied relation types like is-a or is-head-of. In order to achieve acceptable results, algorithms have to be trained directly on this kind of data. As the manual annotation of data for each language and data type is too costly and inflexible, unsupervised methods are the ones to rely on. Part II deals with the development of dedicated algorithms for all three essential tasks. Unsupervised POS tagging (Chapter 7) is a well-studied task, and algorithms achieving accurate tagging exist. None of them disambiguates high-frequency words; only out-of-lexicon words are disambiguated. Most high-frequency words bear syntactic information, and thus it is very important to differentiate between their different functions. Domain languages in particular contain ambiguous, highly frequent words bearing semantic information (e.g. pump). In order to improve POS tagging, an algorithm for disambiguation is developed and used to enhance an existing state-of-the-art tagger. This approach is based on context clustering, which is used to detect a word type's different syntactic functions. Evaluation shows that tagging accuracy is raised significantly. An approach to unsupervised syntactic parsing (Chapter 8) is developed to meet the requirements of Relation Extraction. These requirements include high-precision results on nominal and prepositional phrases, as they contain the entities relevant for Relation Extraction. Furthermore, accurate shallow parsing is more desirable than deep binary parsing, as it facilitates Relation Extraction more than deep parsing does.
Distinguishing endocentric from exocentric constructions improves phrase labeling. unsuParse is based on preferred positions of word types within phrases to detect phrase candidates. Iterating the detection of simple phrases successively induces deeper structures. The proposed algorithm fulfills all demanded criteria and achieves competitive results on standard evaluation setups. Syntactic Relation Extraction (Chapter 9) is an approach exploiting syntactic statistics and text characteristics to extract relations between previously annotated entities. The approach is based on entity distributions given in a corpus and thus provides a possibility to extend text mining processes to new data in an unsupervised manner. Evaluation on two different languages and two different text types of the automotive domain shows that it achieves accurate results on repair order data. Results are less accurate on internet data, but the task of sentiment analysis and extraction of the opinion target can be mastered. Thus, the incorporation of internet data is possible and important, as it provides useful insight into the customer's thoughts. To conclude, this thesis presents a complete unsupervised workflow for Relation Extraction – except for the highly domain-dependent Entity Detection task – improving performance of each of the involved subtasks compared to state-of-the-art approaches. Furthermore, this work applies Natural Language Processing methods and Relation Extraction approaches to real-world data, unveiling challenges that do not occur in high-quality newspaper corpora. Text Mining NLP Information Extraction Relation Extraction POS Tagging Parsing ddc:500
106	Segmentation of facade images with shape priors / Segmentation des images de façade avec à priori sur la forme Kozinski, Mateusz 30 June 2015 (has links) The aim of this work is to propose a framework for facade segmentation with user-defined shape priors. In such a framework, the user specifies a shape prior using a rigorously defined shape prior formalism. The prior expresses a number of hard constraints and soft preferences on the spatial configuration of the segments constituting the final segmentation. Existing approaches to the problem are affected by a compromise between the type of constraints whose satisfaction can be guaranteed by the segmentation algorithm and the capability to approximate optimal segmentations consistent with a prior. In this thesis we explore a number of approaches to facade parsing that combine a prior formalism featuring high expressive power, guarantees of conformance of the resulting segmentations to the prior, and effective inference. We evaluate the proposed algorithms on a number of datasets. Since one of our focus points is the accuracy gain resulting from more effective inference algorithms, we perform a fair comparison to existing methods, using the same data term.
Our contributions include a combination of graph grammars for expressing variation of facade structure with graphical models encoding the energy of models of given structures for different positions of facade elements. We also present the first linear formulation of facade parsing with shape priors. Finally, we propose a shape prior formalism that enables formulating the problem of optimal segmentation as inference in a Markov random field over the standard four-connected grid of pixels. The last method advances the state of the art by combining the flexibility of a user-defined grammar with segmentation accuracy that was previously reserved for frameworks with pre-defined priors. It also enables handling occlusions by simultaneously recovering the structure of the occluded facade and segmenting the occluding objects. We believe that it can be extended in many directions, including semantizing three-dimensional point clouds and parsing images of general urban scenes. Building models Image segmentation 3d reconstruction Shape priors Image parsing
107	Efficient Storage and Domain-Specific Information Discovery on Semistructured Documents Farfan, Fernando R 12 November 2009 (has links) The increasing amount of available semistructured data demands efficient mechanisms to store, process, and search an enormous corpus of data to encourage its global adoption. Current techniques to store semistructured documents either map them to relational databases, or use a combination of flat files and indexes. These two approaches result in a mismatch between the tree-structure of semistructured data and the access characteristics of the underlying storage devices. Furthermore, the inefficiency of XML parsing methods has slowed down the large-scale adoption of XML into actual system implementations. The recent development of lazy parsing techniques is a major step towards improving this situation, but lazy parsers still have significant drawbacks that undermine the massive adoption of XML. Once the processing (storage and parsing) issues for semistructured data have been addressed, another key challenge to leverage semistructured data is to perform effective information discovery on such data. Previous works have addressed this problem in a generic (i.e. domain independent) way, but this process can be improved if knowledge about the specific domain is taken into consideration. This dissertation had two general goals: The first goal was to devise novel techniques to efficiently store and process semistructured documents. This goal had two specific aims: We proposed a method for storing semistructured documents that maps the physical characteristics of the documents to the geometrical layout of hard drives. We developed a Double-Lazy Parser for semistructured documents which introduces lazy behavior in both the pre-parsing and progressive parsing phases of the standard Document Object Model's parsing mechanism. 
The second goal was to construct a user-friendly and efficient engine for performing Information Discovery over domain-specific semistructured documents. This goal also had two aims: We presented a framework that exploits the domain-specific knowledge to improve the quality of the information discovery process by incorporating domain ontologies. We also proposed meaningful evaluation metrics to compare the results of search systems over semistructured documents. Semistructured documents XML storage parsing information retrieval semisequential access lazy parsing ontologies Data Storage Systems Other Computer Engineering
108	Rysy z eye-trackeru v syntaktickém parsingu / Eye-tracking features in syntactic parsing Agrawal, Abhishek January 2020 (has links) In this thesis, we explore the potential benefits of leveraging eye-tracking information for dependency parsing on the English part of the Dundee corpus. To achieve this, we cast dependency parsing as a sequence labelling task and then augment the neural model for sequence labelling with eye-tracking features. We also augment a graph-based parser with eye-tracking features and parse the Dundee Corpus to corroborate our findings from the sequence labelling parser. We then experiment with a variety of parser setups ranging from parsing with all features to a delexicalized parser. Our experiments show that for a parser with all features, the improvements in LAS are positive but not significant, whereas our delexicalized parser significantly outperforms the baseline we established. We also analyze the contribution of various eye-tracking features to the different parser setups and find that eye-tracking features contain complementary information, which implies that augmenting the parser with various gaze features grouped together provides better performance than any individual gaze feature.
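The phrase "cast dependency parsing as a sequence labelling task" in entry 108 can be illustrated with a small Python sketch. The abstract does not say which label encoding the thesis uses; the relative head-offset encoding below, the function names `encode_tree` and `decode_labels`, and the example sentence are assumptions chosen for illustration. Each token receives one label made of the signed distance to its head plus the dependency relation, so any sequence tagger can predict a tree one label per token.

```python
def encode_tree(heads, relations):
    """Turn a dependency tree into one label per token.

    heads[i] is the 1-based position of the head of token i + 1, or 0 for
    the root. Labels look like '+1_nsubj' (head is the next token) or
    '+0_root' (offset 0 marks the root, since no token heads itself).
    """
    labels = []
    for position, (head, relation) in enumerate(zip(heads, relations), start=1):
        offset = 0 if head == 0 else head - position
        labels.append(f"{offset:+d}_{relation}")
    return labels


def decode_labels(labels):
    """Invert encode_tree: recover head positions and relations."""
    heads, relations = [], []
    for position, label in enumerate(labels, start=1):
        offset_text, relation = label.split("_", 1)
        offset = int(offset_text)
        heads.append(0 if offset == 0 else position + offset)
        relations.append(relation)
    return heads, relations


if __name__ == "__main__":
    # "She reads books": 'reads' is the root and heads both other tokens.
    labels = encode_tree([2, 0, 2], ["nsubj", "root", "obj"])
    print(labels)
    print(decode_labels(labels))
```

Under this framing, per-token gaze measurements would simply be extra input features alongside each word; how the thesis feeds them to its neural model is not stated in the abstract.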
109	Berika receptdata med innehållshanteringssystem / Enriching Recipe Data using Content Management System Berezkin, Nikita, Heidari, Ahmed January 2019 (has links) The problem today is that people do not eat climate-smart food, with the result that food will not suffice in the future. What we eat can have a negative impact on the greenhouse effect. People do not have the time or knowledge to cook climate-smart food. One solution is to use a Content Management System (CMS). A Content Management System processes a selected type of data in a specific way and then stores it. This report addresses the foundations and construction of a CMS that is to be part of a recommendation system for a user. The system should offer more climate-smart food alternatives that meet the individual's personal needs. The result was that, with the help of data from various sources, the ingredients of a recipe could be linked to additional information such as nutritional value, allergies, and whether the ingredient is vegetarian. Through tests such as performance tests of the execution time of the CMS, parsing accuracy, and product-matching accuracy, a better result was achieved. Most of the ingredients in the recipe became enriched, which leads to more climate-smart food alternatives, which are better for the environment. Accuracy here means how well the ingredients in the recipe match the names of products in stores. The next step was to enrich the recipes using the enriched ingredients. CMS parsing data ingredients affix recipe Information Systems
110	On the Extraction of Lexicalized Grammars and Parsing via Supertagging for Discontinuous Constituent Structures Ruprecht, Thomas 19 July 2024 (has links) This thesis considers constituent parsing as a form of syntactic analysis of sentences, in particular the case of discontinuous constituent parsing. We use an approach called supertagging, which combines formal grammars with neural networks to implement a parsing procedure. We define a parametrized extraction algorithm for formal grammars specifically tied to this setting, and evaluate it with three data sets for parsing in English and German. ddc:004
