  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
151

Design of a Robust and Flexible Grammar for Speech Control

Ludyga, Tomasz 28 May 2024
Voice interaction is an established automation and accessibility feature. While many satisfactory speech recognition solutions are available today, interpreting the semantics of the recognized text remains difficult in some use cases. Two types of semantic extraction models can be distinguished: probabilistic and purely rule-based. Rule-based reasoning can be formalized into grammars and enables fast language validation, transparent decision-making and easy customization. In this thesis we develop a context-free ANTLR semantic grammar to control software by speech in a medical domain involving smart glasses. The implementation is preceded by a review of the state of the art, requirements consultation and a thorough design of reusable system abstractions. The design includes definitions of a DSL, a meta grammar, a generic system architecture and tool support. Additionally, we investigate trivial and experimental grammar improvement techniques. Owing to the multifaceted flexibility and robustness of the designed framework, we indicate its usability in critical and adaptive systems. We determine 75% semantic recognition accuracy in the main medical use case. We compare it against semantic extraction using SpaCy and two fine-tuned AI classifiers. The evaluation reveals high accuracy of BERT for sequence classification and great potential for hybrid solutions with AI techniques on top of grammars, in particular for the detection of alerts. The accuracy depends strongly on input quality, highlighting the importance of speech recognition tailored to a specific vocabulary.
Contents:
1 Introduction: 1.1 Motivation; 1.2 CAIS.ME Project; 1.3 Problem Statement; 1.4 Thesis Overview
2 Related Work
3 Foundational Concepts and Systems: 3.1 Human-Computer Interaction in Speech; 3.2 Speech Recognition (3.2.1 Open-source technologies; 3.2.2 Other technologies); 3.3 Language Recognition (3.3.1 Regular expressions; 3.3.2 Lexical tokenization; 3.3.3 Parsing; 3.3.4 Domain Specific Languages; 3.3.5 Formal grammars; 3.3.6 Natural Language Processing; 3.3.7 Model-Driven Engineering)
4 State-of-the-Art: Grammars: 4.1 Overview; 4.2 Workbenches for Grammar Design (4.2.1 ANTLR; 4.2.2 Xtext; 4.2.3 JetBrains MPS; 4.2.4 Other tools); 4.3 Design Approaches
5 Problem Analysis: 5.1 Methodology; 5.2 Identification of Use-Cases; 5.3 Requirements Analysis (5.3.1 Functional requirements; 5.3.2 Qualitative requirements; 5.3.3 Acceptance criteria)
6 Design: 6.1 Preprocessing; 6.2 Underlying Domain Specific Modelling (6.2.1 Language model definition; 6.2.2 Formalization; 6.2.3 Constraints); 6.3 Generic Grammar Syntax; 6.4 Architecture; 6.5 Integration of AI Techniques; 6.6 Grammar Improvement (6.6.1 Identification of synonyms; 6.6.2 Automatic addition of synonyms; 6.6.3 Addition of same-meaning strings; 6.6.4 Addition and modification of rules); 6.7 Processing of unrecognized input; 6.8 Summary
7 Implementation and Evaluation: 7.1 Development Environment; 7.2 Implementation (7.2.1 Grammar model transformation; 7.2.2 Output construction; 7.2.3 Testing; 7.2.4 Reusability for similar use-cases); 7.3 Limitations and Challenges; 7.4 Comparison to NLP Solutions
8 Conclusion: 8.1 Summary of Findings; 8.2 Future Research and Development
Acronyms; Bibliography; List of Figures; List of Tables; List of Listings
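To give a feel for what rule-based semantic extraction with synonym handling can look like, here is a minimal Python sketch. It is not the thesis's ANTLR grammar: the synonym table, the single rule (action followed by target) and all names such as `interpret` are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical synonym table: canonical token -> accepted spoken variants.
SYNONYMS = {
    "open": {"open", "show", "display"},
    "close": {"close", "hide"},
    "record": {"record", "chart"},
    "camera": {"camera", "video"},
}
ACTIONS = {"open", "close"}          # assumed action vocabulary
TARGETS = {"record", "camera"}       # assumed target vocabulary
FILLERS = {"the", "a", "please"}     # words ignored during preprocessing


def canonical(word):
    """Map a spoken word to its canonical token, or None if unknown."""
    for canon, variants in SYNONYMS.items():
        if word in variants:
            return canon
    return None


def interpret(utterance):
    """Apply the single rule `command := ACTION TARGET` to an utterance.

    Returns a dict describing the command, or None if the rule does not match.
    """
    words = [w for w in re.findall(r"[a-z]+", utterance.lower())
             if w not in FILLERS]
    tokens = [canonical(w) for w in words]
    if len(tokens) == 2 and tokens[0] in ACTIONS and tokens[1] in TARGETS:
        return {"action": tokens[0], "target": tokens[1]}
    return None


print(interpret("Please show the chart"))
print(interpret("Sing a song"))
```

Because every decision is a table lookup or an explicit rule, a rejected utterance can be traced to the exact word that failed to match, which is the kind of transparency the abstract attributes to rule-based approaches.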
152

Multiple sequence analysis in the presence of alignment uncertainty

Herman, Joseph L. January 2014
Sequence alignment is one of the most intensely studied problems in bioinformatics, and is an important step in a wide range of analyses. An issue that has gained much attention in recent years is the fact that downstream analyses are often highly sensitive to the specific choice of alignment. One way to address this is to jointly sample alignments along with other parameters of interest. In order to extend the range of applicability of this approach, the first chapter of this thesis introduces a probabilistic evolutionary model for protein structures on a phylogenetic tree; since protein structures typically diverge much more slowly than sequences, this allows for more reliable detection of remote homologies, improving the accuracy of the resulting alignments and trees, and reducing sensitivity of the results to the choice of dataset. In order to carry out inference under such a model, a number of new Markov chain Monte Carlo approaches are developed, allowing for more efficient convergence and mixing on the high-dimensional parameter space. The second part of the thesis presents a directed acyclic graph (DAG)-based approach for representing a collection of sampled alignments. This DAG representation allows the initial collection of samples to be used to generate a larger set of alignments under the same approximate distribution, enabling posterior alignment probabilities to be estimated reliably from a reasonable number of samples. If desired, summary alignments can then be generated as maximum-weight paths through the DAG, under various types of loss or scoring functions. The acyclic nature of the graph also permits various other types of algorithms to be easily adapted to operate on the entire set of alignments in the DAG. In the final part of this work, methodology is introduced for alignment-DAG-based sequence annotation using hidden Markov models, and RNA secondary structure prediction using stochastic context-free grammars. 
Results on test datasets indicate that the additional information contained within the DAG allows for improved predictions, resulting in substantial gains over simply analysing a set of alignments one by one.
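The idea of reading a summary off a DAG as a maximum-weight path can be sketched with a standard dynamic program over a topological order. The snippet below is only an illustration under simple assumptions: edge weights are additive scores (for example, log-probabilities would be summed), and the node names and the function `max_weight_path` are hypothetical rather than taken from the thesis.

```python
from collections import defaultdict


def max_weight_path(edges, source, sink):
    """Return (total_weight, path) of the heaviest source-to-sink path.

    `edges` maps (u, v) -> weight and must describe an acyclic graph.
    Returns None if the sink cannot be reached from the source.
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for (u, v), w in edges.items():
        succ[u].append((v, w))
        indeg[v] += 1
        nodes.update((u, v))

    # Kahn's algorithm: repeatedly emit nodes with no unprocessed predecessor.
    order = []
    ready = [n for n in nodes if indeg[n] == 0]
    while ready:
        u = ready.pop()
        order.append(u)
        for v, _ in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)

    # Relax edges in topological order, remembering the best predecessor.
    best = {source: 0}
    prev = {}
    for u in order:
        if u not in best:
            continue  # not reachable from the source
        for v, w in succ[u]:
            if v not in best or best[u] + w > best[v]:
                best[v] = best[u] + w
                prev[v] = u

    if sink not in best:
        return None
    path = [sink]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return best[sink], path[::-1]


# Toy graph with integer scores (illustrative values only).
toy = {
    ("start", "a"): 9, ("start", "b"): 1,
    ("a", "b"): 2, ("a", "end"): 5, ("b", "end"): 10,
}
print(max_weight_path(toy, "start", "end"))
```

The acyclic structure is what makes this work in a single pass: each node is finalized before any of its successors are visited, so the cost is linear in the number of edges.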
153

Modelling syntactic gradience with loose constraint-based parsing / Modélisation de la gradience syntaxique par analyse relâchée à base de contraintes

Prost, Jean-Philippe January 2008
Thesis submitted for the joint institutional requirements for the double-badged degree of Doctor of Philosophy and Docteur de l'Université de Provence, Spécialité : Informatique. / Thesis (PhD)--Macquarie University, Division of Information and Communication Sciences, Department of Computing, 2008. / Includes bibliography (p. 229-240) and index. / Introduction -- Background -- A model-theoretic framework for PG -- Loose constraint-based parsing -- A computational model for gradience -- Conclusion.
The grammaticality of a sentence has conventionally been treated in a binary way: either a sentence is grammatical or it is not. A growing body of work, however, focuses on studying intermediate levels of acceptability, sometimes referred to as gradience. To date, the bulk of this work has concerned itself with the exploration of human assessments of syntactic gradience. This dissertation explores the possibility of building a robust computational model that accords with these human judgements.
We suggest that the concepts of Intersective Gradience and Subsective Gradience introduced by Aarts for modelling graded judgements be extended to cover deviant language. Under such a new model, the problem raised by gradience is to classify an utterance as a member of a specific category according to its syntactic characteristics. More specifically, we extend Intersective Gradience (IG) so that it is concerned with choosing the most suitable syntactic structure for an utterance among a set of candidates, while Subsective Gradience (SG) is extended to be concerned with calculating to what extent the chosen syntactic structure is typical of the category at stake. IG is addressed by relying on a criterion of optimality, while SG is addressed by rating an utterance according to its grammatical acceptability. As for the required syntactic characteristics, which serve as features for classifying an utterance, our investigation of different frameworks for representing the syntax of natural language shows that they can easily be represented in Model-Theoretic Syntax; we choose to use Property Grammars (PG), which make it possible to model the characterisation of an utterance. We present here a fully automated solution for modelling syntactic gradience, which characterises any well-formed or ill-formed input sentence, generates an optimal parse for it, then rates the utterance according to its grammatical acceptability.
Through the development of such a new model of gradience, the main contribution of this work is three-fold.
First, we specify a model-theoretic logical framework for PG, which bridges the gap observed in the existing formalisation regarding the constraint satisfaction and constraint relaxation mechanisms, and how they relate to the projection of a category during the parsing process. This new framework introduces the notion of loose satisfaction, along with a formulation in first-order logic, which enables reasoning about the characterisation of an utterance.
Second, we present our implementation of Loose Satisfaction Chart Parsing (LSCP), a dynamic programming approach based on the above mechanisms, which is proven to always find the full parse of optimal merit. Although it has a high theoretical worst-case time complexity, it performs sufficiently well with the help of heuristics to let us experiment with our model of gradience.
Third, after postulating that human acceptability judgements can be predicted by factors derivable from LSCP, we present a numeric model for rating an utterance according to its syntactic gradience. We measure a good correlation with human judgements of grammatical acceptability. Moreover, the model turns out to outperform an existing one discussed in the literature, which was tested on manually generated parses.
Mode of access: World Wide Web. / xxviii, 283 p. ill
154

Jazyk barokních kazatelů Bílovského a de Waldta / The language of two Baroque preachers, Bílovský and de Waldt

BUTULOVÁ, Klára January 2010
This diploma thesis examines homilies devoted to Saint Anna by two Baroque preachers who come from different language backgrounds. The thesis is divided into two major sections. The first section gives general information about the important Baroque preachers, with a focus on Bílovský and de Waldt. It also provides information about the grammars of the period. This first section draws primarily on the established scholarly literature. The second section presents a practical analysis of three specific homilies on the phonological and morphological levels. The analysis also covers the thematic, stylistic and lexical aspects of the content, as well as a comparison of the preachers' means of expression. The objective of the thesis is to compare the language of the two preachers in terms of phonology and morphology, and to assess the extent to which spoken language enters their writing.
155

Tradução automática estatística baseada em sintaxe e linguagens de árvores / Syntax-based statistical machine translation and tree languages

Beck, Daniel Emilio 19 June 2012
Universidade Federal de Minas Gerais / Machine Translation (MT) is one of the classic Natural Language Processing (NLP) applications. The state of the art in MT is represented by statistical methods that aim to learn all necessary linguistic knowledge automatically from large collections of texts (corpora). However, although the quality of statistical MT systems has improved considerably, recent advances are no longer significant. For this reason, research in the area has sought to involve more explicit linguistic knowledge in these systems. One issue that purely statistical MT systems have is the lack of correct treatment of syntactic phenomena. Thus, one of the research directions for incorporating linguistic knowledge into those systems is the addition of syntactic rules. To this end, a range of methods and formalisms have been, and continue to be, studied. This text presents an investigation of methods that aim to advance the state of the art in statistical MT through models that consider syntactic information. The methods and formalisms studied are those used to deal with tree languages, mainly Tree Substitution Grammars (TSGs) and Tree-to-String (TTS) Transducers. This work yielded a better understanding of the studied formalisms and of their behavior when used in NLP applications.
156

Rekonfigurovatelná analýza strojového kódu / Retargetable Analysis of Machine Code

Křoustek, Jakub Unknown Date
Software analysis is a methodology whose purpose is to analyse the behaviour of a given program. The individual methods of this analysis can also be used in other fields, such as reverse engineering and code migration. This thesis focuses on the analysis of machine code, on identifying the shortcomings of existing methods, and on designing new methods that enable fast and accurate retargetable code analysis (i.e. methods that are independent of any particular target platform). Two types of analysis are examined: dynamic (i.e. analysis while the application is running) and static (i.e. analysis of the application without executing it). The contribution of this thesis to dynamic analysis takes the form of a retargetable debugger and two types of so-called retargetable translated simulator. The contribution to static analysis lies in the design and implementation of a retargetable decompiler, which transforms machine code back into a high-level representation. All of these tools are based on new methods designed by the author of this thesis. Based on experimental results and user feedback, it can be concluded that these tools are fully comparable with existing (commercial) tools and not infrequently achieve even better results.
157

Génération automatique de phrases pour l'apprentissage des langues / Natural language generation for language learning

Perez, Laura Haide 19 April 2013
In this work, we explore how Natural Language Generation (NLG) techniques can be used to address the task of (semi-)automatically generating language learning material and activities in Computer-Assisted Language Learning (CALL). In particular, we show how a grammar-based Surface Realiser (SR) can be usefully exploited for the automatic creation of grammar exercises. Our surface realiser uses a wide-coverage reversible grammar, namely SemTAG, which is a Feature-Based Tree Adjoining Grammar (FB-TAG) equipped with a unification-based compositional semantics. More precisely, the FB-TAG grammar integrates a flat and underspecified representation of First Order Logic (FOL) formulae. In the first part of the thesis, we study the task of surface realisation from flat semantic formulae and we propose an optimised FB-TAG-based realisation algorithm that supports the generation of longer sentences given a large-scale grammar and lexicon. The approach followed to optimise TAG-based surface realisation from flat semantics draws on the fact that an FB-TAG can be translated into a Feature-Based Regular Tree Grammar (FB-RTG) describing its derivation trees. The derivation tree language of TAG constitutes a simpler language than the derived tree language, and thus generation approaches based on derivation trees have already been proposed. Our approach departs from previous ones in that our FB-RTG encoding accounts for the feature structures present in the original FB-TAG, which has important consequences for over-generation and for preserving the syntax-semantics interface. The concrete derivation tree generation algorithm that we propose is an Earley-style algorithm integrating a set of well-known optimisation techniques: tabulation, sharing-packing, and semantic-based indexing.
In the second part of the thesis, we explore how our SemTAG-based surface realiser can be put to work for the (semi-)automatic generation of grammar exercises. Usually, teachers manually edit exercises and their solutions, and classify them according to their degree of difficulty or the expected learner level. A strand of research in Natural Language Processing (NLP) for CALL addresses the (semi-)automatic generation of exercises. Mostly, this work draws on texts extracted from the Web and uses machine learning and text analysis techniques (e.g. parsing, POS tagging, etc.). These approaches expose the learner to sentences that have a potentially complex syntax and diverse vocabulary. In contrast, the approach we propose in this thesis addresses the (semi-)automatic generation of grammar exercises of the type found in grammar textbooks. In other words, it deals with the generation of exercises whose syntax and vocabulary are tailored to specific pedagogical goals and topics. Because the grammar-based generation approach associates natural language sentences with a rich linguistic description, it permits defining a specification language of syntactic and morpho-syntactic constraints for selecting stem sentences that comply with a given pedagogical goal. Further, it allows for the post-processing of the generated stem sentences to build grammar exercise items. We show how Fill-in-the-blank, Shuffle and Reformulation grammar exercises can be automatically produced. The approach has been integrated into the Interactive French Learning Game (I-FLEG), a serious game for learning French, and has been evaluated both on the basis of interactions with online players and in collaboration with a language teacher.
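The post-processing step that turns an annotated stem sentence into a fill-in-the-blank item can be sketched in a few lines of Python. This is an illustrative guess at the general shape of such a step, not the I-FLEG implementation: the token format (form, lemma, part-of-speech tag), the tag names and the function `fill_in_the_blank` are all assumptions made for the example.

```python
def fill_in_the_blank(tokens, target_pos):
    """Build a fill-in-the-blank item from an annotated stem sentence.

    `tokens` is a list of (form, lemma, pos) triples. The first token whose
    part-of-speech tag equals `target_pos` is blanked out, with its lemma
    given as a hint. Returns None if no token carries that tag.
    """
    for i, (form, lemma, pos) in enumerate(tokens):
        if pos == target_pos:
            words = [f for f, _, _ in tokens]
            words[i] = f"____ ({lemma})"
            return {"question": " ".join(words), "answer": form}
    return None


# Hypothetical annotated stem sentence ("She eats an apple").
stem = [
    ("Elle", "elle", "PRON"),
    ("mange", "manger", "VERB"),
    ("une", "un", "DET"),
    ("pomme", "pomme", "NOUN"),
]
print(fill_in_the_blank(stem, "VERB"))
```

Selecting the blank by linguistic annotation rather than by position is what lets an exercise target a specific pedagogical goal, such as verb conjugation, across many stem sentences.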
158

Le participe dans les grammaires des langues romanes (XVe-XVIIIe siècles). Histoire comparée d'une classe grammaticale / The participle in the Romance Languages' grammars (15th-18th centuries). A compared history of a grammar category / El participio en las gramáticas de las lenguas románicas (siglos XV-XVIII). Historia comparada de una clase gramatical

Diaz Villalba, Alejandro 13 September 2017
The study investigates the history of the participle as a word class through a close study of a corpus of French, Spanish, Portuguese and Italian grammars published between the 15th and 18th centuries. The comparative approach is based on the methodological principle of "series of texts": about a hundred works are grouped and collated according to several variable parameters, namely chronology, theme, and the grammatical tradition of the language described. The first part of the study deals with categorization in linguistics and questions the nature of non-finite verbal forms, especially the participle and its use in analytic verbal forms. The second part deals with the history of the participle from a more general point of view. Thus, after an overview of the problematic aspects that concerned Greek and Latin grammarians, the analysis focuses on the treatment of the word class in the grammars of the Romance languages. The third part focuses on the approaches and concepts used by Renaissance grammarians to deal with compound tenses, and on how they described and (re)categorized the participle forms of these verbal tenses.
