11

Simulationssprachen - Effiziente Entwicklung und Ausführung / Simulation Languages - Efficient Development and Execution

Blunk, Andreas 21 January 2019
Simulation languages are not flexibly extensible with new domain-specific concepts that have a concise, problem-appropriate representation. This applies both to the concepts of a language and to the language tools that support it. This dissertation develops Discrete-Event Modelling with Extensibility (DMX), a new language development approach for building flexibly extensible simulation languages for domain-specific applications, enabling both efficient development of the language and runtime-efficient execution of models. The focus is on discrete-event simulation and process-oriented descriptions of simulation models. The approach distinguishes base concepts, which belong to the base language, from extension concepts, which are part of extension definitions. The dissertation investigates which base concepts a simulation language must provide so that process-oriented models can be executed efficiently. The high runtime efficiency of execution is demonstrated by a novel method for mapping process context switches to a C++ program, and this efficiency carries over to extension concepts as well. The extension approach is not limited to simulation languages as base languages and is therefore described in general terms. It is based on a syntax extension of a base language defined by a metamodel and a context-free grammar; extension concepts are executed by reducing them to base concepts. The approach places certain requirements on the base language and supports certain kinds of extensions, which are examined in the dissertation. The suitability of the approach for developing a complex domain-specific simulation language is demonstrated with a language for state machines.
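To illustrate the execution model, the sketch below mimics process-oriented discrete-event simulation in Python, where a process context switch is simply a generator suspension. This is a hedged illustration of the concept only: the thesis achieves its runtime efficiency precisely by mapping such context switches to a C++ program instead, and the names used here (Simulator, machine, schedule) are invented for the example.

```python
import heapq

class Simulator:
    """Toy process-oriented discrete-event kernel (illustrative only)."""
    def __init__(self):
        self.now = 0.0
        self._queue = []          # heap of (wake_time, seq, process)
        self._seq = 0             # tie-breaker for equal wake times

    def schedule(self, process, delay=0.0):
        heapq.heappush(self._queue, (self.now + delay, self._seq, process))
        self._seq += 1

    def run(self):
        while self._queue:
            self.now, _, proc = heapq.heappop(self._queue)
            try:
                delay = next(proc)         # resume process until next hold
                self.schedule(proc, delay) # context switch back to kernel
            except StopIteration:
                pass                       # process terminated

def machine(sim, name, cycle, jobs=3):
    """A simulation process: each 'yield' is a hold() context switch."""
    for i in range(jobs):
        print(f"{sim.now:6.1f}  {name} starts job {i}")
        yield cycle

sim = Simulator()
sim.schedule(machine(sim, "M1", 4.0))
sim.schedule(machine(sim, "M2", 7.0))
sim.run()
```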
12

Generating rhyming poetry using LSTM recurrent neural networks

Peterson, Cole 30 April 2019
Current approaches to generating rhyming English poetry with a neural network involve constraining the output to enforce the condition of rhyme. We investigate whether this approach is necessary, or whether recurrent neural networks can learn rhyme patterns on their own. We compile a new dataset of amateur poetry whose size and high frequency of rhymes allow rhyme to be learned without external constraints. We then evaluate models trained on the new dataset using a novel framework that automatically measures a system's knowledge of poetic form and its generalizability. We find that our trained model is able to generalize the pattern of rhyme and generate rhymes unseen in the training data, and that the learned word embeddings for rhyming sets of words are linearly separable. Our model generates couplets that rhyme 68.15% of the time; this is the first time a recurrent neural network has been shown to generate rhyming poetry a high percentage of the time. Additionally, we show that crowd-sourced workers can distinguish our generated couplets from couplets in our dataset only 63.3% of the time, indicating that our model generates poetry with coherence, semantic meaning, and fluency comparable to couplets written by humans.
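An evaluation framework of this kind needs an automatic rhyme check. A minimal sketch, assuming the CMU Pronouncing Dictionary shipped with NLTK (not necessarily what the thesis uses): two words rhyme when their phoneme sequences match from the last stressed vowel onward.

```python
# Requires: import nltk; nltk.download('cmudict')
from nltk.corpus import cmudict

pron = cmudict.dict()

def rhyme_part(word):
    """Phonemes from the last stressed vowel to the end, using the
    first listed pronunciation; None if the word is unknown."""
    phones = pron.get(word.lower())
    if not phones:
        return None
    phones = phones[0]
    # vowel phonemes carry stress digits '1'/'2' as their last character
    stressed = [i for i, p in enumerate(phones) if p[-1] in "12"]
    start = stressed[-1] if stressed else 0
    return tuple(phones[start:])

def rhymes(w1, w2):
    a, b = rhyme_part(w1), rhyme_part(w2)
    return a is not None and a == b

print(rhymes("light", "night"))   # True
print(rhymes("light", "apple"))   # False
```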
13

Incremental generative models for syntactic and semantic natural language processing

Buys, Jan Moolman January 2017
This thesis investigates the role of linguistically-motivated generative models of syntax and semantic structure in natural language processing (NLP). Syntactic well-formedness is crucial in language generation, but most statistical models do not account for the hierarchical structure of sentences. Many applications exhibiting natural language understanding rely on structured semantic representations to enable querying, inference and reasoning. Yet most semantic parsers produce domain-specific or inadequately expressive representations. We propose a series of generative transition-based models for dependency syntax which can be applied as both parsers and language models while being amenable to supervised or unsupervised learning. Two models are based on Markov assumptions commonly made in NLP: The first is a Bayesian model with hierarchical smoothing, the second is parameterised by feed-forward neural networks. The Bayesian model enables careful analysis of the structure of the conditioning contexts required for generative parsers, but the neural network is more accurate. As a language model the syntactic neural model outperforms both the Bayesian model and n-gram neural networks, pointing to the complementary nature of distributed and structured representations for syntactic prediction. We propose approximate inference methods based on particle filtering. The third model is parameterised by recurrent neural networks (RNNs), dropping the Markov assumptions. Exact inference with dynamic programming is made tractable here by simplifying the structure of the conditioning contexts. We then shift the focus to semantics and propose models for parsing sentences to labelled semantic graphs. We introduce a transition-based parser which incrementally predicts graph nodes (predicates) and edges (arguments). This approach is contrasted against predicting top-down graph traversals. RNNs and pointer networks are key components in approaching graph parsing as an incremental prediction problem. The RNN architecture is augmented to condition the model explicitly on the transition system configuration. We develop a robust parser for Minimal Recursion Semantics, a linguistically-expressive framework for compositional semantics which has previously been parsed only with grammar-based approaches. Our parser is much faster than the grammar-based model, while the same approach improves the accuracy of neural Abstract Meaning Representation parsing.
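A hedged sketch of the mechanics behind such generative transition-based models: in an arc-standard-style system where SHIFT generates the next word, a complete action sequence yields both a sentence and its dependency tree, which is what lets one model act as parser and language model at once. The thesis's actual transition systems and their Bayesian/neural parameterisations are richer than this skeleton.

```python
def derive(actions):
    """Replay a generative arc-standard-style derivation.
    actions: list like [('SH', 'the'), ('SH', 'dog'), ('LA',), ...]
    Returns (words, arcs), arcs as (head_index, dependent_index).
    A generative model would score p(action | context) and
    p(word | context) at every step instead of replaying a fixed list."""
    stack, words, arcs = [], [], []
    for act in actions:
        if act[0] == 'SH':                  # generate a word, push it
            words.append(act[1])
            stack.append(len(words) - 1)
        elif act[0] == 'LA':                # top becomes head of second-top
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif act[0] == 'RA':                # second-top becomes head of top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return words, arcs

# "the dog barks": 'dog' heads 'the', then 'barks' heads 'dog'
words, arcs = derive([('SH', 'the'), ('SH', 'dog'), ('LA',),
                      ('SH', 'barks'), ('LA',)])
print(words, arcs)   # ['the', 'dog', 'barks'] [(1, 0), (2, 1)]
```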
14

Recurrent neural network language models for automatic speech recognition

Gangireddy, Siva Reddy January 2017
The goal of this thesis is to advance the use of recurrent neural network language models (RNNLMs) for large-vocabulary continuous speech recognition (LVCSR). RNNLMs are currently state of the art and have been shown to consistently reduce the word error rates (WERs) of LVCSR tasks compared to other language models. In this thesis we propose several advances to RNNLMs: improved learning procedures, context enhancement, and adaptation. We learn better parameters through a novel pre-training approach and enhance the context using prosodic and syntactic features. We present a pre-training method for RNNLMs in which the output weights of a feed-forward neural network language model (NNLM) are shared with the RNNLM: the weights of the NNLM are first fine-tuned and then used to initialise the output weights of an RNNLM with the same number of hidden units. To investigate the effectiveness of the proposed pre-training method, we carried out text-based experiments on the Penn Treebank Wall Street Journal data and ASR experiments on the TED lectures data. Across the experiments, we observe small but significant improvements in perplexity (PPL) and ASR WER. Next, we present unsupervised adaptation of RNNLMs. We adapt the RNNLMs to a target domain (topic, genre, or television programme) at test time using ASR transcripts from first-pass recognition. We investigate two approaches: in the first, the forward-propagating hidden activations are scaled (learning hidden unit contributions, LHUC); in the second, we adapt all parameters of the RNNLM. We evaluate the adapted RNNLMs by reporting WERs on multi-genre broadcast speech data, and observe small (on average 0.1% absolute) but significant improvements in WER compared to a strong unadapted RNNLM. Finally, we present context enhancement of RNNLMs using prosodic and syntactic features. The prosodic features are computed from the acoustics of the context words, and the syntactic features from the surface form of the words in the context. We train RNNLMs with word duration, pause duration, final-phone duration, syllable duration, syllable F0, part-of-speech tag, and Combinatory Categorial Grammar (CCG) supertag features. The proposed context-enhanced RNNLMs are evaluated by reporting PPL and WER on two speech recognition tasks, Switchboard and TED lectures. We observe substantial improvements in PPL (5% to 15% relative) and small but significant improvements in WER (0.1% to 0.5% absolute).
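A minimal sketch of the pre-training idea in PyTorch, with illustrative sizes and module names rather than the thesis's actual code: because the NNLM and the RNNLM share a hidden size and vocabulary, the fine-tuned NNLM output layer can directly initialise the RNNLM's output layer.

```python
import torch
import torch.nn as nn

VOCAB, EMB, HID = 10_000, 128, 256   # illustrative sizes

class FFNNLM(nn.Module):
    """Feed-forward NNLM: predicts the next word from n-1 context words."""
    def __init__(self, context=3):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.hidden = nn.Linear(context * EMB, HID)
        self.out = nn.Linear(HID, VOCAB)
    def forward(self, ctx):                    # ctx: (batch, context)
        h = torch.tanh(self.hidden(self.emb(ctx).flatten(1)))
        return self.out(h)

class RNNLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.rnn = nn.RNN(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)       # same shape as FFNNLM.out
    def forward(self, seq):                    # seq: (batch, time)
        h, _ = self.rnn(self.emb(seq))
        return self.out(h)

nnlm, rnnlm = FFNNLM(), RNNLM()
# ... fine-tune nnlm here, then share its output weights:
rnnlm.out.load_state_dict(nnlm.out.state_dict())
```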
15

Des modèles de langage pour la reconnaissance de l'écriture manuscrite / Language Modelling for Handwriting Recognition

Swaileh, Wassim 04 October 2017
This thesis is about the design of a complete processing chain for unconstrained handwriting recognition. Three main difficulties are addressed: pre-processing, optical modelling, and language modelling. The pre-processing stage consists of properly extracting the text lines to be recognized from the document image; an iterative text-line segmentation method using oriented steerable filters was developed for this purpose. The difficulty in the optical modelling stage lies in the stylistic diversity of handwriting scripts. Statistical optical models, such as hidden Markov models (HMM-GMM) and, more recently, recurrent neural networks (BLSTM-CTC), are traditionally used to tackle this problem. Using BLSTM we achieve state-of-the-art performance on the RIMES (French) and IAM (English) datasets. The language modelling stage integrates a lexicon and a statistical language model into the recognition chain in order to constrain the recognition hypotheses to the most probable word sequence (sentence) from a linguistic point of view. The difficulty at this stage lies in finding a vocabulary with optimal lexical coverage and a minimal out-of-vocabulary (OOV) word rate. To this end we introduce enhanced language models built on sub-lexical units made of either syllables or multigrams. These sub-lexical units cover an important portion of the OOV words. Language coverage also depends on the domain of the language-model training corpus, hence the need to train the language model on in-domain data. With high OOV rates, the recognition system based on sub-lexical units outperforms traditional systems that use word or character language models; with low OOV rates, it obtains equivalent performance with a more compact language model. Thanks to the compact lexicon size of the sub-lexical units, a unified multilingual recognition system was designed and evaluated on the RIMES and IAM datasets. The unified multilingual system improves recognition performance over the language-specific systems, especially when a unified optical model is used.
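A hedged sketch of why sub-lexical units reduce the OOV problem: an out-of-vocabulary word can still be emitted as a sequence of in-inventory units. Greedy longest-match segmentation and a toy unit inventory are used here for brevity; the thesis derives its syllable and multigram inventories statistically.

```python
UNITS = {"re", "con", "nais", "sance", "man", "u", "s", "crite"}
# toy inventory for illustration, not the thesis's learned units

def segment(word, units=UNITS, max_len=6):
    """Split `word` into known units, preferring the longest match.
    Returns None if some stretch of the word cannot be covered."""
    out, i = [], 0
    while i < len(word):
        for j in range(min(len(word), i + max_len), i, -1):
            if word[i:j] in units:
                out.append(word[i:j])
                i = j
                break
        else:
            return None              # uncoverable stretch: true OOV
    return out

print(segment("reconnaissance"))     # ['re', 'con', 'nais', 'sance']
print(segment("manuscrite"))         # ['man', 'u', 's', 'crite']
```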
16

Translation Alignment Applied to Historical Languages: methods, evaluation, applications, and visualization

Yousef, Tariq 17 July 2023
Translation alignment is an essential task in digital humanities and natural language processing: it links words and phrases in a source text with their equivalents in a translation. In addition to its importance for teaching and learning historical languages, translation alignment builds bridges between ancient and modern languages across which various linguistic annotations can be transferred. This thesis focuses on word-level translation alignment applied to historical languages in general, and to Ancient Greek and Latin in particular. As the title indicates, it addresses four interdisciplinary aspects of translation alignment. The starting point was the development of Ugarit, an interactive annotation tool for manual alignment, built to gather training data for an automatic alignment model. This effort produced more than 190k accurate translation pairs that I later used for supervised training. Ugarit has been used by many researchers and scholars, including in classrooms at several institutions for teaching and learning ancient languages, resulting in a large, diverse, crowd-sourced parallel corpus of aligned texts. This corpus allowed us to conduct experiments and qualitative analyses to detect recurring patterns in annotators' alignment practice and in the generated translation pairs. Further, I employed recent advances in NLP and language modelling to develop an automatic alignment model for historical low-resource languages, experimenting with various training objectives and proposing a training strategy that combines supervised and unsupervised training on monolingual and multilingual texts. I then integrated this alignment model into other workflows to project cross-lingual annotations and to induce bilingual dictionaries from parallel corpora. Evaluation is essential for assessing the quality of any model: to ensure best practice, I reviewed the current evaluation procedure, identified its limitations, and proposed two new evaluation metrics. Moreover, I introduced a visual analytics framework for exploring and inspecting alignment gold-standard datasets and for supporting quantitative and qualitative evaluation of translation alignment models. I also designed and implemented visual analytics tools and reading environments for parallel texts, proposing various visualization approaches to support different alignment-related tasks. Overall, this thesis presents a comprehensive study comprising manual and automatic alignment techniques, evaluation methods, and visual analytics tools that together advance the field of translation alignment for historical languages.
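For context on the evaluation aspect, the sketch below computes the classic alignment error rate (AER) of Och and Ney (2003), the kind of baseline measure whose limitations the thesis reviews before proposing its two new metrics. Sure links S, possible links P (with S a subset of P), and predicted links A are sets of (source index, target index) pairs.

```python
def alignment_error_rate(A, S, P):
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|); lower is better."""
    A, S = set(A), set(S)
    P = set(P) | S                   # sure links are also possible
    return 1.0 - (len(A & S) + len(A & P)) / (len(A) + len(S))

gold_sure = {(0, 0), (1, 2)}
gold_possible = {(0, 0), (1, 2), (2, 1)}
predicted = {(0, 0), (1, 2), (2, 3)}

print(round(alignment_error_rate(predicted, gold_sure, gold_possible), 3))
# 1 - (2 + 2) / (3 + 2) = 0.2
```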
17

Word2vec modely s přidanou kontextovou informací / Word2vec Models with Added Context Information

Šůstek, Martin January 2017
This thesis is concerned with explaining the word2vec models. Even though word2vec was introduced recently (2013), many researchers have already tried to extend, understand, or at least use the model, because it provides surprisingly rich semantic information. This information is encoded in an N-dimensional vector representation and can be recalled by performing simple algebraic operations on the vectors. In addition, I propose model modifications to obtain different word representations; to achieve this, I use public image datasets. The thesis also includes parts dedicated to a word2vec extension based on convolutional neural networks.
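A hedged sketch of the algebraic operations the abstract alludes to, using gensim's downloadable pretrained vectors as a stand-in (the thesis trains its own modified models): analogy arithmetic of the form vec('king') - vec('man') + vec('woman') ≈ vec('queen').

```python
import gensim.downloader

# ~1.6 GB download on first use; any KeyedVectors model would do
vectors = gensim.downloader.load("word2vec-google-news-300")

print(vectors.most_similar(positive=["king", "woman"],
                           negative=["man"], topn=3))
# 'queen' is expected to rank at or near the top
```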
18

Automatic taxonomy evaluation

Gao, Tianjian 12 1900
This thesis would not have been possible without the generous support of IATA. / Taxonomies are an essential knowledge representation and play an important role in classification and numerous knowledge-rich applications, yet quantitative taxonomy evaluation remains overlooked and leaves much to be desired. While studies focus on automatic taxonomy construction (ATC) for extracting meaningful structures and semantics from large corpora, their evaluation is usually manual, and therefore subject to bias and low reproducibility. Companies wishing to improve their domain-focused taxonomies also suffer from a lack of ground truth, and manual taxonomy evaluation requires substantial labour and expert knowledge. We therefore argue in this thesis that automatic taxonomy evaluation (ATE) is just as important as taxonomy construction. We propose two novel taxonomy evaluation methods that produce less biased scores: a supervised classification model for taxonomies extracted from labelled corpora, and an unsupervised language model that serves as a knowledge source for evaluating hypernymy relations on unlabelled data. We show that our evaluation proxies correlate well with human judgments and that language models can imitate human experts on knowledge-rich tasks.
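A minimal sketch of using a language model as a knowledge source for hypernymy, in the spirit of the second proposed method; the prompt template, model choice, and scoring rule here are illustrative assumptions, not the thesis's actual procedure.

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

def hypernym_score(term, hypernym, top_k=50):
    """Probability the masked LM assigns to `hypernym` completing
    'A <term> is a kind of [MASK].'; 0.0 if outside the top_k fills."""
    preds = fill(f"A {term} is a kind of [MASK].", top_k=top_k)
    for p in preds:
        if p["token_str"].strip() == hypernym:
            return p["score"]
    return 0.0

print(hypernym_score("dog", "animal"))   # expected clearly > 0
print(hypernym_score("dog", "vehicle"))  # expected near 0
```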
19

Different Contributions to Cost-Effective Transcription and Translation of Video Lectures

Silvestre Cerdà, Joan Albert 05 April 2016
In recent years, on-line multimedia repositories have experienced strong growth that has consolidated them as essential knowledge assets, especially in the area of education, where large repositories of video lectures have been built in order to complement or even replace traditional teaching methods. However, most of these video lectures are neither transcribed nor translated, due to a lack of cost-effective solutions that give sufficiently accurate results. Solutions of this kind are clearly necessary to make these lectures accessible to speakers of different languages and to people with hearing disabilities. They would also facilitate lecture search and analysis functions, such as classification, recommendation, or plagiarism detection, as well as the development of advanced educational functionalities like content summarisation to assist student note-taking. For this reason, the main aim of this thesis is to develop a cost-effective solution capable of transcribing and translating video lectures to a reasonable degree of accuracy. More specifically, we address the integration of state-of-the-art techniques in automatic speech recognition and machine translation into large video lecture repositories to generate high-quality multilingual video subtitles without human intervention and at a reduced computational cost. We also explore the potential benefits of exploiting the information known a priori about these repositories, that is, lecture-specific knowledge such as the speaker, topic, or slides, to create specialised in-domain transcription and translation systems by means of massive adaptation techniques. The proposed solutions have been tested in real-life scenarios through several objective and subjective evaluations, with very positive results. The main outcome of this thesis, The transLectures-UPV Platform, has been publicly released as open-source software and, at the time of writing, is serving automatic transcriptions and translations for several thousand video lectures at many Spanish and European universities and institutions. / Silvestre Cerdà, JA. (2016). Different Contributions to Cost-Effective Transcription and Translation of Video Lectures [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/62194
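A hedged sketch of the underlying ASR-to-MT cascade, with off-the-shelf models as stand-ins; the platform itself uses massively adapted in-domain models, and no transLectures-specific API is reproduced here.

```python
import whisper                      # pip install openai-whisper
from transformers import pipeline

asr = whisper.load_model("base")
mt = pipeline("translation", model="Helsinki-NLP/opus-mt-en-es")

def subtitle(video_path):
    """Yield (start, end, source_text, translated_text) per ASR segment,
    the raw material for multilingual subtitle tracks."""
    result = asr.transcribe(video_path)
    for seg in result["segments"]:
        translation = mt(seg["text"])[0]["translation_text"]
        yield seg["start"], seg["end"], seg["text"], translation

# for start, end, en, es in subtitle("lecture.mp4"):
#     print(f"{start:7.2f}-{end:7.2f}  {en}  ->  {es}")
```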
20

Learning and time : on using memory and curricula for language understanding

Gulcehre, Caglar 05 1900
No description available.
