Global ETD Search

1	Probabilistic models of natural language semantics Schuster, Ingmar 14 June 2016 (has links) (PDF) This thesis tackles the problem of modeling the semantics of natural language. Neural Network models are reviewed and a new Bayesian approach is developed and evaluated. As the performance of standard Monte Carlo algorithms proofed to be unsatisfactory for the developed models, the main focus lies on a new adaptive algorithm from the Sequential Monte Carlo (SMC) family. The Gradient Importance Sampling (GRIS) algorithm developed in the thesis is shown to give very good performance as compared to many adaptive Markov Chain Monte Carlo (MCMC) algorithms on a range of complex target distributions. Another advantage as compared to MCMC is that GRIS provides a straight forward estimate of model evidence. Finally, Sample Inflation is introduced as a means to reduce variance and speed up mode finding in Importance Sampling and SMC algorithms. Sample Inflation provides provably consistent estimates and is empirically found to improve convergence of integral estimates. / Diese Dissertation befasst sich mit der Modellierung der Semantik natürlicher Sprache. Eine Übersicht von Neuronalen Netzwerkmodellen wird gegeben und ein eigener Bayesscher Ansatz wird entwickelt und evaluiert. Da die Leistungsfähigkeit von Standardalgorithmen aus der Monte-Carlo-Familie auf dem entwickelten Model unbefriedigend ist, liegt der Hauptfokus der Arbeit auf neuen adaptiven Algorithmen im Rahmen von Sequential Monte Carlo (SMC). Es wird gezeigt, dass der in der Dissertation entwickelte Gradient Importance Sampling (GRIS) Algorithmus sehr leistungsfähig ist im Vergleich zu vielen Algorithmen des adaptiven Markov Chain Monte Carlo (MCMC), wobei komplexe und hochdimensionale Integrationsprobleme herangezogen werden. Ein weiterer Vorteil im Vergleich mit MCMC ist, dass GRIS einen Schätzer der Modelevidenz liefert. Schließlich wird Sample Inflation eingeführt als Ansatz zur Reduktion von Varianz und schnellerem auffinden von Modi in einer Verteilung, wenn Importance Sampling oder SMC verwendet werden. Sample Inflation ist beweisbar konsistent und es wird empirisch gezeigt, dass seine Anwendung die Konvergenz von Integralschätzern verbessert. natürliche Sprache semantik Monte Carlo natural language semantics Monte Carlo ddc:000
2	Generating Formal Representations of System Specification from Natural Language Requirements Irfan, Zeeshan 05 October 2020 (has links) Natural Language (NL) requirements play a significant role in specifying the system design, implementation and testing processes. Nevertheless, NL requirements are generally syntactically ambiguous and semantically inconsistent. Issues with NL requirements can result into inaccurate and preposterous system design, implementation and testing. Moreover, informal nature of NL is a major hurdle in machine processing of system requirements specifications. To confront this problem, a requirement template is introduced, based on controlled NL to produce deterministic and consistent representation of the system. The ultimate focus of this thesis is to generate test cases from system specifications driven from requirements communicated in natural language. Manual software systems testing is a labour intensive, error prone and high cost activity. Traditionally, model-driven test generation approaches are employed for automated testing. However, system models are created manually for test generation. The test cases generated from system models are not generally deterministic and traceable with individual requirements. This thesis proposes an approach for software system testing based on template-driven requirements. This systematic approach is applied on the requirements elicited from system stakeholders. For this purpose natural language processing (NLP) methods are used. Using NLP approaches, useful information is extracted from controlled NL requirements and afterwards the gathered information is processed to generate test scenarios. Our inceptive observation exhibits that this method provides remarkable gains in terms of reducing the cost, time and complexity of requirements based testing. info:eu-repo/classification/ddc/004 ddc:004 Informatik; Natürliche Sprache
3	Probabilistic models of natural language semantics Schuster, Ingmar 01 June 2016 (has links) This thesis tackles the problem of modeling the semantics of natural language. Neural Network models are reviewed and a new Bayesian approach is developed and evaluated. As the performance of standard Monte Carlo algorithms proofed to be unsatisfactory for the developed models, the main focus lies on a new adaptive algorithm from the Sequential Monte Carlo (SMC) family. The Gradient Importance Sampling (GRIS) algorithm developed in the thesis is shown to give very good performance as compared to many adaptive Markov Chain Monte Carlo (MCMC) algorithms on a range of complex target distributions. Another advantage as compared to MCMC is that GRIS provides a straight forward estimate of model evidence. Finally, Sample Inflation is introduced as a means to reduce variance and speed up mode finding in Importance Sampling and SMC algorithms. Sample Inflation provides provably consistent estimates and is empirically found to improve convergence of integral estimates. / Diese Dissertation befasst sich mit der Modellierung der Semantik natürlicher Sprache. Eine Übersicht von Neuronalen Netzwerkmodellen wird gegeben und ein eigener Bayesscher Ansatz wird entwickelt und evaluiert. Da die Leistungsfähigkeit von Standardalgorithmen aus der Monte-Carlo-Familie auf dem entwickelten Model unbefriedigend ist, liegt der Hauptfokus der Arbeit auf neuen adaptiven Algorithmen im Rahmen von Sequential Monte Carlo (SMC). Es wird gezeigt, dass der in der Dissertation entwickelte Gradient Importance Sampling (GRIS) Algorithmus sehr leistungsfähig ist im Vergleich zu vielen Algorithmen des adaptiven Markov Chain Monte Carlo (MCMC), wobei komplexe und hochdimensionale Integrationsprobleme herangezogen werden. Ein weiterer Vorteil im Vergleich mit MCMC ist, dass GRIS einen Schätzer der Modelevidenz liefert. Schließlich wird Sample Inflation eingeführt als Ansatz zur Reduktion von Varianz und schnellerem auffinden von Modi in einer Verteilung, wenn Importance Sampling oder SMC verwendet werden. Sample Inflation ist beweisbar konsistent und es wird empirisch gezeigt, dass seine Anwendung die Konvergenz von Integralschätzern verbessert. info:eu-repo/classification/ddc/000 ddc:000 natural language, semantics,Monte Carlo
4	Universality and variability in the statistics of data with fat-tailed distributions: the case of word frequencies in natural languages Gerlach, Martin 10 March 2016 (has links) (PDF) Natural language is a remarkable example of a complex dynamical system which combines variation and universal structure emerging from the interaction of millions of individuals. Understanding statistical properties of texts is not only crucial in applications of information retrieval and natural language processing, e.g. search engines, but also allow deeper insights into the organization of knowledge in the form of written text. In this thesis, we investigate the statistical and dynamical processes underlying the co-existence of universality and variability in word statistics. We combine a careful statistical analysis of large empirical databases on language usage with analytical and numerical studies of stochastic models. We find that the fat-tailed distribution of word frequencies is best described by a generalized Zipf’s law characterized by two scaling regimes, in which the values of the parameters are extremely robust with respect to time as well as the type and the size of the database under consideration depending only on the particular language. We provide an interpretation of the two regimes in terms of a distinction of words into a finite core vocabulary and a (virtually) infinite noncore vocabulary. Proposing a simple generative process of language usage, we can establish the connection to the problem of the vocabulary growth, i.e. how the number of different words scale with the database size, from which we obtain a unified perspective on different universal scaling laws simultaneously appearing in the statistics of natural language. On the one hand, our stochastic model accurately predicts the expected number of different items as measured in empirical data spanning hundreds of years and 9 orders of magnitude in size showing that the supposed vocabulary growth over time is mainly driven by database size and not by a change in vocabulary richness. On the other hand, analysis of the variation around the expected size of the vocabulary shows anomalous fluctuation scaling, i.e. the vocabulary is a nonself-averaging quantity, and therefore, fluctuations are much larger than expected. We derive how this results from topical variations in a collection of texts coming from different authors, disciplines, or times manifest in the form of correlations of frequencies of different words due to their semantic relation. We explore the consequences of topical variation in applications to language change and topic models emphasizing the difficulties (and presenting possible solutions) due to the fact that the statistics of word frequencies are characterized by a fat-tailed distribution. First, we propose an information-theoretic measure based on the Shannon-Gibbs entropy and suitable generalizations quantifying the similarity between different texts which allows us to determine how fast the vocabulary of a language changes over time. Second, we combine topic models from machine learning with concepts from community detection in complex networks in order to infer large-scale (mesoscopic) structures in a collection of texts. Finally, we study language change of individual words on historical time scales, i.e. how a linguistic innovation spreads through a community of speakers, providing a framework to quantitatively combine microscopic models of language change with empirical data that is only available on a macroscopic level (i.e. averaged over the population of speakers). Komplexe Systeme Physik Natürliche Sprache Quantitative Linguistik Datenanalyse Complex Systems Physics Natural Language Quantitative Linguistics Data Analysis ddc:530 rvk:SK 820
5	Generierung von natürlichsprachlichen Texten aus semantischen Strukturen im Prozeß der maschinellen Übersetzung - Allgemeine Strukturen und Abbildungen Rosenpflanzer, Lutz, Karl, Hans-Ulrich 14 December 2012 (has links) (PDF) 0 VORWORT Bei der maschinellen Übersetzung natürlicher Sprache dominieren mehrere Probleme. Man hat es immer mit sehr großen Datenmengen zu tun. Auch wenn man nur einen kleinen Text übersetzen will, ist diese Aufgabe in umfänglichen Kontext eingebettet, d.h. alles Wissen über Quell- und Zielsprache muß - in möglichst formalisierter Form - zur Verfügung stehen. Handelt es sich um gesprochenes Wort treten Spracherkennungs- und Sprachausgabeaufgaben sowie harte Echtzeitforderungen hinzu. Die Komplexität des Problems ist - auch unter Benutzung moderner Softwareentwicklungskonzepte - für jeden, der eine Implementation versucht, eine nicht zu unterschätzende Herausforderung. Ansätze, die die Arbeitsprinzipien und Methoden der Informatik konsequent nutzen, stellen ihre Ergebnisse meist nur prototyisch für einen sehr kleinen Teil der Sprache -etwa eine Phrase, einen Satz bzw. mehrere Beispielsätze- heraus und folgern mehr oder weniger induktiv, daß die entwickelte Lösung auch auf die ganze Sprache erfolgreich angewendet werden kann, wenn man nur genügend „Lemminge“ hat, die nach allen Seiten ausschwärmend, die „noch notwendigen Routinearbeiten“ schnell und bienenfleißig ausführen könnten. natürlichsprachliche Texte natürliche Sprache semantische Strukturen maschinelle Übersetzung Prädikatenlogik Computerlinguistik natural language machine translation natural language processing ddc:004 rvk:SS 5514
6	Universality and variability in the statistics of data with fat-tailed distributions: the case of word frequencies in natural languages Gerlach, Martin 01 March 2016 (has links) Natural language is a remarkable example of a complex dynamical system which combines variation and universal structure emerging from the interaction of millions of individuals. Understanding statistical properties of texts is not only crucial in applications of information retrieval and natural language processing, e.g. search engines, but also allow deeper insights into the organization of knowledge in the form of written text. In this thesis, we investigate the statistical and dynamical processes underlying the co-existence of universality and variability in word statistics. We combine a careful statistical analysis of large empirical databases on language usage with analytical and numerical studies of stochastic models. We find that the fat-tailed distribution of word frequencies is best described by a generalized Zipf’s law characterized by two scaling regimes, in which the values of the parameters are extremely robust with respect to time as well as the type and the size of the database under consideration depending only on the particular language. We provide an interpretation of the two regimes in terms of a distinction of words into a finite core vocabulary and a (virtually) infinite noncore vocabulary. Proposing a simple generative process of language usage, we can establish the connection to the problem of the vocabulary growth, i.e. how the number of different words scale with the database size, from which we obtain a unified perspective on different universal scaling laws simultaneously appearing in the statistics of natural language. On the one hand, our stochastic model accurately predicts the expected number of different items as measured in empirical data spanning hundreds of years and 9 orders of magnitude in size showing that the supposed vocabulary growth over time is mainly driven by database size and not by a change in vocabulary richness. On the other hand, analysis of the variation around the expected size of the vocabulary shows anomalous fluctuation scaling, i.e. the vocabulary is a nonself-averaging quantity, and therefore, fluctuations are much larger than expected. We derive how this results from topical variations in a collection of texts coming from different authors, disciplines, or times manifest in the form of correlations of frequencies of different words due to their semantic relation. We explore the consequences of topical variation in applications to language change and topic models emphasizing the difficulties (and presenting possible solutions) due to the fact that the statistics of word frequencies are characterized by a fat-tailed distribution. First, we propose an information-theoretic measure based on the Shannon-Gibbs entropy and suitable generalizations quantifying the similarity between different texts which allows us to determine how fast the vocabulary of a language changes over time. Second, we combine topic models from machine learning with concepts from community detection in complex networks in order to infer large-scale (mesoscopic) structures in a collection of texts. Finally, we study language change of individual words on historical time scales, i.e. how a linguistic innovation spreads through a community of speakers, providing a framework to quantitatively combine microscopic models of language change with empirical data that is only available on a macroscopic level (i.e. averaged over the population of speakers). info:eu-repo/classification/ddc/530 ddc:530
7	Generierung von natürlichsprachlichen Texten aus semantischen Strukturen im Prozeß der maschinellen Übersetzung - Allgemeine Strukturen und Abbildungen Rosenpflanzer, Lutz, Karl, Hans-Ulrich 14 December 2012 (has links) 0 VORWORT Bei der maschinellen Übersetzung natürlicher Sprache dominieren mehrere Probleme. Man hat es immer mit sehr großen Datenmengen zu tun. Auch wenn man nur einen kleinen Text übersetzen will, ist diese Aufgabe in umfänglichen Kontext eingebettet, d.h. alles Wissen über Quell- und Zielsprache muß - in möglichst formalisierter Form - zur Verfügung stehen. Handelt es sich um gesprochenes Wort treten Spracherkennungs- und Sprachausgabeaufgaben sowie harte Echtzeitforderungen hinzu. Die Komplexität des Problems ist - auch unter Benutzung moderner Softwareentwicklungskonzepte - für jeden, der eine Implementation versucht, eine nicht zu unterschätzende Herausforderung. Ansätze, die die Arbeitsprinzipien und Methoden der Informatik konsequent nutzen, stellen ihre Ergebnisse meist nur prototyisch für einen sehr kleinen Teil der Sprache -etwa eine Phrase, einen Satz bzw. mehrere Beispielsätze- heraus und folgern mehr oder weniger induktiv, daß die entwickelte Lösung auch auf die ganze Sprache erfolgreich angewendet werden kann, wenn man nur genügend „Lemminge“ hat, die nach allen Seiten ausschwärmend, die „noch notwendigen Routinearbeiten“ schnell und bienenfleißig ausführen könnten.:0 Vorwort S. 2 1 Allgemeiner Ablauf der Generierung S. 3 1.1 AUFGABE DER GENERIERUNG S. 3 1.2 EINORDNUNG DER GENERIERUNG IN DIE MASCHINELLE ÜBERSETZUNG S.4 1.3 REALISIERUNG S. 4 1.4 MORPHOLOGISCHE GENERIERUNG S.6 2 Strukturen und Abbildungen S. 8 2.1 UNIVERSELLE STRUKTUR: DEFINITION VON GRAPHEN S.8 2.2 FORMALISIERUNG SPEZIELLER SEMANTISCHER STRUKTUREN ALS GRAPHEN S.9 2.3 ABBILDUNG VON STRUKTUREN S.11 2.3.1 Strukturtyperhaltende Funktionen S. 12 2.3.2 Strukturtypverändernde Funktionen S. 19 2.3.3 Komplexe Funktionen S. 20 2.3.4 Abbildung eines gesamten Generierungsprozesses S. 21 4 Beispiel: Generierung von Texten aus prädikatenlogischen Ausdrücken (inkrementeller Algorithmus) S. 23 4.1 ABLAUF S.23 4.2 BEISPIELE VON REGELSTRUKTUREN S.27 5 Zusammenfassung S. 28 6 Quellenverzeichnis S. 30 info:eu-repo/classification/ddc/004 ddc:004

1

Page generated in 0.0571 seconds