Global ETD Search

1	Coping with uncertainty: noun phrase interpretation and early semantic analysis
Mellish, Christopher Stuart, January 1981
A computer program which can "understand" natural language texts must have both syntactic knowledge about the language concerned and semantic knowledge of how what is written relates to its internal representation of the world. It has been a matter of some controversy how these sources of information can best be integrated to translate from an input text to a formal meaning representation. The controversy has largely concerned the question of what degree of syntactic analysis must be performed before any semantic analysis can take place. An extreme position in this debate is that a syntactic parse tree for a complete sentence must be produced before any investigation of that sentence's meaning is appropriate. This position has been criticised by those who see understanding as a process that takes place gradually as the text is read, rather than in sudden bursts of activity at the ends of sentences. These people advocate a model where semantic analysis can operate on fragments of text before the global syntactic structure is determined, a strategy which we will call early semantic analysis. In this thesis, we investigate the implications of early semantic analysis in the interpretation of noun phrases. One possible approach is to say that a noun phrase is a self-contained unit and can be fully interpreted by the time it has been read. Thus it can always be determined what objects a noun phrase refers to without consulting much more than the structure of the phrase itself. This approach was taken in part by Winograd [Winograd 72], who saw the constraint that a noun phrase have a referent as a valuable aid in resolving local syntactic ambiguity. Unfortunately, Winograd's work has been criticised by Ritchie, because it is not always possible to determine what a noun phrase refers to purely on the basis of local information.
In this thesis, we will go further than this and claim that, because the meaning of a noun phrase can be affected by so many factors outside the phrase itself, it makes no sense to talk about "the referent" as a function of a noun phrase. Instead, the notion of "referent" is something defined by global issues of structure and consistency. Having rejected one approach to the early semantic analysis of noun phrases, we go on to develop an alternative, which we call incremental evaluation. The basic idea is that a noun phrase does provide some information about what it refers to. It should be possible to represent this partial information and gradually refine it as relevant implications of the context are followed up. Moreover, the partial information should be available to an inference system, which, amongst other things, can detect the absence of a referent and provide the advantages of Winograd's system. In our system, noun phrase interpretation does take place locally, but the point is that it does not finish there. Instead, the determination of the meaning of a noun phrase is spread over the subsequent analysis of how it contributes to the meaning of the text as a whole.
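Illustrative sketch (not taken from the thesis): one simple way to picture incremental evaluation is to hold the partial information about a noun phrase as a set of candidate objects and to narrow that set as constraints from the context arrive. The toy world, the `PartialReferent` class, and its method names are all assumptions made up for this example; the thesis's actual representation may differ considerably.

```python
# Hypothetical toy world; objects and properties are invented for illustration.
WORLD = [
    {"id": "b1", "kind": "block", "colour": "red"},
    {"id": "b2", "kind": "block", "colour": "green"},
    {"id": "p1", "kind": "pyramid", "colour": "red"},
]


class PartialReferent:
    """Partial information about what a noun phrase refers to (assumed design)."""

    def __init__(self, world):
        # Before any constraint is known, every object is a candidate.
        self.candidates = list(world)

    def refine(self, prop, value):
        """Keep only the candidates consistent with a new constraint."""
        self.candidates = [o for o in self.candidates if o.get(prop) == value]
        return self

    def has_referent(self):
        """An empty candidate set signals that no referent exists."""
        return bool(self.candidates)

    def ids(self):
        """Return the identifiers of the remaining candidates."""
        return [o["id"] for o in self.candidates]


# Local interpretation of "the block": two candidates remain.
phrase = PartialReferent(WORLD).refine("kind", "block")
local_ids = phrase.ids()

# Later context (say, the rest of the sentence implies the object is red)
# narrows the same partial description further.
phrase.refine("colour", "red")
final_ids = phrase.ids()

# A phrase such as "the green pyramid" has no referent in this toy world.
missing = PartialReferent(WORLD).refine("kind", "pyramid").refine("colour", "green")

print(local_ids, final_ids, missing.has_referent())
```

In this sketch the local reading leaves the phrase underdetermined, later context completes it, and an empty candidate set plays the role of detecting that a referent is absent.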
2	Compositional Matrix-Space Models: Learning Methods and Evaluation
Asaadi, Shima, 13 October 2020
There has been a lot of research on machine-readable representations of words for natural language processing (NLP). One mainstream paradigm for word meaning representation comprises vector-space models obtained from the distributional information of words in text. Machine learning techniques have been proposed to produce such word representations for computational linguistic tasks. Moreover, the representation of multi-word structures, such as phrases, in vector space can arguably be achieved by composing the distributional representations of the constituent words. To this end, mathematical operations have been introduced as composition methods in vector space. An alternative approach to word representation and semantic compositionality in natural language has been compositional matrix-space models. In this thesis, two research directions are considered. In the first, considering compositional matrix-space models, we explore word meaning representations and semantic composition of multi-word structures in matrix space. The main motivation for working on these models is that they have shown superiority over vector-space models regarding several properties. The most important property is that the composition operation in matrix-space models can be defined as standard matrix multiplication; in contrast to common vector-space composition operations, this is sensitive to word order in language. We design and develop machine learning techniques that induce continuous and numeric representations of natural language in matrix space. The main goal in introducing representation models is enabling NLP systems to understand natural language to solve multiple related tasks.
Therefore, first, different supervised machine learning approaches are proposed to train word meaning representations and capture the compositionality of multi-word structures using the matrix multiplication of words. The performance of matrix representation models learned by these techniques is investigated in solving two NLP tasks, namely sentiment analysis and compositionality detection. Then, techniques for learning matrix-space models are proposed that yield generic, task-agnostic representation models, also called word matrix embeddings. In these techniques, word matrices are trained using the distributional information of words in a given text corpus. We show the effectiveness of these models in the compositional representation of multi-word structures in natural language. The second research direction in this thesis explores effective approaches for evaluating the capability of semantic composition methods to capture the meaning representation of compositional multi-word structures, such as phrases. A common evaluation approach is to examine the ability of the methods to capture the semantic relatedness between linguistic units. The underlying assumption is that the more accurately a method of semantic composition can determine the representation of a phrase, the more accurately it can determine the relatedness of that phrase to other phrases. To apply the semantic relatedness approach, gold standard datasets have been introduced. In this thesis, we identify the limitations of the existing datasets and develop a new gold standard semantic relatedness dataset, which addresses the issues of the existing datasets. The proposed dataset allows us to evaluate meaning composition in vector- and matrix-space models.
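Illustrative sketch (not taken from the thesis): the word-order claim above can be checked on a tiny example. The 2x2 word matrices and 2-dimensional word vectors below are made-up values, and the function names are assumptions for this example only; learned representations would be much larger. Composition by matrix multiplication gives different results for the two word orders, while composition by vector addition, a common vector-space operation, does not.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def compose_matrices(words, lexicon):
    """Compose a phrase as the ordered product of its word matrices."""
    size = len(lexicon[words[0]])
    # Start from the identity matrix so a one-word phrase maps to its own matrix.
    result = [[1 if i == j else 0 for j in range(size)] for i in range(size)]
    for word in words:
        result = matmul(result, lexicon[word])
    return result


def compose_vectors(words, lexicon):
    """Compose a phrase as the element-wise sum of its word vectors."""
    size = len(lexicon[words[0]])
    return [sum(lexicon[word][i] for word in words) for i in range(size)]


# Made-up toy representations (assumptions, not learned values).
MATRICES = {"not": [[0, 1], [1, 0]], "good": [[2, 0], [0, 1]]}
VECTORS = {"not": [1, 0], "good": [0, 2]}

not_good = compose_matrices(["not", "good"], MATRICES)
good_not = compose_matrices(["good", "not"], MATRICES)

print(not_good, good_not)
print(compose_vectors(["not", "good"], VECTORS), compose_vectors(["good", "not"], VECTORS))
```

Because matrix multiplication is not commutative in general, the two phrase matrices differ, whereas the summed vectors are identical for both orders.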
3	Performance Optimizations and Operator Semantics for Streaming Data Flow Programs
Sax, Matthias J., 01 July 2020
Modern companies are able to collect more data and require insights from it faster than ever before. Relational databases do not meet the requirements for processing the often unstructured data sets with reasonable performance. The database research community started to address these trends in the early 2000s. Two new research directions have attracted major interest since: large-scale non-relational data processing and low-latency data stream processing. Large-scale non-relational data processing, commonly known as "Big Data" processing, was quickly adopted in industry. In parallel, low-latency data stream processing was mainly driven by the research community, which developed new systems that embrace a distributed architecture and scalability and exploit data parallelism. While these systems have gained more and more attention in industry, there are still major challenges in operating them at large scale. The goal of this dissertation is two-fold: first, to investigate runtime characteristics of large-scale data-parallel distributed streaming systems; second, to propose the "Dual Streaming Model" to express the semantics of continuous queries over data streams and tables. Our goal is to improve the understanding of system and query runtime behavior with the aim of provisioning queries automatically. We introduce a cost model for streaming data flow programs that takes into account the two techniques of record batching and data parallelization. Additionally, we introduce optimization algorithms that leverage our model for cost-based query provisioning. The proposed Dual Streaming Model expresses the result of a streaming operator as a stream of successive updates to a result table, inducing a duality between streams and tables.
Our model handles the inconsistency of the logical and the physical order of records within a data stream natively, which allows for deterministic semantics as well as low-latency query execution.
Keywords: Data Stream Processing, Data Flow Program, Parallelization, Optimization, Processing Semantics
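Illustrative sketch (not taken from the dissertation): the idea of expressing an operator's result as a stream of updates to a result table can be pictured with a per-key count. The record layout `(key, timestamp)` and the function names `count_by_key` and `materialize` are assumptions for this example. Each input record produces one update record, and replaying the updates rebuilds the table. A count is also a convenient case for showing that the final table does not depend on arrival order; the dissertation's model addresses out-of-order records in general, which this sketch does not attempt.

```python
def count_by_key(records):
    """Yield one update (key, new_count) per input record: a changelog stream."""
    table = {}
    for key, _timestamp in records:
        table[key] = table.get(key, 0) + 1
        yield key, table[key]


def materialize(changelog):
    """Rebuild the result table by applying updates in order (last write wins)."""
    table = {}
    for key, value in changelog:
        table[key] = value
    return table


# Hypothetical input: (key, timestamp) pairs in timestamp order ...
in_order = [("a", 1), ("b", 2), ("a", 3)]
# ... and the same records arriving in a different physical order.
out_of_order = [("a", 3), ("b", 2), ("a", 1)]

updates = list(count_by_key(in_order))
table_in_order = materialize(updates)
table_out_of_order = materialize(count_by_key(out_of_order))

print(updates)
print(table_in_order, table_out_of_order)
```

The update stream and the table carry the same information in two forms: the table is the stream's latest value per key, and the stream is the table's history of changes.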
