881

A metáfora e a sua representação em sistemas de processamento automático de línguas naturais / Metaphor and its representation in natural language processing systems

Oliveira, Ana Eliza Barbosa de. January 2006
Advisor: Bento Carlos Dias da Silva / Committee: Antônio Suárez Abreu / Committee: Roberta Pires de Oliveira / Abstract: This MS thesis concerns the study of metaphor per se (as opposed to applied metaphor) from the linguistic point of view, and the investigation of a formal metaphor representation for Natural Language Processing systems. The overall methodology focuses on two domains: a Cognitive-Linguistic Domain, in which we investigate the metaphor's linguistic expression and its cognitive import, i.e., metaphor as a linguistic product and as a nonlinguistic mechanism; and a Computational-Linguistic Domain, in which we investigate a formal representation of metaphor production and interpretation. The theoretical approaches that constrain the scope of this work are the philosophical-rhetorical, interactionist, semantic, pragmatic, cognitive, and computational approaches to metaphor. / Master's
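The record does not reproduce the formal representation the thesis develops, but a toy sketch can make the cognitivist notion concrete: a conceptual metaphor modelled as a named mapping between source-domain and target-domain roles. All names and mappings below are illustrative assumptions, not the thesis's formalism.

```python
# A hypothetical sketch of a conceptual-metaphor representation:
# a mapping between source-domain and target-domain roles, in the
# spirit of the cognitivist approach the thesis surveys.
from dataclasses import dataclass

@dataclass
class ConceptualMetaphor:
    name: str
    source: str    # source domain, e.g. WAR
    target: str    # target domain, e.g. ARGUMENT
    mappings: dict # source-domain role -> target-domain reading

ARGUMENT_IS_WAR = ConceptualMetaphor(
    name="ARGUMENT IS WAR",
    source="WAR",
    target="ARGUMENT",
    mappings={
        "attack": "criticize a claim",
        "defend": "justify a claim",
        "win": "persuade the opponent",
    },
)

def interpret(expression, metaphor):
    """Map a source-domain expression to its target-domain reading."""
    return metaphor.mappings.get(expression, f"no mapping for '{expression}'")

print(interpret("attack", ARGUMENT_IS_WAR))  # -> criticize a claim
```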
882

Design und Implementierung eines Algorithmus zum maschinellen Lernen der Flexion eines Korpus deutscher Sprache / Design and Implementation of an Algorithm for Machine Learning of Inflection from a German-Language Corpus

Moritz, Julian 20 October 2017
This thesis describes the design and implementation of an algorithm for inflection. A concrete implementation is developed using German as an example. To this end, German inflection is first analyzed in detail, before a procedure is devised that is language-independent and can thus in principle be transferred to other languages. The practical feasibility of the procedure is demonstrated with examples. The high complexity of the task, however, means that in practice the quality of the inflected word forms suffers, particularly because the system also inflects base forms unknown to it.
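The record does not include the algorithm itself; what follows is a minimal sketch of one way suffix-based inflection rules can be learned from lemma/form pairs and applied to unseen base forms, which also illustrates why quality degrades on unknown words. The training pairs and the suffix heuristic are assumptions for illustration, not the thesis's method.

```python
from collections import Counter, defaultdict

def suffix_rule(lemma, form):
    """Derive a (lemma_suffix -> form_suffix) rewrite rule from one
    training pair by stripping the longest common prefix."""
    i = 0
    while i < min(len(lemma), len(form)) and lemma[i] == form[i]:
        i += 1
    return lemma[i:], form[i:]

def train(pairs):
    """Count how often each lemma suffix is rewritten to each form suffix."""
    rules = defaultdict(Counter)
    for lemma, form in pairs:
        old, new = suffix_rule(lemma, form)
        rules[old][new] += 1
    return rules

def inflect(lemma, rules):
    """Apply the most frequent rule for the longest matching lemma suffix.
    Unknown lemmas get whatever suffix rule happens to match, which is
    exactly where quality suffers in practice."""
    for start in range(len(lemma) + 1):
        suffix = lemma[start:]
        if suffix in rules:
            new = rules[suffix].most_common(1)[0][0]
            return lemma[:start] + new
    return lemma

# Toy training data: German 3rd-person-singular present forms.
rules = train([("machen", "macht"), ("sagen", "sagt"), ("fragen", "fragt")])
print(inflect("zeigen", rules))  # -> "zeigt", an unseen base form
```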
883

Separierung mit FindLinks gecrawlter Texte nach Sprachen / Separating FindLinks-Crawled Texts by Language

Pollmächer, Johannes 13 February 2018
This BSc thesis presents LangSepa, a program for automatic language identification of web documents. The procedure uses training data based on word-frequency tables for over 350 natural languages, so the tool can be subsumed under supervised-learning systems. The documents for the classification task were crawled by FindLinks, an information-retrieval system developed by the Natural Language Processing group at the University of Leipzig. The presented program is therefore employed for the post-processing of existing raw data.
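As a hedged illustration of the frequency-list approach described above (not LangSepa's actual implementation), a document can be scored against per-language lists of most-frequent words; the toy lists below stand in for the 350-language tables.

```python
from collections import Counter
import re

# Toy frequency lists; the real system reportedly uses tables for
# over 350 languages derived from large corpora.
FREQ_LISTS = {
    "de": ["der", "die", "und", "in", "den", "von", "zu", "das", "mit", "sich"],
    "en": ["the", "of", "and", "to", "in", "a", "is", "that", "for", "it"],
    "fr": ["de", "la", "le", "et", "les", "des", "en", "un", "du", "une"],
}

# Rank-weighted scores: more frequent words earn more than rarer ones.
WEIGHTS = {
    lang: {w: len(words) - i for i, w in enumerate(words)}
    for lang, words in FREQ_LISTS.items()
}

def identify_language(text):
    """Score a document against each language's frequency list and
    return the best match with a normalized confidence."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    scores = {
        lang: sum(weights.get(tok, 0) * n for tok, n in counts.items())
        for lang, weights in WEIGHTS.items()
    }
    best = max(scores, key=scores.get)
    total = sum(scores.values()) or 1
    return best, scores[best] / total

print(identify_language("Die Dokumente wurden mit dem Tool heruntergeladen."))
# -> ('de', 1.0) on this toy example
```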
884

Extrakce znalostních grafů z projektové dokumentace / Knowledge Graph Extraction from Project Documentation

Helešic, Tomáš January 2014
Title: Knowledge Graph Extraction from Project Documentation Author: Bc. Tomáš Helešic Department: Department of Software Engineering Supervisor: Mgr. Martin Nečaský, Ph.D. Abstract: With recent research progress in natural language processing and information extraction from text, new possibilities are emerging for automatic knowledge acquisition and for grouping the acquired entities into knowledge graphs that capture the semantic relations between them. Data storages and query languages for such knowledge graphs already exist, allowing more precise and relevant search than current full-text search engines. The goal of this thesis is to explore the opportunity of automatically extracting information from project documentation using linguistic text processing, to design a proper data storage, and to build a search engine on top of it. Keywords: Knowledge graphs, Information extraction, Natural language processing, Resource Description Framework
885

Extrakce znalostních grafů z projektové dokumentace / Knowledge Graph Extraction from Project Documentation

Helešic, Tomáš January 2014
Title: Knowledge Graph Extraction from Project Documentation Author: Bc. Tomáš Helešic Department: Department of Software Engineering Supervisor: Mgr. Martin Nečaský, Ph.D. Abstract: The goal of this thesis is to explore the possibilities of automatic information extraction from company project documentation using machine natural language processing, and to analyze the precision of the linguistic processing of these documents. It further suggests methods for acquiring key terms and the dependencies between them, and for building from these terms and dependencies knowledge graphs that are stored in an appropriate database with a search engine. The work tries to interconnect existing technologies in the shape of a simple application and to test their readiness for practical use. The aim is to inspire future research in this field, identify critical parts, and propose improvements. The main contribution lies in the interconnection of natural language processing, information extraction methods, and semantic search in corporate documents. The contribution of the practical part lies in the way key information uniquely describing each document is identified and used in search. Keywords: Knowledge graphs, Information extraction, Natural language processing, Resource Description Framework
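As a rough sketch of the kind of pipeline such a thesis describes (assumed here, not taken from it): dependency-parse sentences, read off subject-verb-object triples, and store them as RDF so they can be queried.

```python
# A minimal sketch, assuming spaCy with the en_core_web_sm model and
# rdflib. The namespace and example sentences are illustrative.
import spacy
from rdflib import Graph, Namespace, URIRef

nlp = spacy.load("en_core_web_sm")
EX = Namespace("http://example.org/kg/")  # hypothetical namespace

def extract_triples(text):
    """Yield (subject, verb, object) lemma tuples from nsubj/dobj arcs."""
    doc = nlp(text)
    for token in doc:
        if token.pos_ != "VERB":
            continue
        subjects = [c for c in token.children if c.dep_ == "nsubj"]
        objects = [c for c in token.children if c.dep_ == "dobj"]
        for s in subjects:
            for o in objects:
                yield s.lemma_, token.lemma_, o.lemma_

def build_graph(text):
    """Store extracted triples in an RDF graph for semantic querying."""
    g = Graph()
    for s, p, o in extract_triples(text):
        g.add((URIRef(EX + s), URIRef(EX + p), URIRef(EX + o)))
    return g

g = build_graph("The service validates requests. The gateway routes traffic.")
print(g.serialize(format="turtle"))
```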
886

Improving Academic Natural Language Processing Infrastructures Utilizing Cluster Computation

Sahami, Soheila 25 September 2020
In light of widespread digitization endeavors and ever-growing textual data generation, developing efficient academic Natural Language Processing (NLP) infrastructures, which can deal with large amounts of data, is of particular importance. Novel computation technologies allow tools that support big data and heavy computation while performing timely and cost-effective data processing. This development has led researchers to demand that knowledge be extracted from ever-increasing textual data before it is outdated. Cluster computation is a modern technology for handling big data efficiently. It distributes computing and data over a number of machines in a cluster and uses resources efficiently, which are key requirements for processing big data in a timely manner. It also assures applications' high availability and fault tolerance, which are fundamental concerns when dealing with vast amounts of data, and it balances the data load during the execution of tasks, which results in optimal use of resources and enhances efficiency. Data-oriented parallelization is an effective way to enable the currently available academic NLP infrastructures to process big data: it parallelizes NLP tools that consist of identical, uncomplicated tasks without the expense of changing the NLP algorithms. This thesis presents the adaptation of cluster computation technology to academic NLP infrastructures to address the features that are essential for processing vast quantities of text efficiently, in terms of both resources and time. Apache Spark on top of Apache Hadoop and its ecosystem has been utilized to develop a set of NLP tools that provide a distributed environment for executing NLP tasks. Many experiments were conducted to assess the functionality of the proposed strategy. This thesis shows that using cluster computation technology and data-oriented parallelization enables academic NLP infrastructures to process large amounts of textual data in a timely manner while improving the performance of the NLP tools. Moreover, these experiments provide a more realistic and transparent estimation of workflows' costs (required hardware resources) and execution times, along with the fastest, optimal, or feasible resource configuration for each individual workflow. Users can employ this knowledge to trade off run-time, data size, and hardware, and to design a strategy for data storage, duration of data retention, and delivery time. This has the potential to enhance researchers' satisfaction when using academic NLP infrastructures. The thesis also shows that a cluster computation approach provides the capacity to adapt NLP services to just-in-time (JIT) delivery systems. The proposed strategy assures the reliability and predictability of the services, which are the main characteristics of services in JIT delivery systems. Defining the relevant parameters, recording the behavior of the services, and analyzing the generated data provided knowledge that can be used to create a service catalog (a fundamental requirement for services in JIT delivery systems) for each service offered. This knowledge also helps to generate performance profiles for each item in the service catalog and to update them continuously to cover new experiments and improve service quality.
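A minimal sketch of the data-oriented parallelization idea, assuming PySpark and illustrative HDFS paths (not the thesis's infrastructure): a per-document NLP task, here plain tokenization, is expressed as Spark transformations so the cluster distributes data and computation without changing the NLP logic itself.

```python
from pyspark.sql import SparkSession
import re

spark = SparkSession.builder.appName("nlp-parallel").getOrCreate()

def tokenize(line):
    """An identical, independent per-record task: the case that
    parallelizes cleanly without changing the NLP algorithm."""
    return re.findall(r"\w+", line.lower())

corpus = spark.sparkContext.textFile("hdfs:///corpora/*.txt")  # hypothetical path
token_counts = (
    corpus.flatMap(tokenize)
          .map(lambda tok: (tok, 1))
          .reduceByKey(lambda a, b: a + b)  # shuffled across the cluster
)
token_counts.saveAsTextFile("hdfs:///out/token_counts")  # hypothetical path
```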
887

Efektivní neuronová syntéza řeči / Efficient neural speech synthesis

Vainer, Jan January 2020
While recent neural sequence-to-sequence models have greatly improved the quality of speech synthesis, there has not been a system capable of fast training, fast inference, and high-quality audio synthesis at the same time. In this thesis, we present a neural speech synthesis system capable of high-quality faster-than-real-time spectrogram synthesis, with low requirements on computational resources and fast training time. Our system consists of a teacher and a student network. The teacher model is used to extract the alignment between the text to synthesize and the corresponding spectrogram. The student uses the alignments from the teacher model to efficiently synthesize mel-scale spectrograms from a phonemic representation of the input text. Both networks use simple convolutional layers. We train both systems on the English LJSpeech dataset. The quality of samples synthesized by our model was rated significantly higher than that of baseline models. Our model can be trained efficiently on a single GPU and can run in real time even on a CPU.
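As an illustration of the student network's shape (a sketch under assumed sizes, not the thesis's architecture): a purely convolutional model mapping teacher-aligned phoneme frames to mel-spectrogram frames.

```python
# A minimal PyTorch sketch. Assumes the teacher has already expanded
# phonemes to one phoneme ID per output frame; all sizes are illustrative.
import torch
import torch.nn as nn

class ConvSpectrogramStudent(nn.Module):
    def __init__(self, n_phonemes=64, d_model=128, n_mels=80):
        super().__init__()
        self.embed = nn.Embedding(n_phonemes, d_model)
        self.convs = nn.Sequential(
            nn.Conv1d(d_model, d_model, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(d_model, d_model, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        self.to_mel = nn.Conv1d(d_model, n_mels, kernel_size=1)

    def forward(self, phoneme_ids):
        # phoneme_ids: (batch, frames), one ID per frame via the
        # teacher's alignment.
        x = self.embed(phoneme_ids).transpose(1, 2)  # (batch, d_model, frames)
        return self.to_mel(self.convs(x))            # (batch, n_mels, frames)

model = ConvSpectrogramStudent()
frames = model(torch.randint(0, 64, (2, 100)))  # two utterances, 100 frames
print(frames.shape)  # torch.Size([2, 80, 100])
```

Convolutions, unlike the autoregressive decoders of earlier sequence-to-sequence systems, process all frames in parallel, which is what makes faster-than-real-time synthesis plausible on a CPU.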
888

Rysy z eye-trackeru v syntaktickém parsingu / Eye-tracking features in syntactic parsing

Agrawal, Abhishek January 2020
In this thesis, we explore the potential benefits of leveraging eye-tracking information for dependency parsing on the English part of the Dundee corpus. To achieve this, we cast dependency parsing as a sequence labelling task and then augment the neural model for sequence labelling with eye-tracking features. We also augment a graph-based parser with eye-tracking features and parse the Dundee corpus with it to corroborate our findings from the sequence labelling parser. We then experiment with a variety of parser setups, ranging from parsing with all features to a delexicalized parser. Our experiments show that for a parser with all features the LAS improvements, while positive, are not significant, whereas our delexicalized parser significantly outperforms the baseline we established. We also analyze the contribution of the various eye-tracking features to the different parser setups and find that they carry complementary information: augmenting the parser with several gaze features grouped together yields better performance than any individual gaze feature.
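A minimal sketch of the augmentation step, with assumed feature names and a stand-in encoder (not the thesis's model): per-token gaze measures are concatenated to word embeddings before the sequence-labelling parser.

```python
import torch
import torch.nn as nn

# Illustrative Dundee-style gaze measures; the thesis's feature set differs.
GAZE_FEATURES = ["first_fixation", "total_fixation", "n_fixations"]

class GazeAugmentedTagger(nn.Module):
    """Predicts one label per token, e.g. a relative-head-position
    encoding that casts dependency parsing as sequence labelling."""
    def __init__(self, vocab_size, n_labels, d_word=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_word)
        self.encoder = nn.LSTM(d_word + len(GAZE_FEATURES), 128,
                               batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * 128, n_labels)

    def forward(self, word_ids, gaze):
        # word_ids: (batch, seq); gaze: (batch, seq, len(GAZE_FEATURES))
        x = torch.cat([self.embed(word_ids), gaze], dim=-1)
        h, _ = self.encoder(x)
        return self.out(h)  # (batch, seq, n_labels)

tagger = GazeAugmentedTagger(vocab_size=10000, n_labels=50)
scores = tagger(torch.randint(0, 10000, (2, 12)), torch.rand(2, 12, 3))
print(scores.shape)  # torch.Size([2, 12, 50])
```

A delexicalized variant would simply drop the word-embedding half of the concatenation, leaving the gaze features (plus, say, POS tags) to carry the signal, which matches the setup where the gaze contribution was most visible.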
889

Characterisation of a developer’s experience fields using topic modelling

Déhaye, Vincent January 2020
Finding the most relevant candidate for a position is a ubiquitous challenge for organisations. It can also be arduous for a candidate to convey on a concise resume what they have experience with. Because the candidate usually has to select which experiences to expose and filter out others, relevant experience may go undetected by the person carrying out the search even though the candidate actually has it. In the field of software engineering, developing one's experience usually leaves traces behind: the code one produced. This project explores approaches to tackle the screening challenge with an automated way of extracting experience directly from code, by defining common lexical patterns in code for different experience fields using topic modelling. Two techniques were compared. On one hand, Latent Dirichlet Allocation (LDA) is a generative statistical model which has proven to yield good results in topic modelling. On the other hand, Non-negative Matrix Factorization (NMF) is a low-rank factorization of a matrix representing the code corpus as word counts per piece of code. The code gathered consisted of 30 random repositories from all the collaborators of the open-source Ruby-on-Rails project on GitHub, to which common natural language processing transformation steps were then applied. The results of the two techniques were compared using perplexity for LDA, reconstruction error for NMF, and topic coherence for both. The first two measure how well the data can be represented by the topics produced, while the last estimates how well the elements of a topic hang and fit together, reflecting human understandability and interpretability. Given that we did not have any similar work to benchmark against, the performance of the values obtained is hard to assess scientifically. However, the method seems promising, as we would have been rather confident in assigning labels to 10 of the topics generated. The results imply that one could probably use natural language processing methods directly on code production in order to extend the detected fields of experience of a developer, with a finer granularity than traditional resumes and with field definitions evolving dynamically with the technology.
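A small sketch of the comparison, assuming scikit-learn and toy code snippets in place of the 30 crawled repositories: fit LDA and NMF on bag-of-words counts and inspect the top terms per topic.

```python
from sklearn.decomposition import LatentDirichletAllocation, NMF
from sklearn.feature_extraction.text import CountVectorizer

# Illustrative stand-ins for pieces of code from real repositories.
snippets = [
    "def get_user(id): return db.session.query(User).get(id)",
    "render json: @user.to_json, status: :ok",
    "SELECT name, email FROM users WHERE active = 1",
    "fetch('/api/users').then(res => res.json())",
]

vec = CountVectorizer(token_pattern=r"[A-Za-z_]+")  # keep identifiers whole
counts = vec.fit_transform(snippets)
terms = vec.get_feature_names_out()

def top_terms(model, n=5):
    """Print the n highest-weighted terms of each topic."""
    for i, weights in enumerate(model.components_):
        top = weights.argsort()[::-1][:n]
        print(f"topic {i}:", ", ".join(terms[j] for j in top))

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
nmf = NMF(n_components=2, random_state=0).fit(counts)
top_terms(lda)
top_terms(nmf)
```

Labelling a topic as an "experience field" then amounts to reading its top terms, which is exactly the step the coherence metric tries to approximate automatically.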
890

Konzeption eines dreistufigen Transfers für die maschinelle Übersetzung natürlicher Sprachen / Design of a Three-Stage Transfer for the Machine Translation of Natural Languages

Laube, Annett, Karl, Hans-Ulrich 14 December 2012
0 FOREWORD: The analysis and synthesis algorithms required for translating programming languages have for some time been formulated relatively well in a language-independent way. This is reflected, among other things, in a multitude of generators that automate the translation process in whole or in part. The syntax of the language to be processed is usually available in data form (graphs, lists) on the basis of formal description means (e.g. BNF). In the translation of natural languages, the separation of language and processing algorithms has, if at all, only just begun. The reasons are obvious: natural languages are more powerful, and their formal representation is difficult. If translation is also to cover oral communication, i.e. to replace the human interpreter at an international conference or on a telephone call with a partner who speaks another language, real-time requirements are added that will force highly parallel approaches. Even without real-time requirements, the translation process is extraordinarily complex. Solutions are sought with the interlingua and transfer approaches. Formal description means from relatively well-researched subfields of computer science are increasingly employed (operations over decorated trees, tree-to-tree translation strategies), in the hope that their results will lead further than the spectacular prototypes already on the market, which are often derived from heuristic approaches. [...] Contents: 0 Foreword (p. 2); 1 Introduction (p. 4); 2 The Components of the Three-Stage Transfer (p. 5); 3 Formalization of the Composition (p. 8); 4 Pre-Transfer Phase (p. 11); 5 Formalization of the Pre-Transfer Phase (p. 13); 6 Transfer Phase (p. 18); 7 Formalization of the Transfer Phase (p. 20); 8 Post-Transfer Phase (p. 24); 9 Transfer Example (p. 25); 10 Summary (p. 29)
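A toy sketch of the general transfer architecture (analysis, transfer, synthesis), with an assumed five-word grammar and lexicon; the thesis's three-stage formalization with pre- and post-transfer phases is far richer.

```python
# Toy German-to-English lexicon; illustrative only.
LEXICON = {"der": "the", "hund": "dog", "sieht": "sees", "die": "the", "katze": "cat"}

def analyse(sentence):
    """Toy analysis: a flat subject-verb-object tree for a 5-word input."""
    w = sentence.lower().split()
    return ("S", [("NP", w[0:2]), ("V", w[2:3]), ("NP", w[3:5])])

def transfer(tree):
    """Lexical transfer: replace source words with target words while
    keeping the tree structure (a tree-to-tree mapping)."""
    label, children = tree
    return (label, [(lbl, [LEXICON[w] for w in words]) for lbl, words in children])

def synthesise(tree):
    """Toy synthesis: linearize the target tree left to right."""
    return " ".join(w for _, words in tree[1] for w in words)

print(synthesise(transfer(analyse("Der Hund sieht die Katze"))))
# -> "the dog sees the cat"
```

Structural transfer (reordering, agreement, insertion of function words) would live in further tree-rewriting rules between analysis and synthesis, which is precisely the part the three-stage design factors into pre-transfer, transfer, and post-transfer phases.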
