  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
91

Natural Language Processing, Statistical Inference, and American Foreign Policy

Lauretig, Adam M. 06 November 2019
No description available.
92

Facilitating Corpus Annotation by Improving Annotation Aggregation

Felt, Paul L 01 December 2015
Annotated text corpora facilitate the linguistic investigation of language as well as the automation of natural language processing (NLP) tasks. NLP tasks include problems such as spam email detection, grammatical analysis, and identifying mentions of people, places, and events in text. However, constructing high-quality annotated corpora can be expensive. Cost can be reduced by employing low-cost internet workers in a practice known as crowdsourcing, but the resulting annotations are often inaccurate, decreasing the usefulness of a corpus. This inaccuracy is typically mitigated by collecting multiple redundant judgments and aggregating them (e.g., via majority vote) to produce high-quality consensus answers. We improve the quality of consensus labels inferred from imperfect annotations in a number of ways. We show that transfer learning can be used to derive benefit from outdated annotations that would typically be discarded. We show that, contrary to popular preference, annotation aggregation models that take a generative data modeling approach tend to outperform those that take a conditional approach. We leverage this insight to develop csLDA, a novel annotation aggregation model that improves on the state of the art for a variety of annotation tasks. When data does not permit generative data modeling, we identify a conditional data modeling approach based on vector-space text representations that achieves state-of-the-art results on several unusual semantic annotation tasks. Finally, we identify a family of models capable of aggregating annotation data containing heterogeneous annotation types such as label frequencies and labeled features. We present a multiannotator active learning algorithm for this model family that jointly selects an annotator, data items, and annotation type.
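The majority-vote baseline mentioned in this abstract can be sketched as follows. This is only an illustration of the baseline aggregation step, not the csLDA model or any other method from the thesis; the function name `majority_vote`, the sample judgments, and the alphabetical tie-break are assumptions made for the example.

```python
from collections import Counter

def majority_vote(annotations):
    """Map each item to its most frequent label.

    `annotations` is a dict of item id -> list of redundant labels.
    Ties are broken alphabetically (an arbitrary, illustrative choice).
    """
    consensus = {}
    for item, labels in annotations.items():
        counts = Counter(labels)
        top = max(counts.values())
        # Among the labels that share the highest count, pick the smallest.
        consensus[item] = min(label for label, c in counts.items() if c == top)
    return consensus

# Hypothetical crowdsourced judgments for a spam-detection task.
judgments = {
    "email-1": ["spam", "spam", "ham"],
    "email-2": ["ham", "spam", "ham"],
    "email-3": ["spam", "ham"],  # tie, resolved alphabetically
}
consensus = majority_vote(judgments)
```

Majority vote treats every annotator as equally reliable, which is one reason model-based aggregation of the kind the abstract describes can do better.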
93

Sustainable Recipe Recommendation System: Evaluating the Performance of GPT Embeddings versus state-of-the-art systems

Bandaru, Jaya Shankar; Appili, Sai Keerthi January 2023
Background: The demand for a sustainable lifestyle is increasing due to the need to tackle rapid climate change. One-third of carbon emissions come from the food industry; reducing emissions from this industry is crucial when fighting climate change. One of the ways to reduce carbon emissions from this industry is by helping consumers adopt sustainable eating habits by consuming eco-friendly food. To help consumers find eco-friendly recipes, we developed a sustainable recipe recommendation system that can recommend relevant and eco-friendly recipes to consumers using little information about their previous food consumption.
Objective: The main objective of this research is to identify (i) the appropriate recommendation algorithm suitable for a dataset that has few training and testing examples, and (ii) a technique to re-order the recommendation list such that a proper balance is maintained between relevance and carbon rating of the recipes.
Method: We conducted an experiment to test the performance of a GPT embeddings-based recommendation system, Factorization Machines, and a version of a Graph Neural Network-based recommendation algorithm called PinSage for different numbers of training examples, and used the ROC AUC value as our metric. After finding the best-performing model, we experimented with different re-ordering techniques to find which technique provides the right balance between relevance and sustainability.
Results: The results from the experiment show that PinSage and Factorization Machines predict on average whether an item is relevant or not with 75% probability, whereas GPT-embedding-based recommendation systems predict with only 55% probability. We also found that the performance of PinSage and Factorization Machines improved as the training set size increased. For re-ordering, we found that using a logarithmic combination of the relevance score and carbon rating of the recipe helped to reduce the average carbon rating of recommendations with a marginal reduction in the ROC AUC score.
Conclusion: The results show that the chosen state-of-the-art recommendation systems, PinSage and Factorization Machines, outperform GPT-embedding-based recommendation systems by almost 1.4 times.
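The abstract does not give the exact logarithmic combination used for re-ordering, so the following is only a hypothetical sketch of the general idea. The formula `log(relevance) - alpha * log(1 + carbon)`, the weight `alpha`, the field names, and the sample recipes are all assumptions; the sketch also assumes that relevance is strictly positive and that a lower carbon rating is better.

```python
import math

def rerank(recipes, alpha=0.5):
    """Sort recipes by a log-combined score (illustrative formula only).

    Higher relevance raises the score; a higher carbon rating lowers it.
    `alpha` controls how strongly carbon is penalized.
    """
    def combined(recipe):
        return math.log(recipe["relevance"]) - alpha * math.log1p(recipe["carbon"])
    return sorted(recipes, key=combined, reverse=True)

# Hypothetical model outputs: relevance in (0, 1], carbon rating >= 0.
recipes = [
    {"name": "beef stew", "relevance": 0.90, "carbon": 9.0},
    {"name": "lentil curry", "relevance": 0.80, "carbon": 1.0},
    {"name": "veggie wrap", "relevance": 0.60, "carbon": 0.5},
]
ranked_names = [r["name"] for r in rerank(recipes)]
```

With these made-up numbers the most relevant but most carbon-intensive recipe drops to the end of the list, which is the kind of trade-off between relevance and carbon rating the abstract describes.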
94

Automated Software Defect Localization

Ye, Xin 23 September 2016
No description available.
95

Higher-order reasoning with graph data

Leonardo de Abreu Cotta (13170135) 29 July 2022
Graphs are the natural framework of many of today’s highest-impact computing applications: from online social networking, to Web search, to product recommendations, to chemistry, to bioinformatics, to knowledge bases, to mobile ad-hoc networking. To develop successful applications in these domains, we often need representation learning methods: models mapping nodes, edges, subgraphs, or entire graphs to some meaningful vector space. Such models are studied in the machine learning subfield of graph representation learning (GRL). Previous GRL research has focused on learning node or entire graph representations through associational tasks. In this work I study higher-order (k>1-node) representations of graphs in the context of both associational and counterfactual tasks.
96

News Analytics for Global Infectious Disease Surveillance

Ghosh, Saurav 29 November 2017
Traditional disease surveillance can be augmented with a wide variety of open sources, such as online news media, Twitter, blogs, and web search records. Rapidly increasing volumes of these open sources are proving to be extremely valuable resources in helping analyze, detect, and forecast outbreaks of infectious diseases, especially new diseases or diseases spreading to new regions. However, these sources are in general unstructured (noisy), and the construction of surveillance tools, ranging from real-time disease outbreak monitoring to epidemiological line lists, involves considerable human supervision. Intelligent modeling of such sources using text mining methods such as topic models, deep learning, and dependency parsing can lead to automated generation of these surveillance tools. Moreover, the real-time global availability of these open sources from web-based bio-surveillance systems, such as HealthMap and WHO Disease Outbreak News (DONs), can aid in the development of generic tools applicable to a wide range of diseases (rare, endemic, and emerging) across different regions of the world. In this dissertation, we explore various methods of using internet news reports to develop generic surveillance tools that can supplement traditional surveillance systems and aid in early detection of outbreaks. We primarily investigate three major problems related to infectious disease surveillance. (i) Can trends in online news reporting monitor and possibly estimate infectious disease outbreaks? We introduce approaches that use temporal topic models over the HealthMap corpus for detecting rare and endemic disease topics as well as capturing temporal trends (seasonality, abrupt peaks) for each disease topic. The discovery of temporal topic trends is followed by time-series regression techniques to estimate future disease incidence.
(ii) In the second problem, we seek to automate the creation of epidemiological line lists for emerging diseases from WHO DONs in a near real-time setting. For this purpose, we formulate Guided Epidemiological Line List (GELL), an approach that combines neural word embeddings with information extracted from dependency parse trees at the sentence level to extract line list features. (iii) Finally, for the third problem, we aim to characterize diseases automatically from the HealthMap corpus using a disease-specific word embedding model; the resulting characterizations are evaluated for accuracy against human-curated ones. / Ph. D.
97

Duplicate Detection and Text Classification on Simplified Technical English / Dublettdetektion och textklassificering på Förenklad Teknisk Engelska

Lund, Max January 2019
This thesis investigates the most effective way of performing classification of text labels and clustering of duplicate texts in technical documentation written in Simplified Technical English. Pre-trained transformer language models (BERT) were tested against traditional methods such as tf-idf with cosine similarity (kNN) and SVMs on the classification task. For detecting duplicate texts, vector representations from pre-trained transformer and LSTM models were tested against tf-idf using the density-based clustering algorithms DBSCAN and HDBSCAN. The results show that traditional methods are comparable to pre-trained models for classification, and that using tf-idf vectors with a low distance threshold in DBSCAN is preferable for duplicate detection.
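The duplicate-detection idea (tf-idf vectors compared with a low cosine-distance threshold) can be sketched in plain Python. This is a simplified stand-in rather than the thesis's pipeline: it groups texts by linking any two whose cosine distance is at most `eps`, which corresponds to DBSCAN only in the special case where every point counts as a core point. The smoothed idf formula, the whitespace tokenizer, the threshold value, and the sample sentences are all assumptions.

```python
import math
from collections import Counter

def tfidf_vectors(texts):
    """Return L2-normalized tf-idf vectors as dicts (term -> weight)."""
    docs = [t.lower().split() for t in texts]
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed idf; one common variant, chosen here for illustration.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def cosine_distance(a, b):
    """1 - cosine similarity for two normalized sparse vectors."""
    return 1.0 - sum(w * b.get(t, 0.0) for t, w in a.items())

def cluster_duplicates(texts, eps=0.2):
    """Label texts so that chains of near-duplicates share a cluster id."""
    vectors = tfidf_vectors(texts)
    labels = [-1] * len(texts)
    current = 0
    for start in range(len(texts)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # expand the cluster through eps-neighbours
            i = stack.pop()
            for j in range(len(texts)):
                if labels[j] == -1 and cosine_distance(vectors[i], vectors[j]) <= eps:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Hypothetical maintenance instructions; three are duplicates up to case.
texts = [
    "Remove the cover from the pump.",
    "Remove the cover from the pump.",
    "Install the filter in the housing.",
    "remove the cover from the pump.",
]
labels = cluster_duplicates(texts)
```

A low `eps` keeps only near-identical texts together, which matches the abstract's finding that a low distance threshold is preferable for duplicate detection.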
98

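O uso de recursos linguísticos para mensurar a semelhança semântica entre frases curtas através de uma abordagem híbrida / The use of linguistic resources to measure semantic similarity between short sentences through a hybrid approach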

Silva, Allan de Barcelos 14 December 2017
In Natural Language Processing (NLP), assessing Semantic Textual Similarity (STS) is an important building block for applications such as information retrieval, text classification, document clustering, translation, dialogue interaction, and duplicate detection. The literature describes applications and techniques aimed largely at the English language and relies primarily on probabilistic resources, while linguistic aspects are used only in an incipient way. Work in the area stresses that linguistics plays a fundamental role in assessing STS, precisely because it extends the potential of exclusively probabilistic methods and avoids some of their failures (e.g., identifying distantly or closely related sentences, anaphora), which largely result from the lack of deeper treatment of aspects of the language. This is amplified for short sentences, the largest field of use of STS techniques, because such sentences carry little information, which reduces the capacity for efficient probabilistic treatment. It is therefore vital to identify and apply resources drawn from a deeper study of the language to better understand what makes sentences similar. This work presents an approach for assessing STS between short sentences in Brazilian Portuguese. Its main distinguishing feature is a hybrid approach that uses distributed representations together with lexical and linguistic aspects. To consolidate the study, a methodology was defined that allows the analysis of several combinations of resources, making it possible to evaluate the gains introduced by adding linguistic aspects and by combining them with knowledge generated by other techniques. The proposed approach was evaluated on datasets well known in the literature (PROPOR 2016) and obtained good results.
99

Aplicações da teoria dos espaços coarse a espaços de Banach e grupos topológicos / Applications of coarse spaces theory to Banach spaces and topological groups

Garcia, Denis de Assis Pinto 24 June 2019
This work is a contribution to the study of the large-scale geometry of Banach spaces and topological groups. Although these two fields are traditionally studied independently, in 2017 Christian Rosendal showed that they can be regarded as different aspects of a more general theory: the coarse geometry of topological groups. An essential tool for the development of this new approach is the notion of coarse structure, introduced by John Roe in 2003, which can be seen as the large-scale counterpart of the concept of uniform structure. For this reason, the initial chapters of this work present an elementary introduction to the theory of both uniform and coarse spaces, highlighting the key concepts for understanding the other chapters and paying particular attention to the study of uniform and coarse structures associated with topological groups and, mainly, to the left-uniform and the left-coarse structures of a topological group. In Chapter 5, we discuss Rosendal's recent results on the existence of uniform and coarse embeddings between Banach spaces. Two of the most important state that, if there is an uncollapsed uniformly continuous function f between the Banach spaces (X, ||·||_X) and (E, ||·||_E), then, for all p in [1, +∞[, (X, ||·||_X) admits a simultaneously uniform and coarse embedding into (ℓ_p(E), ||·||_p), and that if, in addition, f maps into a bounded set, then (X, ||·||_X) also admits a uniformly continuous coarse embedding into (E×E, ||·||_(E×E)). In Chapter 6, we focus on the class of left-invariant coarse structures on groups. In the first section, we show how a left-invariant coarse structure on a group (G, ·) can be described in terms of a certain ideal on G, and vice versa. We then use this result to characterize the left-coarse structure E_L of a topological group (G, ·, T) in terms of the collection of coarsely bounded sets of (G, E_L) and, with this, prove that the left-coarse structure associated with the additive group of a normed space is simply the bounded coarse structure induced by its norm.
100

Constrained measurement systems of low-dimensional signals

Yap, Han Lun 20 December 2012
The object of this thesis is the study of constrained measurement systems of signals having low-dimensional structure using analytic tools from Compressed Sensing (CS). Realistic measurement systems usually have architectural constraints that make them differ from their idealized, well-studied counterparts. Nonetheless, these measurement systems can exploit structure in the signals that they measure. Signals considered in this research have low-dimensional structure and can be broken down into two types: static or dynamic. Static signals are either sparse in a specified basis or lying on a low-dimensional manifold (called manifold-modeled signals). Dynamic signals, exemplified as states of a dynamical system, either lie on a low-dimensional manifold or have converged onto a low-dimensional attractor. In CS, the Restricted Isometry Property (RIP) of a measurement system ensures that distances between all signals of a certain sparsity are preserved. This stable embedding ensures that sparse signals can be distinguished one from another by their measurements and therefore be robustly recovered. Moreover, signal-processing and data-inference algorithms can be performed directly on the measurements instead of requiring a prior signal recovery step. Taking inspiration from the RIP, this research analyzes conditions on realistic, constrained measurement systems (of the signals described above) such that they are stable embeddings of the signals that they measure. Specifically, this thesis focuses on four different types of measurement systems. First, we study the concentration of measure and the RIP of random block diagonal matrices that represent measurement systems constrained to make local measurements. Second, we study the stable embedding of manifold-modeled signals by existing CS matrices. The third part of this thesis deals with measurement systems of dynamical systems that produce time series observations. 
While Takens' embedding result ensures that this time series output can be an embedding of the dynamical systems' states, our research establishes that a stronger stable embedding result is possible under certain conditions. The final part of this thesis is the application of CS ideas to the study of the short-term memory of neural networks. In particular, we show that the nodes of a recurrent neural network can be a stable embedding of sparse input sequences.
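The distance-preservation property behind the RIP can be illustrated numerically. The sketch below uses an idealized dense Gaussian measurement matrix, not the constrained (for example, block-diagonal) systems analyzed in the thesis; the dimensions, sparsity level, random seed, and helper names are assumptions chosen for the illustration.

```python
import math
import random

def gaussian_matrix(m, n, rng):
    """m x n matrix with i.i.d. N(0, 1/m) entries (idealized CS matrix)."""
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]

def sparse_vector(n, k, rng):
    """Length-n vector with k nonzero Gaussian entries at random positions."""
    x = [0.0] * n
    for i in rng.sample(range(n), k):
        x[i] = rng.gauss(0.0, 1.0)
    return x

def measure(phi, x):
    """Compute the measurements y = phi @ x."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]

def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def distance_ratio(phi, x1, x2):
    """Ratio of measured distance to original distance (near 1 if stable)."""
    return distance(measure(phi, x1), measure(phi, x2)) / distance(x1, x2)

rng = random.Random(0)
n, m, k = 400, 128, 5  # signal length, number of measurements, sparsity
phi = gaussian_matrix(m, n, rng)
ratios = [
    distance_ratio(phi, sparse_vector(n, k, rng), sparse_vector(n, k, rng))
    for _ in range(20)
]
```

If the embedding is stable, each ratio stays close to 1 even though m is much smaller than n, which is why sparse signals remain distinguishable from their measurements alone.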
