  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
291

The Obstacles to Implementing Supervised Injection Services in Ottawa, Ontario

Simpson, Laura January 2017
The current opioid crisis has, among other things, resulted in soaring rates of fatal overdose across Canada, prompting officials to turn to harm reduction in hopes of combatting the epidemic. The Coroners Service of British Columbia issued a statement in March 2017 reporting an 80% increase in the number of deaths resulting from illicit drug use in 2016 compared with 2015 (Coroners Service of British Columbia, 2017). Despite the abundance of evidence demonstrating the effectiveness of supervised injection services (SIS) in Canada and worldwide, the implementation of this intervention has remained highly controversial, particularly in Ottawa. Guided by Michel Foucault’s theory of governmentality, this thesis explores the obstacles hindering the implementation of supervised injection services in Ottawa, Ontario. Through eight qualitative semi-structured interviews with front-line workers of harm reduction programs, this thesis identifies and explores several obstacles to the implementation of SIS, primarily bureaucratic obstacles stemming from the enactment of the Respect for Communities Act (2015).
292

Handling imperfections for multimodal image annotation / Gestion des imperfections pour l’annotation multimodale d’images

Znaidia, Amel 11 February 2014
This thesis deals with multimodal image annotation in the context of social media. We seek to combine textual (tags) and visual information in order to improve image annotation performance. However, these tags usually come from personal indexing: they are often noisy and overly personalized, and only a few of them relate to the semantic visual content of the image. In addition, when combining prediction scores from different classifiers learned on different modalities, multimodal image annotation faces their imperfections: uncertainty, imprecision and incompleteness. Consequently, we consider that multimodal image annotation is subject to imperfections at two levels: representation and decision. Inspired by information fusion theory, we focus in this thesis on defining, identifying and handling these imperfections in order to improve image annotation.
293

Hypothesis testing and feature selection in semi-supervised data

Sechidis, Konstantinos January 2015
A characteristic of most real-world problems is that collecting unlabelled examples is easier and cheaper than collecting labelled ones. As a result, learning from partially labelled data is a crucial and demanding area of machine learning, and extending techniques from fully to partially supervised scenarios is a challenging problem. Our work focuses on two types of partially labelled data that can occur in binary problems: semi-supervised data, where the labelled set contains both positive and negative examples, and positive-unlabelled data, a more restricted version of partial supervision where the labelled set consists of only positive examples. In both settings, it is very important to explore a large number of features in order to derive useful and interpretable information about the classification task, and to select a subset of features that contains most of the useful information. In this thesis, we address three fundamental and tightly coupled questions concerning feature selection in partially labelled data; all three relate to the highly controversial issue of when additional unlabelled data improves performance in partially labelled learning environments and when it does not. First, what are the properties of statistical hypothesis testing in such data? Second, given the widespread criticism of significance testing, what can we do in terms of effect size estimation, that is, quantifying how strong the dependency between a feature X and the partially observed label Y is? Finally, in the context of feature selection, how well can features be ranked by estimated measures when the population values are unknown? The answers to these questions provide a comprehensive picture of feature selection in partially labelled data.
Interesting applications include the estimation of mutual information quantities, structure learning in Bayesian networks, and the investigation of how human-provided prior knowledge can overcome the restrictions of partial labelling. One direct contribution of our work is to enable valid statistical hypothesis testing and estimation in positive-unlabelled data. Focusing on a generalised likelihood ratio test and on estimating mutual information, we provide five key contributions. (1) We prove that assuming all unlabelled examples are negative cases is sufficient for independence testing, but not for power analysis activities. (2) We suggest a new methodology that compensates for this and enables power analysis, allowing sample size determination for observing an effect with a desired power by incorporating the user’s prior knowledge of the prevalence of positive examples. (3) We show a new capability, supervision determination, which can determine a priori the number of labelled examples the user must collect before being able to observe a desired statistical effect. (4) We derive an estimator of the mutual information in positive-unlabelled data, and its asymptotic distribution. (5) Finally, we show how to rank features with and without prior knowledge. We also derive extensions of these results to semi-supervised data. In another extension, we investigate how our results can be used for Markov blanket discovery in partially labelled data. While there are many different algorithms for deriving the Markov blanket of fully supervised nodes, the partially labelled problem is far more challenging, and there is a lack of principled approaches in the literature. Our work constitutes a generalization of the conditional tests of independence for partially labelled binary target variables, which can handle the two main partially labelled scenarios: positive-unlabelled and semi-supervised.
The result is a significantly deeper understanding of how to control false negative errors in Markov blanket discovery procedures and how unlabelled data can help. Finally, we present how our results can be used for information theoretic feature selection in partially labelled data. Our work naturally extends feature selection criteria suggested for fully supervised data to partially labelled scenarios. These criteria can capture both the relevancy and the redundancy of the features and can be used for semi-supervised and positive-unlabelled data.
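As a rough illustration of the independence test mentioned in this abstract, the following Python sketch computes a plug-in estimate of the mutual information between a binary feature and a surrogate label in which every unlabelled example is treated as negative, then forms the likelihood ratio statistic G = 2·N·I(X;S). The toy data, the variable names and the 5% chi-square cutoff for one degree of freedom (3.841) are assumptions made for this example; it is not the author's code and covers none of the power analysis results.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in (maximum likelihood) estimate of I(X;Y) in nats."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi

def g_statistic(xs, ys):
    """Likelihood ratio (G) statistic for independence: G = 2 * N * I(X;Y)."""
    return 2.0 * len(xs) * mutual_information(xs, ys)

# Hypothetical toy data: x is a binary feature; s is the surrogate label,
# 1 for a labelled positive and 0 for an unlabelled example (assumed negative).
x = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0]
s = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

g = g_statistic(x, s)
CHI2_CRITICAL_1DOF_5PCT = 3.841  # standard table value, 1 degree of freedom
reject_independence = g > CHI2_CRITICAL_1DOF_5PCT
```

With so few examples the chi-square approximation is crude; the sketch is only meant to show how the test statistic relates to the estimated mutual information.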
294

Gestion supervisée de systèmes étendus à retards variables : cas des réseaux hydrographiques / Supervisory control of large scale system with varying time delay : hydrophical network case study

Nouasse, Houda 04 March 2015
Around the world, we observe more and more devastating natural phenomena, among which floods are one of the most frequent disasters. In recent decades, extensive flooding has been caused by rivers in flood. These floods, due to excessive rainfall or runoff, repeatedly cause loss of human life and significant material damage. To address these problems, river networks are increasingly equipped with means of detecting floods. A key factor in the management of such phenomena is responsiveness. Indeed, managers of river networks faced with this kind of situation must quickly take important decisions in an uncertain context, as most of these floods are induced by abrupt climate events whose magnitude is difficult to assess accurately. We propose in this dissertation a method of flood management in river networks equipped with flood zones controlled by gravitational gates. First, we modeled our management method using a static transportation network. Second, we enriched it by using transportation networks with delays in order to take into account the travel time of the managed resource. The main difficulty with transportation networks with delays is their large size.
To overcome this problem, we developed an alternative mechanism combining a reduced static transportation network with a temporization matrix. Furthermore, this mechanism allows flow-dependent variable transfer times to be taken into account without modifying either the transportation network or the structure of the temporization matrix. It therefore allows simplified management of transfer times, whether variable or not. With this mechanism, evaluating the minimum-cost maximum flow allowed us, according to the management strategies considered, to compute setpoints for the opening of the flood-zone gates in order to attenuate the flood peak and also to release the stored water at the appropriate time. Finally, to evaluate the contributions of this management, the method was applied to a case study based on a section of river equipped with three flood zones and modeled using hydraulic simulators combining 1D and 2D models. The simulation results showed that the proposed approach significantly reduces flooding downstream.
295

Predicting "Essential" Genes in Microbial Genomes: A Machine Learning Approach to Knowledge Discovery in Microbial Genomic Data

Palaniappan, Krishnaveni 01 January 2010
Essential genes constitute the minimal gene set of an organism that is indispensable for its survival under most favorable conditions. The problem of accurately identifying and predicting genes essential for the survival of an organism has both theoretical and practical relevance in genome biology and medicine. From a theoretical perspective, it provides insight into the minimal requirements for cellular life and plays a key role in the emerging field of synthetic biology; from a practical perspective, it facilitates efficient identification of potential drug targets (e.g., for antibiotics) in novel pathogens. However, characterizing the essential genes of an organism requires sophisticated experimental studies that are expensive and time consuming. The goal of this research study was to investigate machine learning methods to accurately classify/predict "essential genes" in newly sequenced microbial genomes based solely on their genomic sequence data. This study formulates the essential gene prediction problem as a binary classification problem and systematically investigates the applicability of three different supervised classification methods for this task. In particular, Decision Tree (DT), Support Vector Machine (SVM), and Artificial Neural Network (ANN) based classifier models were constructed and trained on genomic features derived solely from the gene sequence data of 14 experimentally validated microbial genomes whose essential genes are known. A set of 52 relevant features derived from genomic sequence (including gene and protein sequence features, protein physicochemical features and protein subcellular features) was used as input for the learners to learn the classifier models. The training and test datasets used in this study reflected the between-class imbalance (i.e., a skewed majority class vs. minority class) that is intrinsic to this data domain and to the essential gene prediction problem.
Two imbalance reduction techniques (homology reduction and random undersampling of 50% of the majority class) were devised without artificially balancing the datasets or compromising classifier generalizability. The classifier models were trained and evaluated using a 10-fold stratified cross validation strategy on both the full multi-genome dataset and its class-imbalance-reduced variants to assess their ability to discriminate essential genes from non-essential genes. In addition, the classifiers were evaluated using novel blind testing strategies, called LOGO (Leave-One-Genome-Out) and LOTO (Leave-One-Taxon group-Out) tests, on carefully constructed held-out datasets (genome-wise for LOGO and taxonomic group-wise for LOTO) that were not used in training the classifier models. The prediction performance metrics accuracy, sensitivity, specificity, precision and area under the Receiver Operating Characteristic curve (AU-ROC) were assessed for the DT, SVM and ANN derived models. Empirical results from 10 X 10-fold stratified cross validation and from the LOGO and LOTO blind testing experiments indicate that SVM and ANN based models perform better than Decision Tree based models. On 10 X 10-fold cross validation, the SVM based models achieved an AU-ROC score of 0.80, while ANN and DT achieved 0.79 and 0.68 respectively. Both LOGO (genome-wise) and LOTO (taxon-wise) blind tests revealed the extent to which these classifiers generalize across different genomes and taxonomic orders. This study empirically demonstrated the merits of applying machine learning methods to predict essential genes in microbial genomes using only the gene sequence and features derived from it. It also demonstrated that it is possible to predict essential genes based on features derived from gene sequence without using homology information.
LOGO and LOTO blind test results reveal that the trained classifiers do generalize across genomes and taxonomic boundaries and provide a first critical estimate of predictive performance on microbial genomes. Overall, this study provides a systematic assessment of applying DT, ANN and SVM to this prediction problem. An important potential application of this study will be to apply the resultant predictive model/approach and integrate it as a genome annotation pipeline method for comparative microbial genome and metagenome analysis resources such as the Integrated Microbial Genome Systems (IMG and IMG/M).
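The Leave-One-Genome-Out idea described in this abstract can be sketched in a few lines of Python: every gene carries the name of the genome it comes from, and each split holds out all genes of one genome for testing while training on the rest. The function name, the genome names and the tiny dataset below are hypothetical and chosen only for illustration; the thesis's actual pipeline, features and classifiers are not reproduced here.

```python
def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) for each distinct group."""
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test

# Hypothetical example: the genome each gene (row) belongs to.
genome_of_gene = ["E.coli", "E.coli", "B.subtilis", "M.genitalium", "B.subtilis"]

splits = list(leave_one_group_out(genome_of_gene))
for held_out, train_idx, test_idx in splits:
    # A classifier would be fitted on train_idx and scored on test_idx here.
    print(held_out, train_idx, test_idx)
```

Grouping by taxonomic group instead of genome would give the corresponding Leave-One-Taxon group-Out splits with the same function.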
296

Estágio supervisionado em educação física : tempo de aprender ou simples cumprimento da lei? / Supervised internship in physical education : time to learn or simple law enforcement?

Cristovão, Silvio César, 1970- 12 May 2014
Advisor: Eliana Ayoub / Dissertation (master's) - Universidade Estadual de Campinas, Faculdade de Educação / Abstract: This study investigates the mandatory supervised internship of the teaching degree (Licenciatura) in Physical Education at the Universidade Estadual de Campinas/Unicamp and aims to analyze and understand the role of the internship in teacher education, taking as a reference the everyday relationships experienced by the intern in the school context. The research methodology combined a theoretical study of the internship with field research among interns in that course. The group of interns involved consisted of seven students who took the Supervised Internship course (Estágio Supervisionado) offered at the Faculty of Education at Unicamp. Interviews were conducted with the subjects and were recorded and transcribed for analysis. In addition to the interviews, the interns were encouraged to produce written narratives about episodes experienced during their work with the supervising teachers. The question posed in the title of this study, "supervised internship in physical education: time to learn or simple law enforcement?", served as a trigger for reflecting, through analysis of the subjects' oral and written narratives, on the internship in the everyday life of teacher education.
The results brought out the subjects' criticism of placing the internship at the end of the course and indicated that both the internship practices developed by the interns and the pedagogical practices of supervisors and advisors are elements that influence the quality of training and therefore need to be addressed jointly in the relationship between the university and the school. In this sense, it is clear that they need to be seen as an inseparable part of becoming a teacher during the training process. / Master's degree / Education, Knowledge, Language and Art (Educação, Conhecimento, Linguagem e Arte) / Master in Education
297

Técnicas para o problema de dados desbalanceados em classificação hierárquica / Techniques for the problem of imbalanced data in hierarchical classification

Victor Hugo Barella 24 July 2015
Recent advances in science and technology have enabled data to grow in quantity and availability. Along with this explosion of generated information comes the need to analyze data to discover new and useful knowledge. Thus, areas that aim to extract knowledge and useful information from large datasets, such as Machine Learning (ML) and Data Mining (DM), have become great opportunities for the advancement of research.
However, some limitations may reduce the accuracy of traditional algorithms in these areas, for example the imbalance of class samples in a dataset. To mitigate this problem, several alternatives have been the target of research in recent years, such as the development of techniques for artificially balancing data, the modification of algorithms, and new approaches for imbalanced data. An area little explored from the perspective of data imbalance is hierarchical classification, in which the classes are organized into hierarchies, commonly in the form of a tree or DAG (Directed Acyclic Graph). The goal of this work was to investigate the limitations of, and ways to minimize, the effects of imbalanced data in hierarchical classification problems. The experimental results show the need to take into account the characteristics of the hierarchical classes when deciding whether or not to apply techniques for imbalanced data in hierarchical classification.
298

Interpretação de clusters gerados por algoritmos de clustering hierárquico / Interpreting clusters generated by hierarchical clustering algorithms

Jean Metz 04 August 2006
The Data Mining (DM) process consists of the automated extraction of patterns representing knowledge implicitly stored in large databases. In general, DM tasks can be classified into two categories: predictive and descriptive. Tasks in the first category, such as classification, perform inference on the data in order to make predictions, while tasks in the second category, such as clustering, explore the dataset in search of properties that describe it. Unlike classification, which analyzes class-labeled examples, clustering analyzes examples whose class label is not known in advance. Clusters are formed so that examples in the same cluster are highly similar to one another but dissimilar to examples in other clusters. Clustering can also facilitate the organization of clusters into a hierarchy that groups similar events together, creating a taxonomy that can simplify the interpretation of clusters. In this work, we propose and develop an unsupervised learning module comprising hierarchical clustering algorithms and several cluster analysis tools, aiming to help the domain specialist interpret the clustering results. Because hierarchical clustering groups examples according to similarity measures and organizes the clusters into a hierarchy, the user/specialist can analyze and explore this hierarchy at different levels in order to discover concepts described by the structure.
The proposed module is integrated into a larger system under development at the Computational Intelligence Laboratory (LABIC), which covers all the steps of the DM process, from data pre-processing to knowledge post-processing. To evaluate the module and its use for discovering concepts from the hierarchical structure of clusters, several experiments on natural datasets were carried out, as well as a case study using a real dataset. The results show the viability of the proposed methodology for interpreting clusters, although the complexity of the process depends on the characteristics of the dataset.
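To make the notion of a cluster hierarchy concrete, here is a minimal Python sketch of agglomerative clustering with single linkage on one-dimensional points. It records each merge and the distance at which it happens, which is the information a dendrogram displays. The linkage criterion, the function name and the toy points are assumptions for this example; the abstract does not say which hierarchical algorithms the module implements.

```python
def single_linkage(points):
    """Agglomerative clustering of 1-D points.

    Returns the list of merges as (cluster_a, cluster_b, distance),
    where clusters are tuples of point indices.
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the two closest members.
                d = min(abs(points[i] - points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = tuple(sorted(clusters[a] + clusters[b]))
        merges.append((clusters[a], clusters[b], d))
        # Replace the two merged clusters by their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

# Hypothetical toy data: two tight groups and one distant point.
points = [1.0, 1.5, 5.0, 5.2, 9.0]
merges = single_linkage(points)
```

Reading the merge list from the start shows the fine-grained groups first and the coarse ones last, so stopping at a chosen distance corresponds to inspecting the hierarchy at a chosen level.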
299

Expansão de recursos para análise de sentimentos usando aprendizado semi-supervisionado / Extending sentiment analysis resources using semi-supervised learning

Henrico Bertini Brum 23 March 2018 (has links)
The large volume of data available in online environments can be an excellent source of new resources for Natural Language Processing tasks such as Sentiment Analysis. Unfortunately, annotating new corpora is costly, involving everything from financial investment to lengthy review processes. Our work proposes a semi-supervised annotation approach: the automatic annotation of a large unlabeled corpus starting from a manually annotated dataset. To that end, we introduce TweetSentBR, a corpus of tweets in the TV show domain annotated for three-class (positive, neutral and negative) sentiment classification and partially reviewed by up to seven annotators. The corpus is an important linguistic resource for Brazilian Portuguese and ranks among the largest annotated corpora in the literature for polarity classification. Beyond the manual annotation, we implemented a semi-supervised learning framework that uses the labeled data and iteratively expands it with unlabeled data. TweetSentBR, which contains 15,000 annotated tweets, is thereby expanded about eightfold. For the expansion, we trained classification models using six polarity classifiers and evaluated different parameters and representation schemes in order to obtain a reliable corpus. We ran experiments generating an expanded corpus with each classifier, both for three-class and for binary classification. We evaluated the generated corpora on a held-out set and compared the F-Measure obtained when training on the manually annotated corpus with that obtained when training on the semi-automatically annotated corpora. The semi-supervised corpus with the best results for three-class classification reached an average F-Measure of 62.14%, surpassing the average obtained with the manually annotated corpus (61.02%). In binary classification, the best expanded corpus reached an average F-Measure of 83.11%, surpassing the manually annotated corpus (79.80%). Furthermore, we simulated our expansion on annotated corpora from the literature, measuring how correct the semi-automatically assigned labels are. Our best result was the expansion of a corpus of product reviews, which obtained an F1-Measure of 93.15% on binary data. Finally, we compared a corpus from the literature obtained through distant supervision with our semi-supervised corpus, and the latter outperformed the former in cross-domain binary polarity classification.
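The iterative expansion described in this abstract resembles what is commonly called self-training. The following is only a minimal sketch under stated assumptions, not the thesis framework itself: a small multinomial Naive Bayes model stands in for the six polarity classifiers, and the function names (`train_nb`, `predict_nb`, `self_train`), the confidence threshold and the round limit are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(examples):
    """Fit a multinomial Naive Bayes model on (text, label) pairs."""
    label_counts = Counter(label for _, label in examples)
    word_counts = defaultdict(Counter)
    for text, label in examples:
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return label_counts, word_counts, vocab


def predict_nb(model, text):
    """Return (best_label, posterior_probability) for one text."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, n in label_counts.items():
        # Laplace (add-one) smoothing over the training vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for w in text.lower().split():
            if w in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][w] + 1) / denom)
        log_scores[label] = score
    # Turn log scores into posterior probabilities in a stable way.
    peak = max(log_scores.values())
    exp_scores = {lab: math.exp(s - peak) for lab, s in log_scores.items()}
    z = sum(exp_scores.values())
    best = max(exp_scores, key=exp_scores.get)
    return best, exp_scores[best] / z


def self_train(labeled, unlabeled, threshold=0.75, max_rounds=10):
    """Grow the labeled set with confidently predicted unlabeled texts.

    threshold and max_rounds are assumed values, not taken from the thesis.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = train_nb(labeled)
        confident, remaining = [], []
        for text in pool:
            label, prob = predict_nb(model, text)
            target = confident if prob >= threshold else remaining
            target.append((text, label))
        if not confident:  # nothing new to add, so stop early
            break
        labeled.extend(confident)
        pool = [text for text, _ in remaining]
    return labeled, pool
```

Calling `self_train(seed, tweets)` with a small manually labeled `seed` list returns the expanded labeled set together with the texts that never reached the confidence threshold. A held-out set, as in the abstract, would then be used to compare a model trained on `seed` alone against one trained on the expanded set.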
300

Generalized Domain Adaptation for Visual Domains

January 2020 (has links)
Humans can readily recognize objects in different environments regardless of variations in their appearance. The same is not true of machine learning models, which are unable to generalize to images of objects from different domains. The generalization of these models to new data is constrained by the domain gap. Many factors, such as image background, image resolution, color, camera perspective and variations in the objects themselves, contribute to the domain gap between the training data (source domain) and the testing data (target domain). Domain adaptation algorithms aim to overcome this gap and learn robust models that perform well across both domains. This thesis provides solutions for the standard problem of unsupervised domain adaptation (UDA) and the more general problem of generalized domain adaptation (GDA). Its contributions are as follows: (1) a Certain and Consistent Domain Adaptation model for closed-set unsupervised domain adaptation, which aligns the features of the source and target domains using deep neural networks; (2) a multi-adversarial deep learning model for generalized domain adaptation; and (3) a gating model that detects out-of-distribution samples for generalized domain adaptation. The models were tested on multiple computer vision datasets for domain adaptation. The dissertation concludes with a discussion of the proposed approaches and future directions for research in closed-set and generalized domain adaptation. / Dissertation/Thesis / Masters Thesis Computer Science 2020
