Global ETD Search

1	On Web Taxonomy Integration Zhang, Dell, Lee, Wee Sun 01 1900 (has links) We address the problem of integrating objects from a source taxonomy into a master taxonomy. This problem is not only pervasive on today's web, but also important to the emerging semantic web. A straightforward approach to automating this process would be to train a classifier for each category in the master taxonomy, and then classify objects from the source taxonomy into these categories. In this paper we attempt to use a powerful classification method, the Support Vector Machine (SVM), to attack this problem. Our key insight is that the availability of the source taxonomy data could help build better classifiers in this scenario; it would therefore be beneficial to do transductive learning rather than inductive learning, i.e., learning to optimize classification performance on a particular set of test examples. Noticing that the categorizations of the master and source taxonomies often have some semantic overlap, we propose a new method, Cluster Shrinkage (CS), to further enhance the classification by exploiting such implicit knowledge. Our experiments with real-world web data show substantial improvements in the performance of taxonomy integration. / Singapore-MIT Alliance (SMA) web taxonomy integration classification support vector machines transductive learning
2	Semi-supervised and transductive learning algorithms for predicting alternative splicing events in genes. Tangirala, Karthik January 1900 (has links) Master of Science / Department of Computing and Information Sciences / Doina Caragea / As genomes are sequenced, a major challenge is their annotation -- the identification of genes and regulatory elements, their locations and their functions. For years, it was believed that one gene corresponds to one protein, but the discovery of alternative splicing provided a mechanism for generating different gene transcripts (isoforms) from the same genomic sequence. In the recent years, it has become obvious that a large fraction of genes undergoes alternative splicing. Thus, understanding alternative splicing is a problem of great interest to biologists. Supervised machine learning approaches can be used to predict alternative splicing events at genome level. However, supervised approaches require large amounts of labeled data to produce accurate classifiers. While large amounts of genomic data are produced by the new sequencing technologies, labeling these data can be costly and time consuming. Therefore, semi-supervised learning approaches that can make use of large amounts of unlabeled data, in addition to small amounts of labeled data are highly desirable. In this work, we study the usefulness of a semi-supervised learning approach, co-training, for classifying exons as alternatively spliced or constitutive. The co-training algorithm makes use of two views of the data to iteratively learn two classifiers that can inform each other, at each step, with their best predictions on the unlabeled data. We consider three sets of features for constructing views for the problem of predicting alternatively spliced exons: lengths of the exon of interest and its flanking introns, exonic splicing enhancers (a.k.a., ESE motifs) and intronic regulatory sequences (a.k.a., IRS motifs). 
Naive Bayes and Support Vector Machine (SVM) algorithms are used as base classifiers in our study. Experimental results show that the use of unlabeled data can result in better classifiers than those obtained from the small amount of labeled data alone. In addition to semi-supervised approaches, we also study the usefulness of graph-based transductive learning approaches for predicting alternatively spliced exons. Like the semi-supervised learning algorithms, transductive learning algorithms can make use of unlabeled data, together with labeled data, to produce labels for the unlabeled data. However, a classification model that could be used to classify new unlabeled data is not learned in this case. Experimental results show that graph-based transductive approaches can make effective use of the unlabeled data. Alternative splicing Co training Semi supervised learning Transductive learning Graph based approach Bioinformatics (0715) Computer Science (0984)
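The co-training loop described in entry 2 (two views, two classifiers, each adding its most confident predictions on unlabeled data to a shared labeled pool) can be sketched roughly as follows. This is only an illustration under stated assumptions: a nearest-centroid classifier stands in for the Naive Bayes and SVM base classifiers named in the abstract, the two one-dimensional views are made-up toy data, and the names `centroid_fit`, `centroid_predict` and `co_train` are hypothetical rather than taken from the thesis.

```python
def centroid_fit(X, y):
    """Return the mean feature vector of each class (0 and 1).
    Assumes both classes are present among the labeled examples."""
    cents = {}
    for c in (0, 1):
        rows = [x for x, lab in zip(X, y) if lab == c]
        cents[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents


def centroid_predict(cents, x):
    """Return (label, confidence): the nearest centroid's class and the
    gap between the two squared distances (larger gap = more confident)."""
    d = {c: sum((a - b) ** 2 for a, b in zip(x, m)) for c, m in cents.items()}
    label = min(d, key=d.get)
    return label, abs(d[0] - d[1])


def co_train(view_a, view_b, labels, rounds=5, per_round=2):
    """Toy co-training. labels[i] is 0 or 1 for labeled examples and
    None for unlabeled ones; view_a[i] and view_b[i] describe example i."""
    labels = list(labels)
    for _ in range(rounds):
        # The classifiers take turns: each trains on its own view and
        # labels the unlabeled examples it is most confident about.
        for view in (view_a, view_b):
            known = [i for i, lab in enumerate(labels) if lab is not None]
            unknown = [i for i, lab in enumerate(labels) if lab is None]
            if not unknown:
                return labels
            cents = centroid_fit([view[i] for i in known],
                                 [labels[i] for i in known])
            scored = [(centroid_predict(cents, view[i]), i) for i in unknown]
            scored.sort(key=lambda t: t[0][1], reverse=True)
            for (lab, _conf), i in scored[:per_round]:
                labels[i] = lab  # visible to the other view on its turn
    return labels


# Hypothetical toy data: two seed labels and four unlabeled examples.
view_a = [[0.0], [10.0], [1.0], [9.0], [0.5], [9.5]]
view_b = [[-5.0], [5.0], [-4.0], [4.0], [-6.0], [6.0]]
seed_labels = [0, 1, None, None, None, None]
result = co_train(view_a, view_b, seed_labels)
```

Note that this loop, like the transductive setting mentioned in the same abstract, only returns labels for the examples it was given. A practical co-training setup would also keep the two trained classifiers so that new examples can be classified later.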
3	Computational Methods for Perceptual Training in Radiology January 2012 (has links) abstract: Medical images constitute a special class of images that are captured to allow diagnosis of disease, and their "correct" interpretation is vitally important. Because they are not "natural" images, radiologists must be trained to visually interpret them. This training process includes implicit perceptual learning that is gradually acquired over an extended period of exposure to medical images. This dissertation proposes novel computational methods for evaluating and facilitating perceptual training in radiologists. Part 1 of this dissertation proposes an eye-tracking-based metric for measuring the training progress of individual radiologists. Six metrics were identified as potentially useful: time to complete task, fixation count, fixation duration, consciously viewed regions, subconsciously viewed regions, and saccadic length. Part 2 of this dissertation proposes an eye-tracking-based entropy metric for tracking the rise and fall in the interest level of radiologists, as they scan chest radiographs. The results showed that entropy was significantly lower when radiologists were fixating on abnormal regions. Part 3 of this dissertation develops a method that allows extraction of Gabor-based feature vectors from corresponding anatomical regions of "normal" chest radiographs, despite anatomical variations across populations. These feature vectors are then used to develop and compare transductive and inductive computational methods for generating overlay maps that show atypical regions within test radiographs. The results show that the transductive methods produced much better maps than the inductive methods for 20 ground-truthed test radiographs. Part 4 of this dissertation uses an Extended Fuzzy C-Means (EFCM) based instance selection method to reduce the computational cost of transductive methods. 
The results showed that EFCM substantially reduced the computational cost without a substantial drop in performance. The dissertation then proposes a novel Variance Based Instance Selection (VBIS) method that also reduces the computational cost, but allows for incremental incorporation of new informative radiographs, as they are encountered. Part 5 of this dissertation develops and demonstrates a novel semi-transductive framework that combines the superior performance of transductive methods with the reduced computational cost of inductive methods. The results showed that the semi-transductive approach provided both an effective and efficient framework for detection of atypical regions in chest radiographs. / Dissertation/Thesis / Ph.D. Computer Science 2012 Computer science Medical imaging and radiology Anomaly Detection Atypicality Detection Chest Radiographs Eye tracking for Radiology Training Online Instance Selection Semi-Transductive Learning
4	Beyond Disagreement-based Learning for Contextual Bandits Pinaki Ranjan Mohanty (16522407) 26 July 2023 (has links) While instance-dependent contextual bandits have been previously studied, their analysis has been exclusively limited to pure disagreement-based learning. This approach lacks a nuanced understanding of disagreement and treats it in a binary and absolute manner. In our work, we aim to broaden the analysis of instance-dependent contextual bandits by studying them under the framework of disagreement-based learning in sub-regions. This framework allows for a more comprehensive examination of disagreement by considering its varying degrees across different sub-regions. To lay the foundation for our analysis, we introduce key ideas and measures widely studied in the contextual bandit and disagreement-based active learning literature. We then propose a novel, instance-dependent contextual bandit algorithm for the realizable case in a transductive setting. Leveraging the ability to observe contexts in advance, our algorithm employs a sophisticated Linear Programming subroutine to identify and exploit sub-regions effectively. Next, we provide a series of results tying previously introduced complexity measures and offer some insightful discussion on them. Finally, we enhance the existing regret bounds for contextual bandits by integrating the sub-region disagreement coefficient, thereby showcasing significant improvement in performance against the pure disagreement-based approach. In the concluding section of this thesis, we do a brief recap of the work done and suggest potential future directions for further improving contextual bandit algorithms within the framework of disagreement-based learning in sub-regions. These directions offer opportunities for further research and development, aiming to refine and enhance the effectiveness of contextual bandit algorithms in practical applications. Planning and decision making Statistical theory Contextual bandits Disagreement based learning Active Learning Interactive Learning Data Driven ML Linear Programming Transductive learning
5	[en] POROSITY ESTIMATION FROM SEISMIC ATTRIBUTES WITH SIMULTANEOUS CLASSIFICATION OF SPATIALLY STRUCTURED LATENT FACIES / [pt] PREDIÇÃO DE POROSIDADE A PARTIR DE ATRIBUTOS SÍSMICOS COM CLASSIFICAÇÃO SIMULTÂNEA DE FACIES GEOLÓGICAS LATENTES EM ESTRUTURAS ESPACIAIS LUIZ ALBERTO BARBOSA DE LIMA 26 April 2018 (has links) [pt] Predição de porosidade em reservatórios de óleo e gás representa em uma tarefa crucial e desafiadora na indústria de petróleo. Neste trabalho é proposto um novo modelo não-linear para predição de porosidade que trata fácies sedimentares como variáveis ocultas ou latentes. Esse modelo, denominado Transductive Conditional Random Field Regression (TCRFR), combina com sucesso os conceitos de Markov random fields, ridge regression e aprendizado transdutivo. O modelo utiliza volumes de impedância sísmica como informação de entrada condicionada aos valores de porosidade disponíveis nos poços existentes no reservatório e realiza de forma simultânea e automática a classificação das fácies e a estimativa de porosidade em todo o volume. O método é capaz de inferir as fácies latentes através da combinação de amostras precisas de porosidade local presentes nos poços com dados de impedância sísmica ruidosos, porém disponíveis em todo o volume do reservatório. A informação precisa de porosidade é propagada no volume através de modelos probabilísticos baseados em grafos, utilizando conditional random fields. Adicionalmente, duas novas técnicas são introduzidas como etapas de pré-processamento para aplicação do método TCRFR nos casos extremos em que somente um número bastante reduzido de amostras rotuladas de porosidade encontra-se disponível em um pequeno conjunto de poços exploratórios, uma situação típica para geólogos durante a fase exploratória de uma nova área. São realizados experimentos utilizando dados de um reservatório sintético e de um reservatório real. 
Os resultados comprovam que o método apresenta um desempenho consideravelmente superior a outros métodos automáticos de predição em relação aos dados sintéticos e, em relação aos dados reais, um desempenho comparável ao gerado por técnicas tradicionais de geo estatística que demandam grande esforço manual por parte de especialistas. / [en] Estimating porosity in oil and gas reservoirs is a crucial and challenging task in the oil industry. A novel nonlinear model for porosity estimation is proposed, which handles sedimentary facies as latent variables. It successfully combines the concepts of conditional random fields (CRFs), transductive learning and ridge regression. The proposed Transductive Conditional Random Field Regression (TCRFR) uses seismic impedance volumes as input information, conditioned on the porosity values from the available wells in the reservoir, and simultaneously and automatically provides as output the porosity estimation and facies classification in the whole volume. The method is able to infer the latent facies states by combining the local, labeled and accurate porosity information available at well locations with the plentiful but imprecise impedance information available everywhere in the reservoir volume. That accurate information is propagated in the reservoir based on conditional random field probabilistic graphical models, greatly reducing uncertainty. In addition, two new techniques are introduced as preprocessing steps for the application of TCRFR in the extreme but realistic cases where just a scarce amount of porosity labeled samples are available in a few exploratory wells, a typical situation for geologists during the evaluation of a reservoir in the exploration phase. 
Experiments on both synthetic and real-world data demonstrate the usefulness of the proposed methodology: it outperforms previous automatic estimation methods on synthetic data and gives results comparable to the traditional, labor-intensive manual geostatistics approach on real-world data. [pt] VARIAVEIS LATENTES [en] LATENT VARIABLES [pt] ESTIMATIVA DE POROSIDADE [en] POROSITY ESTIMATION [pt] CLASSIFICACAO DE FACIES GEOLOGICAS [en] GEOLOGICAL FACIES CLASSIFICATION [pt] CONDITIONAL RANDOM FIELD [en] CONDITIONAL RANDOM FIELD [pt] APRENDIZADO SEMI-SUPERVISIONADO [en] SEMI-SUPERVISED LEARNING [pt] APRENDIZADO TRANSDUTIVO [en] TRANSDUCTIVE LEARNING
6	Model Averaging in Large Scale Learning / Estimateur par agrégat en apprentissage statistique en grande dimension Grappin, Edwin 06 March 2018 (has links) Les travaux de cette thèse explorent les propriétés de procédures d'estimation par agrégation appliquées aux problèmes de régressions en grande dimension. Les estimateurs par agrégation à poids exponentiels bénéficient de résultats théoriques optimaux sous une approche PAC-Bayésienne. Cependant, le comportement théorique de l'agrégat avec prior de Laplace n'est guère connu. Ce dernier est l'analogue du Lasso dans le cadre pseudo-bayésien. Le Chapitre 2 explicite une borne du risque de prédiction de cet estimateur. Le Chapitre 3 prouve qu'une méthode de simulation s'appuyant sur un processus de Langevin Monte Carlo permet de choisir explicitement le nombre d'itérations nécessaire pour garantir une qualité d'approximation souhaitée. Le Chapitre 4 introduit des variantes du Lasso pour améliorer les performances de prédiction dans des contextes partiellement labélisés. / This thesis explores properties of estimation procedures related to aggregation in the problem of high-dimensional regression in a sparse setting. The exponentially weighted aggregate (EWA) is well studied in the literature. It benefits from strong results in fixed and random designs with a PAC-Bayesian approach. However, little is known about the properties of the EWA with Laplace prior. Chapter 2 analyses the statistical behaviour of the prediction loss of the EWA with Laplace prior in the fixed design setting. Sharp oracle inequalities which generalize the properties of the Lasso to a larger family of estimators are established. These results also bridge the gap from the Lasso to the Bayesian Lasso. Chapter 3 introduces an adjusted Langevin Monte Carlo sampling method that approximates the EWA with Laplace prior in an explicit finite number of iterations for any targeted accuracy. 
Chapter 4 explores the statistical behaviour of adjusted versions of the Lasso for the transductive and semi-supervised learning task in the random design setting. Apprentissage statistique Régression Apprentissage automatique Estimation par agrégation PAC-Bayésien Statistical learning Regression Machine learning Estimation by aggregation PAC-Bayesian 519
