Global ETD Search

1	On Web Taxonomy Integration Zhang, Dell, Lee, Wee Sun 01 1900 We address the problem of integrating objects from a source taxonomy into a master taxonomy. This problem is not only pervasive on today's web, but also important to the emerging semantic web. A straightforward approach to automating this process would be to train a classifier for each category in the master taxonomy, and then classify objects from the source taxonomy into these categories. In this paper we attempt to use a powerful classification method, Support Vector Machine (SVM), to attack this problem. Our key insight is that the availability of the source taxonomy data could help build better classifiers in this scenario; it would therefore be beneficial to do transductive learning rather than inductive learning, i.e., learning to optimize classification performance on a particular set of test examples. Noticing that the categorizations of the master and source taxonomies often have some semantic overlap, we propose a new method, Cluster Shrinkage (CS), to further enhance the classification by exploiting such implicit knowledge. Our experiments with real-world web data show substantial improvements in the performance of taxonomy integration. / Singapore-MIT Alliance (SMA) web taxonomy integration classification support vector machines transductive learning
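The "straightforward approach" described in this abstract (one classifier per master category, then classify the source objects) can be sketched as follows. This is only an illustrative baseline: a nearest-centroid classifier over bag-of-words counts is swapped in for the SVM, neither the transductive step nor Cluster Shrinkage is reproduced, and the function names and toy documents are hypothetical.

```python
from collections import Counter
import math

def bag_of_words(text):
    """Lower-cased token counts for one document."""
    return Counter(text.lower().split())

def centroid(docs):
    """Sum the token counts of all documents in one master category."""
    total = Counter()
    for doc in docs:
        total.update(bag_of_words(doc))
    return total

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def integrate(master, source_docs):
    """Assign each source document to the most similar master category."""
    centroids = {cat: centroid(docs) for cat, docs in master.items()}
    return {
        doc: max(centroids, key=lambda cat: cosine(bag_of_words(doc), centroids[cat]))
        for doc in source_docs
    }

# Hypothetical toy data, not taken from the paper.
master = {
    "sports": ["football match score", "tennis match final"],
    "finance": ["stock market price", "bank interest rate"],
}
source_docs = ["tennis score update", "market interest forecast"]
assignments = integrate(master, source_docs)
```

A transductive variant would additionally let the unlabeled source documents influence how the category models are fitted, which is the idea the abstract argues for.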
2	Evolving connectionist systems for adaptive decision support with application in ecological data modelling Soltic, Snjezana January 2009 Ecological modelling problems share some characteristics with other modelling fields and have others that are specific to ecology; hence, methods developed and tested in other research areas may not be suitable for modelling ecological problems or may perform poorly when used on ecological data. This thesis identifies issues associated with the techniques typically used for solving ecological problems and develops new generic methods for decision support, especially suitable for ecological data modelling, which are characterised by: (1) adaptive learning, (2) knowledge discovery and (3) accurate prediction. These new methods have been successfully applied to challenging real world ecological problems. Despite the fact that the number of possible applications of computational intelligence methods in ecology is vast, this thesis primarily concentrates on two problems: (1) species establishment prediction and (2) environmental monitoring. Our review of recent papers suggests that multi-layer perceptron networks trained using the backpropagation algorithm are the most widely used of all artificial neural networks for forecasting pest insect invasions. While the multi-layer perceptron networks are appropriate for modelling complex nonlinear relationships, they have rather limited exploratory capabilities and are difficult to adapt to dynamically changing data. In this thesis an approach that addresses these limitations is proposed. We found that environmental monitoring applications could benefit from having an intelligent taste recognition system possibly embedded in an autonomous robot. Hence, this thesis reviews the current knowledge on taste recognition and proposes a biologically inspired artificial model of taste recognition based on biologically plausible spiking neurons. The model is dynamic and is capable of learning new tastants as they become available. Furthermore, the model builds a knowledge base that can be extracted during or after the learning process in the form of IF-THEN fuzzy rules. It also comprises a layer that simulates the influence of taste receptor cells on the activity of their adjacent cells. These features increase the biological relevance of the model compared to other current taste recognition models. The proposed model was implemented in software on a single personal computer and in hardware on an Altera FPGA chip. Both implementations were applied to two real-world taste datasets. In addition, for the first time the applicability of transductive reasoning for forecasting the establishment potential of pest insects into new locations was investigated. For this purpose four types of predictive models, built using inductive and transductive reasoning, were used for predicting the distributions of three pest insects. The models were evaluated in terms of their predictive accuracy and their ability to discover patterns in the modelling data. The results obtained indicate that evolving connectionist systems can be successfully used for building predictive distribution models and environmental monitoring systems. The features available in the proposed dynamic systems, such as on-line learning and knowledge discovery, are needed to improve our knowledge of the species distributions. This work laid down the foundation for a number of interesting future projects in the fields of ecological modelling, robotics, pervasive computing and pattern recognition that can be undertaken separately or in sequence. Evolving connectionist systems Local modelling Transductive reasoning Spiking neural networks Taste recognition systems FPGA
4	Semi-supervised and transductive learning algorithms for predicting alternative splicing events in genes. Tangirala, Karthik January 1900 Master of Science / Department of Computing and Information Sciences / Doina Caragea / As genomes are sequenced, a major challenge is their annotation -- the identification of genes and regulatory elements, their locations and their functions. For years, it was believed that one gene corresponds to one protein, but the discovery of alternative splicing provided a mechanism for generating different gene transcripts (isoforms) from the same genomic sequence. In recent years, it has become obvious that a large fraction of genes undergoes alternative splicing. Thus, understanding alternative splicing is a problem of great interest to biologists. Supervised machine learning approaches can be used to predict alternative splicing events at genome level. However, supervised approaches require large amounts of labeled data to produce accurate classifiers. While large amounts of genomic data are produced by the new sequencing technologies, labeling these data can be costly and time consuming. Therefore, semi-supervised learning approaches that can make use of large amounts of unlabeled data, in addition to small amounts of labeled data, are highly desirable. In this work, we study the usefulness of a semi-supervised learning approach, co-training, for classifying exons as alternatively spliced or constitutive. The co-training algorithm makes use of two views of the data to iteratively learn two classifiers that can inform each other, at each step, with their best predictions on the unlabeled data. We consider three sets of features for constructing views for the problem of predicting alternatively spliced exons: lengths of the exon of interest and its flanking introns, exonic splicing enhancers (a.k.a., ESE motifs) and intronic regulatory sequences (a.k.a., IRS motifs). Naive Bayes and Support Vector Machine (SVM) algorithms are used as base classifiers in our study. Experimental results show that the usage of the unlabeled data can result in better classifiers as compared to those obtained from the small amount of labeled data alone. In addition to semi-supervised approaches, we also study the usefulness of graph based transductive learning approaches for predicting alternatively spliced exons. Similar to the semi-supervised learning algorithms, transductive learning algorithms can make use of unlabeled data, together with labeled data, to produce labels for the unlabeled data. However, a classification model that could be used to classify new unlabeled data is not learned in this case. Experimental results show that graph based transductive approaches can make effective use of the unlabeled data. Alternative splicing Co training Semi supervised learning Transductive learning Graph based approach Bioinformatics (0715) Computer Science (0984)
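The co-training loop described in this abstract can be illustrated with a minimal sketch. Several things here are assumptions rather than the thesis's setup: each view is reduced to a single number, a one-dimensional nearest-centroid classifier stands in for the Naive Bayes and SVM base classifiers, the labeled seed set must contain at least two classes, and all names and values are hypothetical.

```python
def fit_centroids(xs, ys):
    """Mean feature value per class: a one-dimensional nearest-centroid model."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(model, x):
    """Return (label, confidence); confidence is the gap between the two nearest centroids."""
    ranked = sorted(model, key=lambda y: abs(x - model[y]))
    best, runner_up = ranked[0], ranked[1]
    return best, abs(x - model[runner_up]) - abs(x - model[best])

def co_train(labeled, unlabeled, rounds=10):
    """labeled: list of ((view1, view2), label); unlabeled: list of (view1, view2)."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        for view in (0, 1):
            if not unlabeled:
                break
            model = fit_centroids([x[view] for x, _ in labeled], [y for _, y in labeled])
            # This view's classifier picks the unlabeled example it is most sure about
            pick = max(unlabeled, key=lambda x: predict(model, x[view])[1])
            label, _ = predict(model, pick[view])
            # and adds it to the shared pool that the other view trains on next.
            labeled.append((pick, label))
            unlabeled.remove(pick)
    return labeled

# Hypothetical two-view toy data (one number per view), not from the thesis.
seed = [((1.0, 10.0), "constitutive"), ((9.0, 2.0), "alternative")]
pool = [(2.0, 9.0), (8.0, 3.0), (3.0, 8.0), (7.0, 1.0)]
result = dict(co_train(seed, pool))
```

Unlike the graph based transductive methods mentioned at the end of the abstract, a loop like this leaves behind two trained models that can still be applied to new examples.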
5	Support vector machines, generalization bounds, and transduction Kroon, Rodney Stephen 12 1900 Thesis (MComm)--University of Stellenbosch, 2003. / Please refer to full text for abstract. Machine learning Computer algorithms PAC bounds Support vector machine (SVM) Transductive bounds Model selection Theses -- Computer science Dissertations -- Computer science Theses -- Mathematics Dissertations -- Mathematics
6	Computational Methods for Perceptual Training in Radiology January 2012 abstract: Medical images constitute a special class of images that are captured to allow diagnosis of disease, and their "correct" interpretation is vitally important. Because they are not "natural" images, radiologists must be trained to visually interpret them. This training process includes implicit perceptual learning that is gradually acquired over an extended period of exposure to medical images. This dissertation proposes novel computational methods for evaluating and facilitating perceptual training in radiologists. Part 1 of this dissertation proposes an eye-tracking-based metric for measuring the training progress of individual radiologists. Six metrics were identified as potentially useful: time to complete task, fixation count, fixation duration, consciously viewed regions, subconsciously viewed regions, and saccadic length. Part 2 of this dissertation proposes an eye-tracking-based entropy metric for tracking the rise and fall in the interest level of radiologists, as they scan chest radiographs. The results showed that entropy was significantly lower when radiologists were fixating on abnormal regions. Part 3 of this dissertation develops a method that allows extraction of Gabor-based feature vectors from corresponding anatomical regions of "normal" chest radiographs, despite anatomical variations across populations. These feature vectors are then used to develop and compare transductive and inductive computational methods for generating overlay maps that show atypical regions within test radiographs. The results show that the transductive methods produced much better maps than the inductive methods for 20 ground-truthed test radiographs. Part 4 of this dissertation uses an Extended Fuzzy C-Means (EFCM) based instance selection method to reduce the computational cost of transductive methods. 
The results showed that EFCM substantially reduced the computational cost without a substantial drop in performance. The dissertation then proposes a novel Variance Based Instance Selection (VBIS) method that also reduces the computational cost, but allows for incremental incorporation of new informative radiographs, as they are encountered. Part 5 of this dissertation develops and demonstrates a novel semi-transductive framework that combines the superior performance of transductive methods with the reduced computational cost of inductive methods. The results showed that the semi-transductive approach provided both an effective and efficient framework for detection of atypical regions in chest radiographs. / Dissertation/Thesis / Ph.D. Computer Science 2012 Computer science Medical imaging and radiology Anomaly Detection Atypicality Detection Chest Radiographs Eye tracking for Radiology Training Online Instance Selection Semi-Transductive Learning
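The abstract does not define its eye-tracking-based entropy metric. One plausible reading, assumed here purely for illustration, is the Shannon entropy of how fixation time is distributed over image regions: gaze that lingers on one region gives a low value, and gaze that wanders evenly gives a high one. The function name and the durations are hypothetical.

```python
import math

def fixation_entropy(durations):
    """Shannon entropy (in bits) of fixation time spread across image regions."""
    total = sum(durations)
    if total == 0:
        return 0.0  # no fixations recorded, so nothing to measure
    probs = [d / total for d in durations if d > 0]
    return -sum(p * math.log2(p) for p in probs)

# Hypothetical fixation durations in milliseconds for four regions.
spread = fixation_entropy([250, 250, 250, 250])   # gaze spread evenly
focused = fixation_entropy([940, 20, 20, 20])     # gaze concentrated on one region
```

Under this assumed definition, concentrated viewing yields lower entropy than evenly spread viewing, which is at least consistent with the reported drop in entropy during fixation on abnormal regions.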
7	Beyond Disagreement-based Learning for Contextual Bandits Pinaki Ranjan Mohanty (16522407) 26 July 2023 While instance-dependent contextual bandits have been previously studied, their analysis has been exclusively limited to pure disagreement-based learning. This approach lacks a nuanced understanding of disagreement and treats it in a binary and absolute manner. In our work, we aim to broaden the analysis of instance-dependent contextual bandits by studying them under the framework of disagreement-based learning in sub-regions. This framework allows for a more comprehensive examination of disagreement by considering its varying degrees across different sub-regions. To lay the foundation for our analysis, we introduce key ideas and measures widely studied in the contextual bandit and disagreement-based active learning literature. We then propose a novel, instance-dependent contextual bandit algorithm for the realizable case in a transductive setting. Leveraging the ability to observe contexts in advance, our algorithm employs a sophisticated Linear Programming subroutine to identify and exploit sub-regions effectively. Next, we provide a series of results tying previously introduced complexity measures and offer some insightful discussion on them. Finally, we enhance the existing regret bounds for contextual bandits by integrating the sub-region disagreement coefficient, thereby showcasing significant improvement in performance against the pure disagreement-based approach. In the concluding section of this thesis, we do a brief recap of the work done and suggest potential future directions for further improving contextual bandit algorithms within the framework of disagreement-based learning in sub-regions. These directions offer opportunities for further research and development, aiming to refine and enhance the effectiveness of contextual bandit algorithms in practical applications. Planning and decision making Statistical theory Contextual bandits Disagreement based learning Active Learning Interactive Learning Data Driven ML Linear Programming Transductive learning
8	[en] POROSITY ESTIMATION FROM SEISMIC ATTRIBUTES WITH SIMULTANEOUS CLASSIFICATION OF SPATIALLY STRUCTURED LATENT FACIES / [pt] PREDIÇÃO DE POROSIDADE A PARTIR DE ATRIBUTOS SÍSMICOS COM CLASSIFICAÇÃO SIMULTÂNEA DE FACIES GEOLÓGICAS LATENTES EM ESTRUTURAS ESPACIAIS LUIZ ALBERTO BARBOSA DE LIMA 26 April 2018 Estimating porosity in oil and gas reservoirs is a crucial and challenging task in the oil industry. A novel nonlinear model for porosity estimation is proposed, which handles sedimentary facies as latent variables. It successfully combines the concepts of conditional random fields (CRFs), transductive learning and ridge regression. The proposed Transductive Conditional Random Field Regression (TCRFR) uses seismic impedance volumes as input information, conditioned on the porosity values from the available wells in the reservoir, and simultaneously and automatically provides as output the porosity estimation and facies classification in the whole volume. The method is able to infer the latent facies states by combining the local, labeled and accurate porosity information available at well locations with the plentiful but imprecise impedance information available everywhere in the reservoir volume. That accurate information is propagated through the reservoir using conditional random field probabilistic graphical models, greatly reducing uncertainty. In addition, two new techniques are introduced as preprocessing steps for the application of TCRFR in the extreme but realistic cases where only a small number of porosity-labeled samples are available from a few exploratory wells, a typical situation for geologists evaluating a reservoir during the exploration phase. Experiments on both synthetic and real-world data show that the method outperforms previous automatic estimation methods on synthetic data and gives results comparable to the traditional, manually intensive geostatistics approach on real-world data. LATENT VARIABLES POROSITY ESTIMATION GEOLOGICAL FACIES CLASSIFICATION CONDITIONAL RANDOM FIELD SEMI-SUPERVISED LEARNING TRANSDUCTIVE LEARNING
9	Model Averaging in Large Scale Learning / Estimateur par agrégat en apprentissage statistique en grande dimension Grappin, Edwin 06 March 2018 This thesis explores properties of estimation procedures related to aggregation in the problem of high-dimensional regression in a sparse setting. The exponentially weighted aggregate (EWA) is well studied in the literature. It benefits from strong results in fixed and random designs with a PAC-Bayesian approach. However, little is known about the properties of the EWA with Laplace prior, which is the analogue of the Lasso in the pseudo-Bayesian framework. Chapter 2 analyses the statistical behaviour of the prediction loss of the EWA with Laplace prior in the fixed design setting. Sharp oracle inequalities which generalize the properties of the Lasso to a larger family of estimators are established. These results also bridge the gap from the Lasso to the Bayesian Lasso. Chapter 3 introduces an adjusted Langevin Monte Carlo sampling method that approximates the EWA with Laplace prior in an explicit finite number of iterations for any targeted accuracy. Chapter 4 explores the statistical behaviour of adjusted versions of the Lasso for the transductive and semi-supervised learning task in the random design setting. Statistical learning Regression Machine learning Estimation by aggregation PAC-Bayesian 519
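For orientation, the exponentially weighted aggregate with a Laplace prior is commonly written as below. The notation (response $Y$, design matrix $X$, sample size $n$, temperature $\tau$, penalty level $\lambda$, step size $h$) is an assumption made here for illustration and is not taken from the abstract; the thesis may use a different normalization.

```latex
\[
V_n(\theta) = \frac{1}{2n}\,\lVert Y - X\theta \rVert_2^2 + \lambda \lVert \theta \rVert_1,
\qquad
\hat\pi_n(\theta) \propto \exp\!\bigl(-V_n(\theta)/\tau\bigr),
\qquad
\hat\theta^{\mathrm{EWA}} = \int \theta\,\hat\pi_n(\theta)\,d\theta .
\]
% Generic (unadjusted) Langevin Monte Carlo step for a smooth potential f,
% with independent standard Gaussian vectors \xi_k:
\[
\theta_{k+1} = \theta_k - h\,\nabla f(\theta_k) + \sqrt{2h}\,\xi_k .
\]
```

In this form the Lasso is the minimizer of $V_n$, that is, the mode of $\hat\pi_n$, while the EWA is its mean, which is one way to read the link between the two estimators. The second display is only the textbook Langevin update; since $V_n$ is not differentiable everywhere, the adjusted sampler of Chapter 3 necessarily differs from it and is not reproduced here.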
10	Methods for face detection and adaptive face recognition Pavani, Sri-Kaushik 21 July 2010 The focus of this thesis is on facial biometrics, specifically on the problems of face detection and face recognition. Despite intensive research over the last 20 years, the technology is not foolproof, which is why we do not see use of face recognition systems in critical sectors such as banking. In this thesis, we focus on three sub-problems in these two areas of research. Firstly, we propose methods to improve the speed-accuracy trade-off of the state-of-the-art face detector. Secondly, we consider a problem that is often ignored in the literature: decreasing the training time of the detectors. We propose two techniques to this end. Thirdly, we present a detailed large-scale study on self-updating face recognition systems in an attempt to answer whether continuously changing facial appearance can be learnt automatically. 
Fisher's LDA Gaussian weak classifiers Weak classifier Variance normalization Transductive reasoning training complexity rejection cascade fusion of p-values OSTCM-kNN classifier integrated Performance Primitives impostor detection haar-like features fusion of shape and texture face recognition system delaunay triangulation MIT+CMU face database YT database GEFA database temporal confidence segmentation confidence face detector confidence classification confidence laplacian clutter model clutter models changes in facial appearance face segmentation face normalization face detection automatic face recognition system 62
