1

Kernel Coherence Encoders

Sun, Fangzheng 23 April 2018 (has links)
In this thesis, we introduce a novel model based on the idea of autoencoders. Unlike a classic autoencoder, which reconstructs its own inputs through a neural network, our model is closer to Kernel Canonical Correlation Analysis (KCCA) and reconstructs input data from another data set, where these two data sets should have some, perhaps non-linear, dependence. Our model extends traditional KCCA in that the non-linearity of the data is learned by optimizing a kernel function with a neural network. In one of the novelties of this thesis, we do not optimize our kernel based upon some prediction error metric, as is classical in autoencoders. Rather, we optimize our kernel to maximize the "coherence" of the underlying low-dimensional hidden layers. This idea keeps our method faithful to the classic interpretation of linear Canonical Correlation Analysis (CCA). As far as we are aware, our method, which we call a Kernel Coherence Encoder (KCE), is the only extant approach that uses the flexibility of a neural network while maintaining the theoretical properties of classic KCCA. In another of the novelties of our approach, we leverage a modified version of classic coherence which is far more stable in the presence of high-dimensional data, in order to address computational and robustness issues in the implementation of a coherence-based deep learning KCCA.
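For reference, the classic baseline the abstract appeals to is linear CCA, which finds maximally correlated low-dimensional projections of two paired data sets. Below is a minimal numpy sketch of linear CCA on synthetic paired data; the data, the regularization constant, and the function name are illustrative assumptions, and the KCE's neural-network kernel and coherence objective are not reproduced here.

```python
# A minimal sketch of classic linear CCA, the baseline interpretation the
# abstract refers to; the thesis's KCE replaces the fixed linear maps with a
# neural-network-learned kernel and a coherence objective (not shown here).
import numpy as np

def linear_cca(X, Y, reg=1e-6):
    """Return the canonical correlations between paired data sets X and Y."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    # Whiten each block, then take singular values of the cross-covariance.
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx))
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy))
    return np.linalg.svd(Wx @ Cxy @ Wy.T, compute_uv=False)

rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 2))                              # shared latent signal
X = Z @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(500, 5))
Y = Z @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(500, 4))
print(linear_cca(X, Y))   # the two leading correlations should be close to 1
```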
2

Support Vector Machines for Speech Recognition

Ganapathiraju, Aravind 11 May 2002 (has links)
Hidden Markov models (HMM) with Gaussian mixture observation densities are the dominant approach in speech recognition. These systems typically use a representational model for acoustic modeling which can often be prone to overfitting and does not translate to improved discrimination. We propose a new paradigm centered on principles of structural risk minimization using a discriminative framework for speech recognition based on support vector machines (SVMs). SVMs have the ability to simultaneously optimize the representational and discriminative ability of the acoustic classifiers. We have developed the first SVM-based large vocabulary speech recognition system that improves performance over traditional HMM-based systems. This hybrid system achieves a state-of-the-art word error rate of 10.6% on a continuous alphadigit task, a 10% improvement relative to an HMM system. On SWITCHBOARD, a large vocabulary task, the system improves performance over a traditional HMM system from 41.6% word error rate to 40.6%. This dissertation discusses several practical issues that arise when SVMs are incorporated into the hybrid system.
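As a rough illustration of the discriminative classifier stage only, the sketch below trains a kernel SVM on fixed-length acoustic feature vectors with scikit-learn. The features, labels, and hyperparameters are placeholders, and the full HMM/SVM hybrid described in the dissertation (segmentation, posterior estimation, rescoring) is not reproduced.

```python
# Hedged sketch: a kernel SVM over fixed-length acoustic feature vectors
# (e.g. stacked MFCC frames).  Data and labels are synthetic placeholders.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, dim = 600, 39                                # 39 = 12 MFCCs + energy + deltas (a common choice)
X = rng.normal(size=(n, dim))
y = (X[:, :5].sum(axis=1) > 0).astype(int)      # toy two-class phone distinction

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = SVC(kernel="rbf", C=10.0, gamma="scale")  # margin maximization, i.e. structural risk minimization
clf.fit(X_tr, y_tr)
print("segment accuracy:", clf.score(X_te, y_te))
```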
3

Smooth relevance vector machines

Schmolck, Alexander January 2008 (has links)
Regression tasks belong to the set of core problems faced in statistics and machine learning, and promising approaches can often be generalized to also deal with classification, interpolation or denoising problems. Whereas the most widely used classical statistical techniques place severe a priori constraints on the type of function that can be approximated (e.g. only lines, in the case of linear regression), the successes of sparse kernel learners, such as the SVM (support vector machine), demonstrate that good results may be obtained in a quite general framework by enforcing sparsity. Similarly, even very simple sparsity-based denoising techniques, such as classical wavelet shrinkage, can produce surprisingly good results on a wide variety of different signals, because, unlike noise, most signals of practical interest share vital characteristics (such as smoothness, or the ability to be well approximated by piecewise polynomials of low order) that allow a sparse representation in wavelet space. On the other hand, results obtained from SVMs (and classical wavelet shrinkage) suffer from a certain lack of interpretability, since one cannot straightforwardly attach probabilities to them. By contrast, regression, and even more importantly classification, in a Bayesian context always entails a probabilistic measure of confidence in the results, which, provided the model assumptions are reasonably accurate, forms a basis for principled decision-making. The relevance vector machine (RVM) combines these strengths by explicitly encoding the criterion of model sparsity as a (Bayesian) prior over the model weights, and offers a single, unified paradigm to efficiently deal with regression as well as classification tasks. However, the lack of an explicit prior structure over the weight variances means that the degree of sparsity is to a large extent controlled by the choice of kernel (and kernel parameters). This can lead to severe overfitting or oversmoothing -- possibly even both at the same time (e.g. for the multiscale Doppler data). This thesis details an efficient scheme to control sparsity in Bayesian regression by incorporating a flexible noise-dependent smoothness prior into the RVM. The resultant smooth RVM (sRVM) encompasses the original RVM as a special case, but empirical results with a variety of popular data sets show that it can surpass RVM performance in terms of goodness of fit and achieved sparsity as well as computational performance in many cases. As the smoothness prior effectively makes it possible to use (highly efficient) wavelet kernels in an RVM setting, this work also unveils a strong connection between Bayesian wavelet shrinkage and RVM regression and effectively further extends the applicability of the RVM to denoising tasks for up to millions of data points. We further discuss its applicability to classification tasks.
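For orientation, the sketch below shows the plain RVM idea in a hedged form: a sparse Bayesian (ARD) prior over the weights of kernel basis functions centred on the training inputs, here via scikit-learn's ARDRegression. The data and kernel width are invented, and the thesis's noise-dependent smoothness prior is not implemented.

```python
# RVM-style sparse Bayesian regression: ARD prior over RBF basis weights.
# This illustrates the plain RVM; the sRVM's smoothness prior is not shown.
import numpy as np
from sklearn.linear_model import ARDRegression
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-3, 3, size=150))[:, None]
y = np.sinc(x).ravel() + 0.1 * rng.normal(size=150)       # noisy target function

Phi = rbf_kernel(x, x, gamma=2.0)         # design matrix of kernel basis functions
model = ARDRegression()                   # ARD prior prunes irrelevant basis functions
model.fit(Phi, y)

relevant = np.abs(model.coef_) > 1e-3     # surviving "relevance vectors"
y_mean, y_std = model.predict(Phi, return_std=True)       # predictive mean and uncertainty
print("relevance vectors kept:", relevant.sum(), "of", len(x))
print("mean predictive std:", y_std.mean())
```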
4

Graphical Models: Modeling, Optimization, and Hilbert Space Embedding

Zhang, Xinhua, xinhua.zhang.cs@gmail.com January 2010 (has links)
Over the past two decades graphical models have been widely used as powerful tools for compactly representing distributions. On the other hand, kernel methods have been used extensively to come up with rich representations. This thesis aims to combine graphical models with kernels to produce compact models with rich representational abilities.

Graphical models are a powerful underlying formalism in machine learning. Their graph theoretic properties provide both an intuitive modular interface to model the interacting factors, and a data structure facilitating efficient learning and inference. The probabilistic nature ensures the global consistency of the whole framework, and allows convenient interfacing of models to data. Kernel methods, on the other hand, provide an effective means of representing rich classes of features for general objects, and at the same time allow efficient search for the optimal model. Recently, kernels have been used to characterize distributions by embedding them into high dimensional feature spaces. Interestingly, graphical models again decompose this characterization and lead to novel and direct ways of comparing distributions based on samples. Among the many uses of graphical models and kernels, this thesis is devoted to the following four areas.

Conditional random fields for multi-agent reinforcement learning. Conditional random fields (CRFs) are graphical models for modelling the probability of labels given the observations. They have traditionally been trained using a set of observation and label pairs. Underlying all CRFs is the assumption that, conditioned on the training data, the label sequences of different training examples are independent and identically distributed (iid). We extended the use of CRFs to a class of temporal learning algorithms, namely policy gradient reinforcement learning (RL). Now the labels are no longer iid: they are actions that update the environment and affect the next observation. From an RL point of view, CRFs provide a natural way to model joint actions in a decentralized Markov decision process. They define how agents can communicate with each other to choose the optimal joint action. We tested our framework on a synthetic network alignment problem, a distributed sensor network, and a road traffic control system. Using tree sampling by Hamze & de Freitas (2004) for inference, the RL methods employing CRFs clearly outperform those which do not model the proper joint policy.

Bayesian online multi-label classification. Gaussian density filtering (GDF) provides fast and effective inference for graphical models (Maybeck, 1982). Based on this natural online learner, we propose a Bayesian online multi-label classification (BOMC) framework which learns a probabilistic model of the linear classifier. The training labels are incorporated to update the posterior of the classifiers via a graphical model similar to TrueSkill (Herbrich et al., 2007), and inference is based on GDF with expectation propagation. Using samples from the posterior, we label the test data by maximizing the expected F-score. Our experiments on the Reuters1-v2 dataset show that BOMC delivers significantly higher macro-averaged F-score than state-of-the-art online maximum margin learners such as LaSVM (Bordes et al., 2005) and passive-aggressive online learning (Crammer et al., 2006). The online nature of BOMC also allows us to efficiently use a large amount of training data.
Hilbert space embedding of distributions. Graphical models are also an essential tool in kernel measures of independence for non-iid data. Traditional information theory often requires density estimation, which makes it ill-suited for statistical estimation. Motivated by the fact that distributions often appear in machine learning via expectations, we can characterize the distance between distributions in terms of distances between means, especially means in reproducing kernel Hilbert spaces, which are called kernel embeddings. Under this framework, undirected graphical models further allow us to factorize the kernel embedding onto cliques, which yields efficient measures of independence for non-iid data (Zhang et al., 2009). We show the effectiveness of this framework for ICA and sequence segmentation, and a number of further applications and research questions are identified.

Optimization in maximum margin models for structured data. Maximum margin estimation for structured data, e.g. (Taskar et al., 2004), is an important task in machine learning where graphical models also play a key role. These are special cases of regularized risk minimization, for which bundle methods (BMRM, Teo et al., 2007) and the closely related SVMStruct (Tsochantaridis et al., 2005) are state-of-the-art general purpose solvers. Smola et al. (2007b) proved that BMRM requires O(1/ε) iterations to converge to an ε-accurate solution, and we further show that this rate hits the lower bound. By utilizing the structure of the objective function, we devised an algorithm for the structured loss which converges to an ε-accurate solution in O(1/√ε) iterations. This algorithm originates from Nesterov's optimal first order methods (Nesterov, 2003, 2005b).
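As a small illustration of the Hilbert space embedding idea, the sketch below estimates the (biased) maximum mean discrepancy between two samples with an RBF kernel; the data are synthetic, and the clique factorization for non-iid data described above is not shown.

```python
# Comparing two distributions through their kernel mean embeddings:
# the biased MMD estimator with an RBF kernel (plain i.i.d. two-sample case).
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def mmd2(X, Y, gamma=1.0):
    """Squared maximum mean discrepancy between samples X and Y (biased)."""
    return (rbf_kernel(X, X, gamma=gamma).mean()
            + rbf_kernel(Y, Y, gamma=gamma).mean()
            - 2.0 * rbf_kernel(X, Y, gamma=gamma).mean())

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(300, 2))
Y_same = rng.normal(0.0, 1.0, size=(300, 2))
Y_shift = rng.normal(0.5, 1.0, size=(300, 2))
print("MMD^2, same distribution:   ", mmd2(X, Y_same))
print("MMD^2, shifted distribution:", mmd2(X, Y_shift))
```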
5

Data-Driven, Non-Parametric Model Reference Adaptive Control Methods for Autonomous Underwater Vehicles

Oesterheld, Derek I. 03 November 2023 (has links)
This thesis details the implementation of two adaptive controllers on autonomous underwater vehicle (AUV) attitude dynamics, starting from the standard six degree-of-freedom dynamic model. I apply two model reference adaptive control (MRAC) algorithms which make use of kernel functions for learning functional uncertainty present in the system dynamics. The first method extends recent results on model reference adaptive control using reproducing kernel Hilbert space (RKHS) learning techniques to some general cases of multi-input systems. The first controller design is a model reference adaptive controller (MRAC) based on a vector-valued RKHS that is induced by operator-valued kernels. This work formulates a model reference adaptive control strategy based on a dead zone robust modification, and derives conditions for the ultimate boundedness of the tracking error in this case. The second controller is an implementation of the Gaussian Process MRAC developed by Chowdhary et al. I discuss the method of each of these algorithms before contrasting the underlying theoretical structure of each algorithm. Finally, I provide a comparison of each algorithm's performance on the six degree-of-freedom dynamic model of the Virginia Tech 690 AUV and provide field trial results for the RKHS-based MRAC implementation. / Master of Science / This thesis details the implementation of two algorithms which control the attitude of an autonomous underwater vehicle. Rather than developing detailed dynamic models of the vehicles, as is done in classical control methods, each of these implementations only assumes that the unknown portions of the dynamic models can be represented by a broad class of functions defined by a mathematical structure called a reproducing kernel Hilbert space. Each algorithm implements learning techniques using the theory of reproducing kernel Hilbert spaces to bound the error between the vehicle attitude and the commanded vehicle attitude. One algorithm, called RKHS MRAC, implements an adaptive update law based on the attitude error to improve the controller performance. The second algorithm, called GP MRAC, uses estimated vehicle rotational accelerations and statistical learning methods to approximate the unknown function. Each of these methods is compared in theory and in a vehicle simulation. The RKHS MRAC is additionally demonstrated in field trial results.
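As a hedged, scalar illustration of the general MRAC idea (not the thesis's six degree-of-freedom AUV controller), the sketch below simulates a first-order plant with an RBF approximation of the matched uncertainty and a dead-zone modification on the adaptive update; all constants and the uncertainty function are made up.

```python
# Toy model reference adaptive control with RBF (kernel-type) features and a
# dead-zone robust modification; scalar plant, Euler integration, invented constants.
import numpy as np

a, b = 1.0, 1.0                      # plant:      x_dot  = a*x + b*(u + f(x))
a_m, b_m = -2.0, 2.0                 # reference:  xm_dot = a_m*xm + b_m*r
k_x, k_r = (a_m - a) / b, b_m / b    # fixed gains matching plant to reference model
P = 0.25                             # solves 2*a_m*P = -1 (scalar Lyapunov equation)
f = lambda x: 0.8 * np.sin(2 * x)    # unknown matched uncertainty to be learned

centers = np.linspace(-2, 2, 15)
phi = lambda x: np.exp(-4.0 * (x - centers) ** 2)        # RBF feature vector

gamma, deadzone, dt = 20.0, 0.02, 1e-3
x, xm, W = 0.0, 0.0, np.zeros_like(centers)
for k in range(20000):
    r = np.sign(np.sin(0.002 * k))                       # square-wave attitude command
    e = x - xm
    u = k_x * x + k_r * r - W @ phi(x)                   # feedback + adaptive cancellation
    if abs(e) > deadzone:                                # dead-zone robust modification
        W = W + dt * gamma * phi(x) * e * P * b          # Lyapunov-based update law
    x += dt * (a * x + b * (u + f(x)))
    xm += dt * (a_m * xm + b_m * r)
print("final tracking error:", abs(x - xm))
```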
6

Kernels for protein homology detection

Spalding, John Dylan January 2009 (has links)
Determining protein sequence similarity is an important task for protein classification and homology detection, which is typically performed using sequence alignment algorithms. Fast and accurate alignment-free kernel-based classifiers exist that treat protein sequences as a “bag of words”. Kernels implicitly map the sequences to a high dimensional feature space, and can be thought of as an inner product between two vectors in that space. This allows an algorithm that can be expressed purely in terms of inner products to be ‘kernelised’, where the algorithm implicitly operates in the kernel’s feature space. A weighted string kernel, where the weighting is derived using probabilistic methods, is implemented using a binary data representation, and the results are reported. Alternative forms of data representation, such as Ising and frequency forms, are implemented and the results discussed. These results are then used to inform the development of a variety of novel kernels for protein sequence comparison. Alternative forms of classifier are investigated, such as nearest neighbour, support vector machines, and multiple kernel learning. A kernelised Gaussian classifier is derived and tested, which is informative as it returns a score related to the probability of a sequence belonging to a particular classification. Support vector machines are tested with the introduced kernels, and the results compared to those of alternative classifiers. As similarity can be thought of as having different components, such as composition and position, multiple kernel learning is investigated with the novel kernels developed here. The results show that a support vector machine, using either single or multiple kernels, is the best classifier for remote protein homology detection out of all the classifiers tested in this thesis.
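To make the “bag of words” view concrete, here is a hedged sketch of a simple k-mer spectrum kernel fed to an SVM with a precomputed Gram matrix; the sequences and labels are tiny invented examples, and the weighted, probabilistically derived kernels developed in the thesis are not reproduced.

```python
# Alignment-free sequence comparison with a k-mer spectrum kernel and an SVM.
import numpy as np
from collections import Counter
from sklearn.svm import SVC

def spectrum_features(seq, k=3):
    """Counts of all overlapping k-mers in a protein sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(seqs_a, seqs_b, k=3):
    """Gram matrix of inner products between k-mer count vectors."""
    fa = [spectrum_features(s, k) for s in seqs_a]
    fb = [spectrum_features(s, k) for s in seqs_b]
    return np.array([[sum(ca[m] * cb[m] for m in ca) for cb in fb] for ca in fa],
                    dtype=float)

train = ["MKTAYIAKQR", "MKTAYIAKQL", "GAVLIMCFWP", "GAVLIMCFYP"]   # toy sequences
labels = [1, 1, 0, 0]                                              # toy homology classes
clf = SVC(kernel="precomputed", C=1.0)
clf.fit(spectrum_kernel(train, train), labels)

test = ["MKTAYIAQQR", "GAVLIMCFWA"]
print(clf.predict(spectrum_kernel(test, train)))    # expected output: [1 0]
```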
7

Protein-protein interaction network inference using statistical learning

Brouard, Céline 14 February 2013 (has links)
The aim of this thesis is to develop tools for predicting interactions between proteins that can be applied to the human proteins forming a network with the CFTR protein. This protein, when defective, is involved in cystic fibrosis. The development of in silico prediction methods can be useful for biologists to suggest new interaction targets. We propose a new method to solve the link prediction problem. To benefit from the information in unlabeled data, we place ourselves in the semi-supervised learning framework. Link prediction is addressed as an output kernel learning task, referred to as Output Kernel Regression. An output kernel is assumed to encode the proximities of nodes in the target graph, and the goal is to approximate this kernel by using appropriate input features. Using the kernel trick in the output space allows one to reduce the problem of learning from pairs to learning a single-variable function with output values in a Hilbert space. By choosing candidates for the regression functions in a reproducing kernel Hilbert space with operator-valued kernels, we develop tools for regularization as for scalar-valued functions. We establish representer theorems in the supervised and semi-supervised cases and use them to define new regression models for different cost functions. We first tested the developed approach on transductive link prediction using artificial data, benchmark data, as well as a protein-protein interaction network of yeast, and obtained very good results. Then we applied it to the prediction of protein interactions in a network built around the CFTR protein.
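As a rough, fully supervised sketch of the output kernel regression idea (not the semi-supervised, operator-valued formulation developed in the thesis), the code below approximates an output graph kernel from an input kernel with kernel ridge regression and scores candidate links by the predicted output similarity; all data and parameters are synthetic assumptions.

```python
# Simplified output kernel regression for link prediction on synthetic data.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
n_train, n_test, d = 80, 20, 6
X = rng.normal(size=(n_train + n_test, d))            # node (protein) input features
K_all = rbf_kernel(X, gamma=0.05)                     # input kernel over all nodes
A = (K_all > 0.55).astype(float)                      # synthetic interaction graph
np.fill_diagonal(A, 0)

tr = np.arange(n_train)
te = np.arange(n_train, n_train + n_test)
Ky = A[np.ix_(tr, tr)] @ A[np.ix_(tr, tr)]            # output kernel: shared-neighbour counts

lam = 1.0
C = np.linalg.inv(K_all[np.ix_(tr, tr)] + lam * np.eye(n_train))
# Predicted output-kernel value between two test nodes:
#   k_x(test, train) @ C @ Ky @ C @ k_x(train, test)
S = K_all[np.ix_(te, tr)] @ C @ Ky @ C @ K_all[np.ix_(tr, te)]

iu = np.triu_indices(n_test, k=1)                     # score each candidate test link once
print("corr(predicted similarity, true link):",
      np.corrcoef(S[iu], A[np.ix_(te, te)][iu])[0, 1])
```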
8

Learning to rank by maximizing the AUC with linear programming for problems with binary output

Ataman, Kaan 01 January 2007 (has links)
Ranking is a popular machine learning problem that has been studied extensively for more than a decade. Typical machine learning algorithms are generally built to optimize predictive performance (usually measured in accuracy) by minimizing classification error. However, there are many real-world problems where correct ordering of instances is of equal or greater importance than correct classification. Learning algorithms that are built to minimize classification error are often not effective at ordering instances within or among classes. This gap in research created a necessity to alter the objective of such algorithms to focus on correct ranking rather than classification. The Area Under the ROC Curve (AUC), which is equivalent to the Wilcoxon-Mann-Whitney (WMW) statistic, is a widely accepted performance measure for evaluating ranking performance in binary classification problems. In this work we present a linear programming approach (LPR), similar to 1-norm Support Vector Machines (SVM), for ranking instances with binary outputs by maximizing an approximation to the WMW statistic. Our formulation handles non-linear problems by making use of kernel functions. The results on several well-known benchmark datasets show that our approach ranks better than the 2-norm SVM and faster than the support vector ranker (SVR). The number of constraints in the linear programming formulation increases quadratically with the number of data points considered for training the algorithm. We tackle this problem by implementing a number of exact and approximate speed-up approaches inspired by well-known methods such as chunking, clustering and subgradient methods. The subgradient method is the most promising because of its solution quality and its fast convergence to the optimal solution. We adapted the LPR formulation to survival analysis. With this approach it is possible to order subjects by risk of experiencing an event. Such an ordering enables determination of high-risk and low-risk groups among the subjects, which can be helpful not only in medical studies but also in engineering, business and social sciences. Our results show that our algorithm is superior in time-to-event prediction to the most popular survival analysis tool, Cox's proportional hazards regression.
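A hedged sketch of the core linear programming formulation follows: learn a linear score that pushes every positive instance above every negative one by at least a unit margin, trading slack against a 1-norm weight penalty, and solve it with scipy's linprog. The data, the slack penalty, and the variable layout are illustrative assumptions, and the kernelized and speed-up variants from the thesis are not shown.

```python
# Linear programming approximation of WMW/AUC maximization on synthetic data.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
pos = rng.normal(1.0, 1.0, size=(15, 3))
neg = rng.normal(0.0, 1.0, size=(15, 3))
d, n_pairs = 3, len(pos) * len(neg)
pairs = np.array([p - q for p in pos for q in neg])       # one row per (pos, neg) pair

# Variables: w = u - v with u, v >= 0 (for the 1-norm), plus one slack per pair.
c = np.concatenate([np.ones(2 * d), 10.0 * np.ones(n_pairs)])   # |w|_1 + C * slacks
# Constraint  pairs @ (u - v) + z >= 1  becomes  -pairs@u + pairs@v - z <= -1
A_ub = np.hstack([-pairs, pairs, -np.eye(n_pairs)])
b_ub = -np.ones(n_pairs)
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")

w = res.x[:d] - res.x[d:2 * d]
auc = np.mean([(p @ w) > (q @ w) for p in pos for q in neg])    # WMW statistic
print("training AUC:", auc)
```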
9

The Power Landmark Vector Learning Framework

Xiang, Shuo 07 May 2008 (has links)
Kernel methods have recently become popular in bioinformatics machine learning. Kernel methods allow linear algorithms to be applied to non-linear learning situations. By using kernels, non-linear learning problems can benefit from the statistical and runtime stability traditionally enjoyed by linear learning problems. However, traditional kernel learning frameworks use implicit feature spaces whose mathematical properties are hard to characterize. In order to address this problem, recent research has proposed a vector learning framework that uses landmark vectors, which are unlabeled vectors belonging to the same distribution and the same input space as the training vectors. This thesis introduces an extension to the landmark vector learning framework that allows it to utilize two new classes of landmark vectors in the input space. This augmented learning framework is named the power landmark vector learning framework. A theoretical description of the power landmark vector learning framework is given along with proofs of new theoretical results. Experimental results show that the performance of the power landmark vector learning framework is comparable to that of traditional kernel learning frameworks.
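For context, the sketch below shows the basic landmark feature map such frameworks build on: each input is represented by its kernel values against a set of unlabeled landmark vectors, and a linear learner is trained on these explicit features. The landmarks here are simply samples from the same synthetic distribution; the two new classes of "power" landmark vectors introduced in the thesis are not constructed.

```python
# Explicit landmark feature map + linear classifier on synthetic data.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 10))
y = (np.sin(X[:, 0]) + X[:, 1] ** 2 > 1.0).astype(int)    # non-linear concept
landmarks = rng.normal(size=(50, 10))                      # unlabeled, same distribution

Phi = rbf_kernel(X, landmarks, gamma=0.1)                  # landmark feature map
clf = LogisticRegression(max_iter=1000).fit(Phi[:300], y[:300])
print("held-out accuracy:", clf.score(Phi[300:], y[300:]))
```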
