Global ETD Search

1	Mining Data with Feature Interactions January 2018 (has links) abstract: Models using feature interactions have been applied successfully in many areas such as biomedical analysis, recommender systems. The popularity of using feature interactions mainly lies in (1) they are able to capture the nonlinearity of the data compared with linear effects and (2) they enjoy great interpretability. In this thesis, I propose a series of formulations using feature interactions for real world problems and develop efficient algorithms for solving them. Specifically, I first propose to directly solve the non-convex formulation of the weak hierarchical Lasso which imposes weak hierarchy on individual features and interactions but can only be approximately solved by a convex relaxation in existing studies. I further propose to use the non-convex weak hierarchical Lasso formulation for hypothesis testing on the interaction features with hierarchical assumptions. Secondly, I propose a type of bi-linear models that take advantage of interactions of features for drug discovery problems where specific drug-drug pairs or drug-disease pairs are of interest. These models are learned by maximizing the number of positive data pairs that rank above the average score of unlabeled data pairs. Then I generalize the method to the case of using the top-ranked unlabeled data pairs for representative construction and derive an efficient algorithm for the extended formulation. Last but not least, motivated by a special form of bi-linear models, I propose a framework that enables simultaneously subgrouping data points and building specific models on the subgroups for learning on massive and heterogeneous datasets. Experiments on synthetic and real datasets are conducted to demonstrate the effectiveness or efficiency of the proposed methods. / Dissertation/Thesis / Doctoral Dissertation Computer Science 2018 Computer science bipartite ranking feature interaction hierarchical structure
2	Machine Learning and Rank Aggregation Methods for Gene Prioritization from Heterogeneous Data Sources Laha, Anirban January 2013 (has links) (PDF) Gene prioritization involves ranking genes by possible relevance to a disease of interest. This is important in order to narrow down the set of genes to be investigated biologically, and over the years, several computational approaches have been proposed for automat-ically prioritizing genes using some form of gene-related data, mostly using statistical or machine learning methods. Recently, Agarwal and Sengupta (2009) proposed the use of learning-to-rank methods, which have been used extensively in information retrieval and related fields, to learn a ranking of genes from a given data source, and used this approach to successfully identify novel genes related to leukemia and colon cancer using only gene expression data. In this work, we explore the possibility of combining such learning-to-rank methods with rank aggregation techniques to learn a ranking of genes from multiple heterogeneous data sources, such as gene expression data, gene ontology data, protein-protein interaction data, etc. Rank aggregation methods have their origins in voting theory, and have been used successfully in meta-search applications to aggregate webpage rankings from different search engines. Here we use graph-based learning-to-rank methods to learn a ranking of genes from each individual data source represented as a graph, and then apply rank aggregation methods to aggregate these rankings into a single ranking over the genes. The thesis describes our approach, reports experiments with various data sets, and presents our findings and initial conclusions. Gene Prioritization Gene Ranking Bipartite Ranking Learning To Rank Rank Aggregation Methods Bipartite Instance Ranking Rank Aggregration Ranking of Genes Gene Data Sources Genes Bipartite Ranking Bipartite Graph Ranking Bioinformatics
3	Learning with Complex Performance Measures : Theory, Algorithms and Applications Narasimhan, Harikrishna January 2016 (has links) (PDF) We consider supervised learning problems, where one is given objects with labels, and the goal is to learn a model that can make accurate predictions on new objects. These problems abound in applications, ranging from medical diagnosis to information retrieval to computer vision. Examples include binary or multiclass classi cation, where the goal is to learn a model that can classify objects into two or more categories (e.g. categorizing emails into spam or non-spam); bipartite ranking, where the goal is to learn a model that can rank relevant objects above the irrelevant ones (e.g. ranking documents by relevance to a query); class probability estimation (CPE), where the goal is to predict the probability of an object belonging to different categories (e.g. probability of an internet ad being clicked by a user). In each case, the accuracy of a model is evaluated in terms of a specified `performance measure'. While there has been much work on designing and analyzing algorithms for different supervised learning tasks, we have complete understanding only for settings where the performance measure of interest is the standard 0-1 or a loss-based classification measure. These performance measures have a simple additive structure, and can be expressed as an expectation of errors on individual examples. However, in many real-world applications, the performance measure used to evaluate a model is often more complex, and does not decompose into a sum or expectation of point-wise errors. These include the binary or multiclass G-mean used in class-imbalanced classification problems; the F1-measure and its multiclass variants popular in text retrieval; and the (partial) area under the ROC curve (AUC) and precision@ employed in ranking applications. How does one design efficient learning algorithms for such complex performance measures, and can these algorithms be shown to be statistically consistent, i.e. shown to converge in the limit of infinite data to the optimal model for the given measure? How does one develop efficient learning algorithms for complex measures in online/streaming settings where the training examples need to be processed one at a time? These are questions that we seek to address in this thesis. Firstly, we consider the bipartite ranking problem with the AUC and partial AUC performance measures. We start by understanding how bipartite ranking with AUC is related to the standard 0-1 binary classification and CPE tasks. It is known that a good binary CPE model can be used to obtain both a good binary classification model and a good bipartite ranking model (formally, in terms of regret transfer bounds), and that a binary classification model does not necessarily yield a CPE model. However, not much is known about other directions. We show that in a weaker sense (where the mapping needed to transform a model from one problem to another depends on the underlying probability distribution), a good bipartite ranking model for AUC can indeed be used to construct a good binary classification model, and also a good binary CPE model. Next, motivated by the increasing number of applications (e.g. biometrics, medical diagnosis, etc.), where performance is measured, not in terms of the full AUC, but in terms of the partial AUC between two false positive rates (FPRs), we design batch algorithms for optimizing partial AUC in any given FPR range. Our algorithms optimize structural support vector machine based surrogates, which unlike for the full AUC; do not admit a straightforward decomposition into simpler terms. We develop polynomial time cutting plane solvers for solving the optimization, and provide experiments to demonstrate the efficacy of our methods. We also present an application of our approach to predicting chemotherapy outcomes for cancer patients, with the aim of improving treatment decisions. Secondly, we develop algorithms for optimizing (surrogates for) complex performance mea-sures in the presence of streaming data. A well-known method for solving this problem for standard point-wise surrogates such as the hinge surrogate, is the stochastic gradient descent (SGD) method, which performs point-wise updates using unbiased gradient estimates. How-ever, this method cannot be applied to complex objectives, as here one can no longer obtain unbiased gradient estimates from a single point. We develop a general stochastic method for optimizing complex measures that avoids point-wise updates, and instead performs gradient updates on mini-batches of incoming points. The method is shown to provably converge for any performance measure that satis es a uniform convergence requirement, such as the partial AUC, precision@ and F1-measure, and in experiments, is often several orders of magnitude faster than the state-of-the-art batch methods, while achieving similar or better accuracies. Moreover, for specific complex binary classification measures, which are concave functions of the true positive rate (TPR) and true negative rate (TNR), we are able to develop stochastic (primal-dual) methods that can indeed be implemented with point-wise updates, by using an adaptive linearization scheme. These methods admit convergence rates that match the rate of the SGD method, and are again several times faster than the state-of-the-art methods. Finally, we look at the design of consistent algorithms for complex binary and multiclass measures. For binary measures, we consider the practically popular plug-in algorithm that constructs a classifier by applying an empirical threshold to a suitable class probability estimate, and provide a general methodology for proving consistency of these methods. We apply this technique to show consistency for the F1-measure, and under a continuity assumption on the distribution, for any performance measure that is monotonic in the TPR and TNR. For the case of multiclass measures, a simple plug-in method is no longer tractable, as in the place of a single threshold parameter, one needs to tune at least as many parameters as the number of classes. Using an optimization viewpoint, we provide a framework for designing learning algorithms for multiclass measures that are general functions of the confusion matrix, and as an instantiation, provide an e cient and provably consistent algorithm based on the bisection method for multiclass measures that are ratio-of-linear functions of the confusion matrix (e.g. micro F1). The algorithm outperforms the state-of-the-art SVMPerf method in terms of both accuracy and running time. Overall, in this thesis, we have looked at various aspects of complex performance measures used in supervised learning problems, leading to several new algorithms that are often significantly better than the state-of-the-art, to improved theoretical understanding of the performance measures studied, and to novel real-life applications of the algorithms developed. Multiclass Classification Binary Classification Streaming Algorithms Complex Performance Measures Learning Algorithms Bipartite Ranking Complex Multiclass Measures AUC Complex Binary Measures Computer Science
4	An Efficient Ranking and Classification Method for Linear Functions, Kernel Functions, Decision Trees, and Ensemble Methods Glass, Jesse Miller January 2020 (has links) Structural algorithms incorporate the interdependence of outputs into the prediction, the loss, or both. Frank-Wolfe optimizations of pairwise losses and Gaussian conditional random fields for multivariate output regression are two such structural algorithms. Pairwise losses are standard 0-1 classification surrogate losses applied to pairs of features and outputs, resulting in improved ranking performance (area under the ROC curve, average precision, and F-1 score) at the cost of increased learning complexity. In this dissertation, it is proven that the balanced loss 0-1 SVM and the pairwise SVM have the same dual loss and the pairwise dual coefficient domain is a subdomain of the balanced loss 0-1 SVM with bias dual coefficient domain. This provides a theoretical advancement in the understanding of pairwise loss, which we exploit for the development of a novel ranking algorithm that is fast and memory efficient method with state the art ranking metric performance across eight benchmark data sets. Various practical advancements are also made in multivariate output regression. The learning time for Gaussian conditional random fields is greatly reduced and the parameter domain is expanded to enable repulsion between outputs. Last, a novel multivariate regression is presented that keeps the desirable elements of GCRF and infuses them into a local regression model that improves mean squared error and reduces learning complexity. / Computer and Information Science Artificial Intelligence Bipartite Ranking Frank-wolfe Algorithm Gaussian Conditional Random Fields Multivariate Output Regression Pairwise Support Vector Machine Structural Support Vector Machine
5	Bayes Optimal Feature Selection for Supervised Learning Saneem Ahmed, C G January 2014 (has links) (PDF) The problem of feature selection is critical in several areas of machine learning and data analysis such as, for example, cancer classification using gene expression data, text categorization, etc. In this work, we consider feature selection for supervised learning problems, where one wishes to select a small set of features that facilitate learning a good prediction model in the reduced feature space. Our interest is primarily in filter methods that select features independently of the learning algorithm to be used and are generally faster to implement compared to other types of feature selection algorithms. Many common filter methods for feature selection make use of information-theoretic criteria such as those based on mutual information to guide their search process. However, even in simple binary classification problems, mutual information based methods do not always select the best set of features in terms of the Bayes error. In this thesis, we develop a general approach for selecting a set of features that directly aims to minimize the Bayes error in the reduced feature space with respect to the loss or performance measure of interest. We show that the mutual information based criterion is a special case of our setting when the loss function of interest is the logarithmic loss for class probability estimation. We give a greedy forward algorithm for approximately optimizing this criterion and demonstrate its application to several supervised learning problems including binary classification (with 0-1 error, cost-sensitive error, and F-measure), binary class probability estimation (with logarithmic loss), bipartite ranking (with pairwise disagreement loss), and multiclass classification (with multiclass 0-1 error). Our experiments suggest that the proposed approach is competitive with several state-of-the art methods. Data Analysis Logarithms Supervised Learning Bayes Optimality Binary Classsification Bipartite Ranking Multiclass Classification Bayes Optimal Feature Selection Optimal Feature Selection Bayes Error Binary Class Probability Estimation Supervised Learning Problems Computer Science

1

Page generated in 0.0792 seconds