About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1031

Predictive Gaussian Classification of Functional MRI Data

Yourganov, Grigori 14 January 2014 (has links)
This thesis presents an evaluation of algorithms for classification of functional MRI data. We evaluated the performance of probabilistic classifiers that use a Gaussian model against a popular non-probabilistic classifier (support vector machine, SVM). A pool of classifiers consisting of linear and quadratic discriminants, linear and non-linear Gaussian Naive Bayes (GNB) classifiers, and linear SVM, was evaluated on several sets of real and simulated fMRI data. Performance was measured using two complementary metrics: accuracy of classification of fMRI volumes within a subject, and reproducibility of within-subject spatial maps; both metrics were computed using split-half resampling. Regularization parameters of multivariate methods were tuned to optimize out-of-sample classification and/or within-subject map reproducibility. SVM showed no advantage in classification accuracy over Gaussian classifiers. Performance of SVM was matched by the linear discriminant, and at times outperformed by the quadratic discriminant or nonlinear GNB. Among all tested methods, linear and quadratic discriminants regularized with principal component analysis (PCA) produced spatial maps with the highest within-subject reproducibility. We also demonstrated that the number of principal components that optimizes the performance of linear/quadratic discriminants is sensitive to the mean magnitude, variability and connectivity of the simulated active signal. In real fMRI data, this number is correlated with behavioural measures of post-stroke recovery, and, in a separate study, with behavioural measures of self-control. Using data from a study of cognitive aspects of aging, we accurately predicted the age group of the subject from within-subject spatial maps created by our pool of classifiers. We examined the cortical areas that showed differences in recruitment in young versus older subjects; this difference was demonstrated to be primarily driven by more prominent recruitment of the task-positive network in older subjects. We conclude that linear and quadratic discriminants with PCA regularization are well-suited for fMRI data classification, particularly for within-subject analysis.
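
As a rough illustration of the approach described above, the sketch below compares a PCA-regularized linear discriminant against a linear SVM under split-half resampling, using scikit-learn with synthetic data standing in for fMRI volumes. The data shape, component count and model settings are illustrative assumptions, not the thesis's actual pipeline.

```python
# Hedged sketch: PCA-regularized LDA vs. linear SVM with split-half
# resampling (synthetic stand-in data, not real fMRI volumes).
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Stand-in for within-subject fMRI volumes: 200 samples, 500 "voxels".
X, y = make_classification(n_samples=200, n_features=500,
                           n_informative=20, random_state=0)

# Split-half resampling: repeated 50/50 splits of the data.
splits = StratifiedShuffleSplit(n_splits=20, test_size=0.5, random_state=0)

models = {
    # The number of PCA components is the tuned regularization parameter.
    "PCA + LDA": make_pipeline(PCA(n_components=20),
                               LinearDiscriminantAnalysis()),
    "linear SVM": LinearSVC(C=1.0, max_iter=5000),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=splits).mean()
    print(f"{name}: mean split-half accuracy = {acc:.3f}")
```

Map reproducibility, the thesis's second metric, would additionally require correlating the discriminant weight maps fitted on the two halves of each split.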
1032

Learning predictive models from graph data using pattern mining

Karunaratne, Thashmee M. January 2014 (has links)
Learning from graphs has become a popular research area due to the ubiquity of graph data representing web pages, molecules, social networks, protein interaction networks, etc. However, standard graph learning approaches are often challenged by the computational cost of the learning process, due to the richness of the representation. Attempts to improve their efficiency are often associated with the risk of degrading the performance of the predictive models, creating tradeoffs between the efficiency and effectiveness of the learning. Such a situation is analogous to an optimization problem with two objectives, efficiency and effectiveness, where improving one objective without making the other worse off is a better solution, called a Pareto improvement. This thesis investigates how to improve the efficiency and effectiveness of learning from graph data using pattern mining methods. Two objectives are set: one concerns how to improve the efficiency of pattern mining without reducing the predictive performance of the learning models, and the other concerns how to improve predictive performance without increasing the complexity of pattern mining. The employed research method mainly follows a design science approach, including the development and evaluation of artifacts. The contributions of this thesis include a data representation language that can be characterized as a form in between sequences and itemsets, where the graph information is embedded within items. Several studies, each of which looks for Pareto improvements in efficiency and effectiveness, are conducted using sets of small graphs. Summarizing the findings, some of the proposed methods, namely maximal frequent itemset mining and constraint-based itemset mining, result in dramatically increased efficiency of learning without decreasing the predictive performance of the resulting models. It is also shown that additional background knowledge can be used to enhance the performance of the predictive models without increasing the complexity of the graphs.
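
The itemset-based representation lends itself to a compact sketch. The encoding below (one item per canonical edge) and the brute-force miner are illustrative assumptions, not the thesis's exact representation language or mining algorithm.

```python
# Hedged sketch: flatten small graphs into itemsets whose items embed
# edge information, then mine frequent and maximal frequent itemsets.
from itertools import combinations

graphs = [  # each small graph as a set of undirected labeled edges
    {("C", "C"), ("C", "O")},
    {("C", "O"), ("O", "H")},
    {("C", "C"), ("C", "O"), ("O", "H")},
]

# Embed graph structure within items: one item per canonical edge.
transactions = [{f"{min(u, v)}-{max(u, v)}" for u, v in g} for g in graphs]

def frequent_itemsets(transactions, min_support=2, max_size=3):
    """Brute-force enumeration; real miners prune the search space."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for k in range(1, max_size + 1):
        for cand in combinations(items, k):
            support = sum(set(cand) <= t for t in transactions)
            if support >= min_support:
                frequent[cand] = support
    return frequent

freq = frequent_itemsets(transactions)
# Maximal frequent itemsets: those with no frequent proper superset.
maximal = {s: n for s, n in freq.items()
           if not any(set(s) < set(t) for t in freq)}
print(maximal)  # {('C-C', 'C-O'): 2, ('C-O', 'H-O'): 2}
```

Restricting attention to maximal itemsets is one way the number of mined patterns, and hence the cost of learning, can be reduced.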
1033

Missing Data Problems in Machine Learning

Marlin, Benjamin 01 August 2008 (has links)
Learning, inference, and prediction in the presence of missing data are pervasive problems in machine learning and statistical data analysis. This thesis focuses on the problems of collaborative prediction with non-random missing data and classification with missing features. We begin by presenting and elaborating on the theory of missing data due to Little and Rubin. We place a particular emphasis on the missing at random assumption in the multivariate setting with arbitrary patterns of missing data. We derive inference and prediction methods in the presence of random missing data for a variety of probabilistic models including finite mixture models, Dirichlet process mixture models, and factor analysis. Based on this foundation, we develop several novel models and inference procedures for both the collaborative prediction problem and the problem of classification with missing features. We develop models and methods for collaborative prediction with non-random missing data by combining standard models for complete data with models of the missing data process. Using a novel recommender system data set and experimental protocol, we show that each proposed method achieves a substantial increase in rating prediction performance compared to models that assume missing ratings are missing at random. We describe several strategies for classification with missing features including the use of generative classifiers, and the combination of standard discriminative classifiers with single imputation, multiple imputation, classification in subspaces, and an approach based on modifying the classifier input representation to include response indicators. Results on real and synthetic data sets show that in some cases performance gains over baseline methods can be achieved by methods that do not learn a detailed model of the feature space.
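
One of the strategies listed above, augmenting the classifier input with response indicators alongside single imputation, has a direct expression in scikit-learn. The sketch below is a generic illustration on synthetic data, not the thesis's models or experimental protocol.

```python
# Hedged sketch: mean imputation plus response (missingness) indicators
# feeding a standard discriminative classifier.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X[rng.random(X.shape) < 0.2] = np.nan  # knock out 20% of feature values

# add_indicator=True appends one binary response-indicator column per
# feature containing missing values, so the classifier can learn from
# the missingness pattern as well as from the imputed values.
clf = make_pipeline(
    SimpleImputer(strategy="mean", add_indicator=True),
    LogisticRegression(max_iter=1000),
)
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```

When data are not missing at random, the indicator columns give the classifier a channel through which the missing data process itself can inform prediction, which is the intuition behind modeling that process explicitly.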
1035

Evaluating Information Retrieval Systems With Multiple Non-Expert Assessors

Li, Le January 2013 (has links)
Many current test collections require the use of expert judgments during construction. The true label of each document is given by an expert assessor. However, the cost and effort associated with expert training and judging are typically quite high when there are many documents to judge. One way to address this issue is to have each document judged by multiple non-expert assessors at a lower expense. However, two key factors can make this method difficult: the variability across assessors' judging abilities, and the aggregation of the noisy labels into one single consensus label. Much previous work has shown how to utilize this method to replace expert labels in relevance evaluation. However, the effects of relevance judgment errors on ranking system evaluation have been less explored. This thesis mainly investigates how to best evaluate information retrieval systems with noisy labels, where no ground-truth labels are provided and each document may receive multiple noisy labels. Based on our simulation results on two datasets, we find that conservative assessors, who tend to label incoming documents as non-relevant, are preferable. Two important factors affect the overall conservativeness of the consensus labels: the assessors' conservativeness and the relevance standard. This observation provides a guideline on what kind of consensus algorithms or assessors are needed to preserve a high correlation with expert labels in ranking system evaluation. We also systematically investigate how to find consensus labels for documents that are equally likely to be relevant or non-relevant. We investigate a content-based consensus algorithm that links the noisy labels with document content. We compare it against state-of-the-art consensus algorithms and find that, depending on the document collection, this content-based approach may help or hurt the performance.
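
The role of conservativeness in label aggregation can be made concrete with a toy consensus rule. The tie-breaking policy below is an illustrative assumption, not one of the consensus algorithms evaluated in the thesis.

```python
# Hedged sketch: majority-vote consensus over noisy relevance labels,
# with ties resolved conservatively toward non-relevant.
from collections import Counter

def conservative_consensus(labels):
    """labels: list of 0/1 judgments (1 = relevant) from non-experts."""
    counts = Counter(labels)
    return 1 if counts[1] > counts[0] else 0  # ties become non-relevant

judgments = {
    "doc_a": [1, 1, 0],
    "doc_b": [1, 0],      # tied: resolved as non-relevant
    "doc_c": [0, 0, 1],
}
print({d: conservative_consensus(ls) for d, ls in judgments.items()})
# {'doc_a': 1, 'doc_b': 0, 'doc_c': 0}
```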
1036

Learning with non-Standard Supervision

Urner, Ruth January 2013 (has links)
Machine learning has enjoyed astounding practical success in a wide range of applications in recent years, practical success that often hurries ahead of our theoretical understanding. The standard framework for machine learning theory assumes full supervision, that is, training data consists of correctly labeled i.i.d. examples from the same task that the learned classifier is supposed to be applied to. However, many practical applications successfully make use of the sheer abundance of data that is currently produced. Such data may not be labeled or may be collected from various sources. The focus of this thesis is to provide theoretical analysis of machine learning regimes where the learner is given such (possibly large amounts of) non-perfect training data. In particular, we investigate the benefits and limitations of learning with unlabeled data in semi-supervised learning and active learning, as well as the benefits and limitations of learning from data that has been generated by a task that is different from the target task (domain adaptation learning). For all three settings, we propose Probabilistic Lipschitzness to model the relatedness between the labels and the underlying domain space, and we discuss our suggested notion by comparing it to other common data assumptions.
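
For context, one common formulation of Probabilistic Lipschitzness for binary labels in the related literature is the following; this is a hedged reconstruction, and the thesis's exact definition may differ in detail.

```latex
% For an increasing function \phi : \mathbb{R}^{+} \to [0,1], a labeling
% function f on a metric domain (X, d) with marginal distribution D
% satisfies \phi-Probabilistic-Lipschitzness if, for all \lambda > 0,
\Pr_{x \sim D}\bigl[\,\exists\, y \in X :\; f(x) \neq f(y) \,\wedge\, d(x, y) \le \lambda\,\bigr] \;\le\; \phi(\lambda)
```

Intuitively, deterministic Lipschitzness is relaxed to hold for all but a φ(λ)-fraction of the domain, so label changes across small distances are rare rather than forbidden.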
1037

A Symbiotic Bid-Based Framework for Problem Decomposition using Genetic Programming

Lichodzijewski, Peter 22 February 2011 (has links)
This thesis investigates the use of symbiosis as an evolutionary metaphor for problem decomposition using Genetic Programming. It begins by drawing a connection between lateral problem decomposition, in which peers with similar capabilities coordinate their actions, and vertical problem decomposition, whereby solution subcomponents are organized into increasingly complex units of organization. Furthermore, the two types of problem decomposition are associated respectively with context learning and layered learning. The thesis then proposes the Symbiotic Bid-Based framework, modeled after a three-staged process of symbiosis abstracted from biological evolution. As such, it is argued, the approach has the capacity for both types of problem decomposition. Three principles capture the essence of the proposed framework. First, a bid-based approach to context learning is used to separate the issues of 'what to do' and 'when to do it'. Whereas the former issue refers to the problem-specific actions, e.g., class label predictions, the latter refers to a bidding behaviour that identifies a set of problem conditions. In this work, Genetic Programming is used to evolve the bids, casting the method in a non-traditional role as programs no longer represent complete solutions. Second, the proposed framework relies on symbiosis as the primary mechanism of inheritance driving evolution, in contrast to the crossover operator often encountered in Evolutionary Computation. Under this evolutionary metaphor, a set of symbionts, each representing a solution subcomponent in terms of a bid-action pair, is compartmentalized inside a host. Communication between symbionts is realized through their collective bidding behaviour; thus, their cooperation is directly supported by the bid-based approach to context learning. Third, assuming that challenging tasks where problem decomposition is likely to play a key role will often involve large state spaces, the proposed framework includes a dynamic evaluation function that explicitly models the interaction between candidate solutions and training cases. As such, the computational overhead incurred during training under the proposed framework does not depend on the size of the problem state space. An approach to model building, the Symbiotic Bid-Based framework is first evaluated on a set of real-world classification problems, which include problems with multi-class labels, unbalanced distributions, and large attribute counts. The evaluation includes a comparison against Support Vector Machines and AdaBoost. Under temporal sequence learning, the proposed framework is evaluated on the truck reversal and Rubik's Cube tasks, and in the former case, it is compared with the Neuroevolution of Augmenting Topologies algorithm. Under both problems, it is demonstrated that the increased capacity for problem decomposition under the proposed approach results in improved performance, with solutions employing vertical problem decomposition under temporal sequence learning proving to be especially effective.
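
The separation of 'what to do' from 'when to do it' can be illustrated with a toy host of bid-action pairs. The structure below is a hypothetical sketch; in the framework itself the bid functions are programs evolved by Genetic Programming.

```python
# Hedged sketch: a host of symbionts, each pairing a bidding function
# ('when to do it') with an action ('what to do'); the highest bidder
# supplies the host's prediction.
import math

class Symbiont:
    def __init__(self, bid_fn, action):
        self.bid_fn = bid_fn   # evolved bidding behaviour in the thesis
        self.action = action   # problem-specific action, e.g. a class label

class Host:
    def __init__(self, symbionts):
        self.symbionts = symbionts  # compartmentalized subcomponents

    def predict(self, x):
        # Cooperation emerges from collective bidding: highest bid wins.
        return max(self.symbionts, key=lambda s: s.bid_fn(x)).action

host = Host([
    Symbiont(lambda x: math.exp(-(x - 1.0) ** 2), action="class_A"),
    Symbiont(lambda x: math.exp(-(x + 1.0) ** 2), action="class_B"),
])
print(host.predict(0.8))   # class_A
print(host.predict(-1.2))  # class_B
```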
1038

Hybrid Tag Recommendation in Collaborative Tagging Systems

Lipczak, Marek 15 March 2012 (has links)
The simplicity and flexibility of tagging allows users to collaboratively create large, loosely structured repositories of Web resources. One of its main drawbacks is the need for manual formulation of tags for each posted resource. This task can be eased by a tag recommendation system, the objective of which is to propose a set of tags for a given (resource, user) pair. Tag recommendation is an interesting and well-defined practical problem. Its main features are constant interaction with users and the availability of large amounts of tagged data. Given the opportunities (e.g., rich user feedback) and limitations (e.g., real-time response) of the tag recommendation setting, we defined six requirements for a practically useful tag recommendation system. We present a conceptual design and system architecture of a hybrid tag recommendation system which meets all these requirements. The system utilizes the strengths of various tag sources (e.g., resource content and user profiles) and the relations between concepts captured in tag co-occurrence graphs mined from the collaborative actions of users. The architecture of the proposed system is based on a text indexing engine, which allows the system to deal with large datasets in real time while constantly adapting its models to newly added posts. The effectiveness and efficiency of the system was evaluated on six datasets representing a broad range of collaborative tagging systems. The experiments confirmed the high quality of results and the practical usability of the system. In a comparative study, the system outperformed a state-of-the-art algorithm based on tensor factorization on the most representative datasets applicable to both methods. The experiments on the characteristics of tagging data and the performance of the system allowed us to find answers to important research questions adapted from the general area of recommender systems. We confirmed the importance of infrequently used tags in the recommendation process and proposed solutions to overcome the cold start problem in tag recommendation. We demonstrated that a parameter tuning approach makes a hybrid tag recommendation system adaptable to various datasets. We also revealed the importance of utilizing a feedback loop in the tag recommendation process.
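
The merging step of such a hybrid recommender can be sketched as a weighted combination of per-source tag scores. Source names and weights below are illustrative assumptions, not the system's actual components or learned parameters.

```python
# Hedged sketch: linearly combine tag scores from several sources and
# return the top-k tags for a (resource, user) pair.
def recommend_tags(source_scores, weights, k=3):
    """source_scores: {source: {tag: score}}; weights: {source: weight}."""
    combined = {}
    for source, scores in source_scores.items():
        for tag, score in scores.items():
            combined[tag] = combined.get(tag, 0.0) + weights[source] * score
    return sorted(combined, key=combined.get, reverse=True)[:k]

scores = {
    "resource_content": {"python": 0.9, "tutorial": 0.6},
    "user_profile":     {"python": 0.4, "snake": 0.7},
    "tag_cooccurrence": {"programming": 0.8, "tutorial": 0.3},
}
weights = {"resource_content": 0.5, "user_profile": 0.2,
           "tag_cooccurrence": 0.3}
print(recommend_tags(scores, weights))
# ['python', 'tutorial', 'programming']
```

In practice the weights would be tuned per dataset, which is one way the parameter-tuning adaptability described above could be realized.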
1039

An Investigation of Using Machine Learning with Distribution Based Flow Features for Classifying SSL Encrypted Network Traffic

Arndt, Daniel Joseph 13 August 2012 (has links)
Encrypted protocols, such as Secure Socket Layer (SSL), are becoming more prevalent because of the growing use of e-commerce, anonymity services, gaming and Peer-to-Peer (P2P) applications such as Skype and Gtalk. The objective of this work is two-fold. First, an investigation is provided into the identification of web browsing behaviour in SSL tunnels. To this end, C5.0, naive Bayesian, AdaBoost and Genetic Programming learning models are evaluated under training and test conditions from a network traffic capture. In these experiments, flow-based features are employed without using Internet Protocol (IP) addresses, source/destination ports or payload information. Results indicate that it is possible to identify web browsing behaviour in SSL encrypted tunnels. A test performance of ~95% detection rate and ~2% false positive rate is achieved with a C5.0 model for identifying SSL, and ~98% detection rate with ~3% false positive rate with an AdaBoost model for identifying web browsing within these tunnels. Second, the identifying characteristics of SSL traffic are investigated, whereby a new tool is introduced to generate new flow statistics that present the features in a unique way, using bins to represent distributions of measurements. These new features are tested using the best performers from the previous experiments, C5.0 and AdaBoost, and increase detection rates by up to 32.40% and lower false positive rates by as much as 54.73% on data sets that contain traffic from a different network than the one the training set was captured on. Furthermore, the new feature set outperforms the old feature set in every case.
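
The distribution-based features can be sketched directly: each flow's measurements are summarized as normalized histogram bins, giving a classifier the shape of the distribution rather than only summary statistics. Bin edges and values below are illustrative assumptions, not the tool's actual configuration.

```python
# Hedged sketch: turn a flow's packet sizes into fixed-length
# distribution features via histogram bins.
import numpy as np

def distribution_features(packet_sizes, bin_edges):
    """Return normalized bin counts as a fixed-length feature vector."""
    counts, _ = np.histogram(packet_sizes, bins=bin_edges)
    return counts / max(len(packet_sizes), 1)

bin_edges = [0, 64, 128, 256, 512, 1024, 1500]  # bytes (assumed edges)
flow = [52, 60, 64, 120, 900, 1480, 1500]       # one flow's packet sizes
print(distribution_features(flow, bin_edges))
# approx. [0.29, 0.29, 0., 0., 0.14, 0.29]
```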
1040

Subgraph Methods for Comparing Complex Networks

Hurshman, Matthew 03 April 2013 (has links)
An increasing number of models have been proposed to explain the link structure observed in complex networks. The central problem addressed in this thesis is: how do we select the best model? The model-selection method we implement is based on supervised learning. We train a classifier on six complex network models incorporating various link attachment mechanisms, including preferential attachment, copying and spatial. For the classification we represent graphs as feature vectors, integrating common complex network statistics with raw counts of small connected subgraphs commonly referred to as graphlets. The outcome of each experiment strongly indicates that models which incorporate the preferential attachment mechanism fit the network structure of Facebook the best. The experiments also suggest that graphlet structure is better at distinguishing different network models than more traditional complex network statistics. To further the understanding of our experimental results, we compute the expected number of triangles, 3-paths and 4-cycles which appear in our selected models. This analysis shows that the spatial preferential attachment model generates 3-paths, triangles and 4-cycles in abundance, giving a closer match to the observed network structure of the Facebook networks used in our model selection experiment. The other models generate some of these subgraphs in abundance but not all three at once. In general, we show that our selected models generate vastly different amounts of triangles, 3-paths and 4-cycles, verifying our experimental conclusion that graphlets are distinguishing features of these complex network models.
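
The three subgraph counts analyzed above admit closed-form identities on the adjacency matrix, which the sketch below implements; note that "3-path" is taken here to mean a path with three edges, which may differ from the thesis's convention.

```python
# Hedged sketch: count triangles, three-edge paths and 4-cycles from an
# adjacency matrix using standard walk-counting identities.
import numpy as np

def subgraph_counts(A):
    """A: symmetric 0/1 adjacency matrix with zero diagonal."""
    deg = A.sum(axis=1)
    m = deg.sum() / 2                      # number of edges
    triangles = np.trace(A @ A @ A) / 6
    # Paths with three edges: sum over edges of (d_u-1)(d_v-1), minus
    # the walks that close into a triangle (counted once per edge).
    u, v = np.nonzero(np.triu(A))
    paths3 = ((deg[u] - 1) * (deg[v] - 1)).sum() - 3 * triangles
    # 4-cycles from closed 4-walks: trace(A^4) = 8*C4 + 2*sum(d^2) - 2m.
    c4 = (np.trace(np.linalg.matrix_power(A, 4))
          - 2 * (deg ** 2).sum() + 2 * m) / 8
    return triangles, paths3, c4

# Sanity check on the 4-vertex cycle: 0 triangles, 4 such paths, 1 cycle.
C4 = np.array([[0, 1, 0, 1],
               [1, 0, 1, 0],
               [0, 1, 0, 1],
               [1, 0, 1, 0]])
print(subgraph_counts(C4))  # (0.0, 4.0, 1.0)
```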
