981 |
The Power Landmark Vector Learning FrameworkXiang, Shuo 07 May 2008 (has links)
Kernel methods have recently become popular in bioinformatics machine learning. Kernel methods allow linear algorithms to be applied to non-linear learning problems; by using kernels, such problems can benefit from the statistical and runtime stability traditionally enjoyed by linear learning problems. However, traditional kernel learning frameworks use implicit feature spaces whose mathematical properties are hard to characterize. To address this problem, recent research has proposed a vector learning framework that uses landmark vectors: unlabeled vectors belonging to the same distribution and the same input space as the training vectors. This thesis introduces an extension of the landmark vector learning framework that allows it to utilize two new classes of landmark vectors in the input space. The augmented framework is named the power landmark vector learning framework. A theoretical description of the framework is given, along with proofs of new theoretical results. Experimental results show that the performance of the power landmark vector learning framework is comparable to that of traditional kernel learning frameworks.
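The landmark idea can be made concrete with a generic sketch (an illustration of landmark-based feature maps in general, not the thesis's specific framework): each input is mapped to its kernel similarities against a set of unlabeled landmark vectors, giving an explicit feature space a linear algorithm can work in. All names are hypothetical, and the RBF kernel is just one possible choice:

```python
import math

def rbf(x, z, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def landmark_features(x, landmarks, gamma=1.0):
    """Explicit feature vector: kernel similarities of x to each landmark."""
    return [rbf(x, l, gamma) for l in landmarks]

# landmarks are unlabeled points from the same input space as the training data
landmarks = [(0.0, 0.0), (1.0, 1.0)]
phi = landmark_features((1.0, 1.0), landmarks)
# phi[1] is 1.0 because the input coincides with the second landmark
```

Any linear learner can then be run on `phi` while the non-linearity lives entirely in the explicit, easy-to-characterize landmark features.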
|
982 |
Smart assistants for smart homesRasch, Katharina January 2013 (has links)
The smarter homes of tomorrow promise to increase comfort, aid elderly and disabled people, and help inhabitants save energy. Unfortunately, smart homes today are far from this vision: people who already live in such a home struggle with complicated user interfaces, inflexible home configurations, and difficult installation procedures. Under these circumstances, smart homes are not ready for mass adoption. This dissertation addresses these issues by proposing two smart assistants for smart homes. The first assistant is a recommender system that suggests useful services (i.e., actions that the home can perform for the user). The recommended services are fitted to the user’s current situation, habits, and preferences. With these recommendations it is possible to build much simpler user interfaces that highlight the most interesting choices currently available. Configuration becomes much more flexible: since the recommender system automatically learns user habits, user routines no longer have to be described manually. Evaluations with two smart home datasets show that the correct service is included in the top five recommendations in 90% of all cases. The second assistant addresses the difficult installation procedures. The unique feature of this assistant is that it removes the need for manually describing device functionalities (such descriptions are needed by the recommender system). Instead, users can simply plug in a new device, begin using it, and let the installation assistant identify what the device is doing. The installation assistant has minimal requirements for manufacturers of smart home devices and was successfully integrated with an existing smart home. Evaluations performed with this smart home show that the assistant can quickly and reliably learn device functionalities.
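How a recommender can rank services for the current situation may be sketched with a simple frequency model (a hypothetical toy, far simpler than the dissertation's approach, which also accounts for habits and preferences):

```python
from collections import Counter, defaultdict

class ServiceRecommender:
    """Toy frequency-based recommender: ranks services by how often they
    were used in a given situation (e.g. a time of day or room). A
    hypothetical sketch, much simpler than the dissertation's model."""
    def __init__(self):
        self._counts = defaultdict(Counter)

    def observe(self, situation, service):
        """Record that a service was used in a situation."""
        self._counts[situation][service] += 1

    def top_k(self, situation, k=5):
        """The k services most frequently used in this situation."""
        return [s for s, _ in self._counts[situation].most_common(k)]

rec = ServiceRecommender()
for service in ["lights_on", "lights_on", "coffee", "blinds_down"]:
    rec.observe("morning", service)
# rec.top_k("morning", k=2) puts "lights_on" first
```

A user interface built on such a component would show only the few top-ranked services instead of every device control in the home.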
|
983 |
Learning with non-Standard SupervisionUrner, Ruth January 2013 (has links)
Machine learning has enjoyed astounding practical success in a wide range of applications in recent years; practical success that often hurries ahead of our theoretical understanding. The standard framework for machine learning theory assumes full supervision, that is, training data consists of correctly labeled i.i.d. examples from the same task that the learned classifier is supposed to be applied to. However, many practical applications successfully make use of the sheer abundance of data that is currently produced, and such data may not be labeled or may be collected from various sources.

The focus of this thesis is to provide theoretical analysis of machine learning regimes where the learner is given such (possibly large) amounts of non-perfect training data. In particular, we investigate the benefits and limitations of learning with unlabeled data in semi-supervised learning and active learning, as well as the benefits and limitations of learning from data generated by a task that differs from the target task (domain adaptation learning). For all three settings, we propose Probabilistic Lipschitzness to model the relatedness between the labels and the underlying domain space, and we discuss the suggested notion by comparing it to other common data assumptions.
|
984 |
Autonomous Path Following Using Convolutional NetworksSchmiterlöw, Maria January 2012 (has links)
Autonomous vehicles have application possibilities in many different fields, such as rescue missions, exploration of unknown environments, and unmanned transport. For such a system to navigate in a safe manner, high requirements on reliability and security must be fulfilled. This master's thesis explores the possibility of using the machine learning algorithm known as the convolutional network on a robotic platform for autonomous path following. The only input used to predict the steering signal is a monochromatic image taken by a camera mounted on the robotic car, pointing in the steering direction. The convolutional network learns from demonstrations in a supervised manner. In this thesis three different preprocessing options are evaluated. The evaluation is based on the quadratic error and the number of correctly predicted classes. The results show that the convolutional network has no problem learning a correct behaviour and scores good results when evaluated on data similar to what it was trained on. The results also show that the evaluated preprocessing options are not enough to make the system independent of its environment.
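The core operation of a convolutional network layer can be shown in a few lines. A minimal 'valid' 2D cross-correlation (stride 1, no padding), purely illustrative and unrelated to the thesis's actual network architecture:

```python
def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation, the core operation of a
    convolutional network layer (stride 1, no padding). image and
    kernel are lists of lists of numbers."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            row.append(sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out

# a vertical-edge filter responds where intensity changes left to right
img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 0, 1, 1]]
edge = [[-1, 1],
        [-1, 1]]
result = conv2d_valid(img, edge)  # [[0, 2, 0], [0, 2, 0]]
```

In a full network, many such learned filters are stacked with non-linearities and pooling; here a single hand-written filter suffices to show why such layers respond to local image structure like road edges.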
|
985 |
Building a standard operating procedure for the analysis of mass spectrometry dataMalmqvist, Niklas January 2012 (has links)
Mass spectrometry (MS) is used in peptidomics to find novel endogenous peptides that may lead to the discovery of new biomarkers. Identifying endogenous peptides from MS is a time-consuming and challenging task; storing identified peptides in a database and comparing them against unknown peptides from other MS runs avoids re-doing the identification. MS produces large amounts of data, making interpretation difficult. A platform for aiding the identification of endogenous peptides was developed in this project, including a library application for storing peptide data. Machine learning methods were also used to try to find patterns in peptide abundance that could be correlated with a specific sample or treatment type, which can help focus the identification work on peptides of high interest.
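The library lookup described above, comparing unknown peptides against previously identified ones, can be sketched as tolerance matching on mass. All class and entry names below are hypothetical:

```python
import bisect

class PeptideLibrary:
    """Hypothetical sketch: store identified peptides by mass and match
    unknowns within a tolerance (in daltons), so identification work is
    not redone for peptides already seen in earlier MS runs."""
    def __init__(self):
        self._entries = []  # sorted list of (mass, name)

    def add(self, mass, name):
        bisect.insort(self._entries, (mass, name))

    def match(self, mass, tol=0.01):
        """Names of stored peptides whose mass lies within tol of mass."""
        out = []
        lo = bisect.bisect_left(self._entries, (mass - tol,))
        for m, name in self._entries[lo:]:
            if m > mass + tol:
                break
            out.append(name)
        return out

lib = PeptideLibrary()
lib.add(555.27, "peptide_A")   # masses and names are made up
lib.add(931.41, "peptide_B")
# lib.match(555.275) finds peptide_A; lib.match(600.0) finds nothing
```

Keeping the entries sorted makes each lookup a binary search plus a short scan, which matters when the library holds results from many runs.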
|
986 |
Learning Instruction Scheduling Heuristics from Optimal DataRussell, Tyrel January 2006 (has links)
The development of modern pipelined and multiple-functional-unit processors has increased the available instruction-level parallelism. To fully utilize these resources, compiler writers spend large amounts of time developing complex scheduling heuristics for each new architecture. To reduce the time spent on this process, automated machine learning techniques have been proposed for generating scheduling heuristics. We present two case studies using these techniques to generate instruction scheduling heuristics for basic blocks and super blocks. A basic block is a block of code with a single flow of control, and a super block is a collection of basic blocks with a single entry point but multiple exit points. We improve previous techniques for the automated generation of basic block scheduling heuristics by increasing the quality of the training data and increasing the number of features considered, including several novel features that have useful effects on scheduling instructions. Our case study on super block scheduling heuristics is a novel contribution, as previous approaches were applied only to basic blocks. We show through experimentation that we can produce efficient heuristics that perform better than current heuristic methods for basic block and super block scheduling. We can reduce the number of non-optimally scheduled blocks by up to 55% for basic blocks and 38% for super blocks, and we can produce better schedules 7.8 times more often than the next best heuristic for basic blocks and 4.4 times more often for super blocks.
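The idea of learning a heuristic from optimal schedules can be sketched as follows: from training pairs labeled by an optimal scheduler, pick the ready-instruction feature whose "larger value goes first" rule agrees with the optimum most often. This decision-stump toy is far simpler than the techniques in the thesis, and the features shown are hypothetical:

```python
def learn_tiebreak(pairs):
    """Given training pairs ((feats_a, feats_b), label), where label is
    0 if the optimal scheduler picked instruction a and 1 if it picked b,
    select the single feature whose 'larger value goes first' rule agrees
    with the optimal choices most often. A toy decision stump, not the
    thesis's learning technique."""
    n_feats = len(pairs[0][0][0])
    best_feature, best_score = 0, -1
    for f in range(n_feats):
        score = sum(
            (0 if fa[f] >= fb[f] else 1) == label
            for (fa, fb), label in pairs
        )
        if score > best_score:
            best_feature, best_score = f, score
    return best_feature

# hypothetical features per instruction: (critical-path distance, successor count)
train = [
    (((5, 2), (3, 4)), 0),  # optimum scheduled the longer critical path first
    (((2, 1), (6, 1)), 1),
    (((4, 3), (4, 5)), 0),
]
best = learn_tiebreak(train)  # critical-path distance (feature 0) agrees every time
```

A real learned heuristic combines many such features, but the training signal is the same: pairwise choices extracted from optimally scheduled blocks.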
|
987 |
Convex Large Margin Training - Unsupervised, Semi-supervised, and Robust Support Vector MachinesXu, Linli January 2007 (has links)
Support vector machines (SVMs) have been a dominant machine learning technique for more than a decade. The intuitive principle behind SVM training is to find the maximum margin separating hyperplane for a given set of binary labeled training data. Previously, SVMs have been primarily applied to supervised learning problems, where target class labels are provided with the data. Developing unsupervised extensions to SVMs, where no class labels are given, turns out to be a challenging problem. In this dissertation, I propose a principled approach for unsupervised and semi-supervised SVM training by formulating convex relaxations of the natural training criterion: find a (constrained) labeling that would yield an optimal SVM classifier on the resulting labeled training data. This relaxation yields a semidefinite program (SDP) that can be solved in polynomial time. The resulting training procedures can be applied to two-class and multi-class problems, and ultimately to the multivariate case, achieving high quality results in each case. In addition to unsupervised training, I also consider the problem of reducing the outlier sensitivity of standard supervised SVM training. Here I show that a similar convex relaxation can be applied to improve the robustness of SVMs by explicitly suppressing outliers in the training process. The proposed approach can achieve superior results to standard SVMs in the presence of outliers.
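The "natural training criterion" can be illustrated by brute force on a toy 1-D dataset: enumerate balanced labelings and keep the one whose best separator has the largest margin. On a line every separable labeling is a split point, so the sketch below is small, but in general this search is exponential, which is exactly what the convex SDP relaxation is designed to avoid:

```python
def best_labeling_1d(xs, min_class=1):
    """Brute-force the 'natural criterion' on 1-D data: among labelings
    with at least min_class points per class, pick the one whose best
    threshold separator has the largest margin. Each separable labeling
    on a line corresponds to a split index k."""
    xs = sorted(xs)
    n = len(xs)
    best_margin, best_split = -1.0, None
    for k in range(min_class, n - min_class + 1):
        margin = (xs[k] - xs[k - 1]) / 2  # half the gap at the split
        if margin > best_margin:
            best_margin, best_split = margin, k
    return best_margin, best_split

xs = [0.0, 0.2, 0.4, 3.0, 3.1]
margin, k = best_labeling_1d(xs)
# the widest gap lies between 0.4 and 3.0: k == 3, margin == 1.3
```

The `min_class` constraint plays the role of the class-balance constraint that keeps the unsupervised objective from assigning all points one label.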
|
989 |
Solving a mixed-integer programming formulation of a classification model with misclassification limitsBrooks, J. Paul 25 August 2005 (has links)
Classification, the development of rules for the allocation of observations to one or more groups, is a fundamental problem in machine learning and has been applied to many problems in medicine and business. We consider aspects of a classification model developed by Gallagher, Lee, and Patterson that is based on a result by Anderson. The model seeks to maximize the probability of correct G-group classification, subject to limits on misclassification probabilities. The mixed-integer programming formulation of the model is an empirical method for estimating the parameters of an optimal classification rule, which are identified as coefficients of linear functions by Anderson.
The model is shown to be a consistent method for estimating the parameters of the optimal solution to the problem of maximizing the probability of correct classification subject to limits on inter-group misclassification probabilities. A polynomial-time algorithm is described for two-group instances. The problem is NP-complete for a general number of groups, and an approximation is formulated as a mixed-integer program (MIP). The MIP is difficult to solve due to the formulation of constraints wherein certain variables are equal to the maximum of a set of linear functions; these constraints are conducive to an ill-conditioned coefficient matrix. Methods for generating edges of the conflict graph and conflict hypergraphs are discussed. The conflict graph is employed for finding cuts in a branch-and-bound framework. This technique and others lead to improvements in solution time over industry-standard software on instances generated from real-world data. The classification accuracy of the model relative to standard classification methods on real-world and simulated data is also noted.
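A conflict graph joins pairs of binary variables that cannot both be 1; any clique in it yields the valid cut that at most one of its variables may be 1. A greedy sketch of growing such a clique from a seed edge (illustrative only, not the thesis's cut-generation algorithm):

```python
def greedy_clique_cut(conflict_edges, seed_edge):
    """Grow a clique in a conflict graph, whose edges join binary
    variables that cannot both equal 1, starting from a seed edge.
    The resulting clique gives the valid cut: at most one of its
    variables may be 1 in any feasible solution."""
    adj = {}
    for u, v in conflict_edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    clique = set(seed_edge)
    for w in sorted(adj):
        # add w only if it conflicts with every variable already chosen
        if w not in clique and all(w in adj[u] for u in clique):
            clique.add(w)
    return clique

edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
cut_vars = greedy_clique_cut(edges, (1, 2))  # {1, 2, 3}: x1 + x2 + x3 <= 1
```

Such clique cuts dominate the individual pairwise constraints they are built from, which is why adding them inside branch-and-bound can tighten the relaxation.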
|
990 |
Modeling and Predicting Software BehaviorsBowring, James Frederick 11 August 2006 (has links)
Software systems will eventually contribute to their own maintenance through implementations of self-awareness. Understanding how to specify, model, and implement software with a sense of self is a daunting problem. This research draws inspiration from the automatic functioning of a gimbal, a self-righting mechanical device that supports an object and maintains that object's orientation with respect to gravity independently of its immediate operating environment. A software gimbal exhibits a self-righting feature that provisions software with two auxiliary mechanisms: a historical mechanism and a reflective mechanism. The historical mechanism consists of behavior classifiers trained on statistical models of data collected from executions of the program that exhibit its known behaviors. The reflective mechanism uses the historical mechanism to assess an ongoing or selected execution.
This dissertation presents techniques for the identification and modeling of program execution features as statistical models. It further demonstrates how statistical machine-learning techniques can be used to manipulate these models and to construct behavior classifiers that can automatically detect and label known program behaviors and detect new unknown behaviors. The thesis is that statistical summaries of data collected from a software program's executions can model and predict external behaviors of the program.
This dissertation presents three control-flow features and one value-flow feature of program executions that can be modeled as stochastic processes exhibiting the Markov property. A technique for building automated behavior classifiers from these models is detailed. Empirical studies demonstrating the efficacy of this approach are presented. The use of these techniques in example software engineering applications in the categories of software testing and failure detection are described.
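The use of Markov models of execution features for behavior classification can be sketched as follows: estimate a transition matrix per known behavior from its traces, then label a new trace by which model assigns it higher likelihood. A hypothetical toy with made-up trace events:

```python
import math
from collections import defaultdict

def transition_model(traces, smooth=1.0):
    """Estimate first-order Markov transition probabilities from event
    traces (lists of states), with additive smoothing so unseen
    transitions keep nonzero probability."""
    counts = defaultdict(lambda: defaultdict(float))
    states = set()
    for t in traces:
        states.update(t)
        for a, b in zip(t, t[1:]):
            counts[a][b] += 1
    states = sorted(states)
    model = {}
    for a in states:
        total = sum(counts[a].values()) + smooth * len(states)
        model[a] = {b: (counts[a][b] + smooth) / total for b in states}
    return model

def log_likelihood(model, trace, floor=1e-9):
    """Log-probability of a trace under a model; transitions the model
    has never seen fall back to a small floor probability."""
    return sum(math.log(model.get(a, {}).get(b, floor))
               for a, b in zip(trace, trace[1:]))

# two 'behaviors' with different transition structure (made-up traces)
passing = transition_model([["init", "work", "work", "exit"]])
failing = transition_model([["init", "error", "exit"]])
trace = ["init", "work", "exit"]
label = "pass" if log_likelihood(passing, trace) > log_likelihood(failing, trace) else "fail"
```

The same likelihood comparison flags new behaviors: a trace scoring poorly under every known model is a candidate unknown behavior.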
|