361 |
Pre-processing of tandem mass spectra using machine learning methods
Ding, Jiarui 27 May 2009
Protein identification has become increasingly useful in the diagnosis and treatment of many diseases, such as cancer, heart disease and HIV. Tandem mass spectrometry is a powerful tool for protein identification. In a typical experiment, proteins are broken into small amino acid oligomers called peptides. By determining the amino acid sequence of several peptides of a protein, its whole amino acid sequence can be inferred. Peptide identification is therefore the first step and a central issue in protein identification. Tandem mass spectrometers can produce a large number of tandem mass spectra, which are used for peptide identification. Two issues should be addressed to improve the performance of current peptide identification algorithms. Firstly, nearly all spectra are contaminated by noise, so the accuracy of peptide identification algorithms may suffer. Secondly, the majority of spectra are not identifiable because their quality is too poor, so much time is wasted attempting to identify them.
The goal of this research is to design spectrum pre-processing algorithms that both speed up and improve the reliability of peptide identification from tandem mass spectra. Firstly, as a tandem mass spectrum is a one-dimensional signal consisting of dozens to hundreds of peaks, and the majority of peaks are noise peaks, a spectrum denoising algorithm is proposed to remove most of the noise peaks. Experimental results show that our denoising algorithm can remove about 69% of the peaks in a spectrum, namely those that are potential noise peaks. At the same time, the number of spectra that can be identified by the Mascot algorithm increases by 31% and 14% for two tandem mass spectrum datasets. Next, a two-stage recursive feature elimination method based on support vector machines (SVM-RFE) and a sparse logistic regression method are proposed to select the most relevant features for describing the quality of tandem mass spectra. Our methods can effectively select the most relevant features, as measured by the performance of classifiers trained with different numbers of features. Thirdly, both supervised and unsupervised machine learning methods are used for the quality assessment of tandem mass spectra. A supervised classifier (a support vector machine) can be trained to remove more than 90% of poor quality spectra without removing more than 10% of high quality spectra. Clustering methods such as model-based clustering are also used for quality assessment to eliminate the need for a labeled training dataset, and they show promising results.
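The abstract does not describe how the denoising algorithm works. Purely as an illustration of what peak filtering on a spectrum looks like, the sketch below keeps only the k most intense peaks in each m/z window. This is a common heuristic and not the method of the thesis; the function name `keep_top_peaks`, the window width and the value of `k` are all hypothetical.

```python
from collections import defaultdict


def keep_top_peaks(peaks, window=100.0, k=2):
    """Keep the k most intense peaks in each m/z window.

    peaks: list of (mz, intensity) tuples.
    NOTE: an illustrative heuristic, not the thesis's denoising algorithm.
    """
    buckets = defaultdict(list)
    for mz, intensity in peaks:
        # Group peaks by which fixed-width m/z window they fall into.
        buckets[int(mz // window)].append((mz, intensity))

    kept = []
    for bucket in buckets.values():
        # Sort each window by intensity, highest first, and keep the top k.
        bucket.sort(key=lambda p: p[1], reverse=True)
        kept.extend(bucket[:k])
    return sorted(kept)


# Toy spectrum (made-up values): four peaks in window 1, two in window 2.
spectrum = [(101.1, 5.0), (120.4, 80.0), (150.2, 3.0), (175.9, 40.0),
            (210.0, 2.0), (250.5, 60.0)]
denoised = keep_top_peaks(spectrum)
```

With these toy values the two weakest peaks in the first window are dropped and the second window is left unchanged.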
|
362 |
Metareasoning about propagators for constraint satisfaction
Thompson, Craig Daniel Stewart 11 July 2011
Given the breadth of constraint satisfaction problems (CSPs) and the wide variety of CSP solvers, it is often very difficult to determine a priori which solving method is best suited to a problem. This work explores the use of machine learning to predict which solving method will be most effective for a given problem. We use four different problem sets to determine the CSP attributes that can be used to decide which solving method should be applied. After choosing an appropriate set of attributes, we determine how well J48 decision trees can predict which solving method to apply. Furthermore, we take a cost-sensitive approach that emphasizes problem instances where the runtime difference between algorithms is large. We also attempt to use information gained on one class of problems to inform decisions about a second class of problems. Finally, we show that the additional cost of deciding which method to apply is outweighed by the time saved compared with applying the same solving method to all problem instances.
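One way to read the cost-sensitive idea and the final cost comparison is sketched below, under stated assumptions: each instance is weighted by the absolute runtime gap between two solvers, and the total runtime of per-instance selection (including a fixed decision overhead) is compared with always using one solver. The function names, the two-solver setup and all numbers are hypothetical and are not taken from the thesis.

```python
def instance_weights(runtimes_a, runtimes_b):
    """Weight each instance by the absolute runtime gap between two solvers,
    so that instances where the choice matters most count most."""
    return [abs(a - b) for a, b in zip(runtimes_a, runtimes_b)]


def total_runtime(choices, runtimes_a, runtimes_b, overhead=0.0):
    """Total time when solver 'a' or 'b' is chosen per instance, plus a
    fixed per-instance overhead for making the decision."""
    total = 0.0
    for choice, a, b in zip(choices, runtimes_a, runtimes_b):
        total += (a if choice == "a" else b) + overhead
    return total


# Made-up runtimes (seconds) of two solvers on three instances.
runtimes_a = [1.0, 50.0, 2.0]
runtimes_b = [1.5, 3.0, 40.0]

weights = instance_weights(runtimes_a, runtimes_b)

# Hypothetical predictions from a learned selector, with decision overhead.
predicted = ["a", "b", "a"]
with_selection = total_runtime(predicted, runtimes_a, runtimes_b, overhead=0.5)
always_a = total_runtime(["a"] * 3, runtimes_a, runtimes_b)
```

In this toy setting the first instance gets a small weight because either solver is acceptable, while a wrong choice on the other two would be expensive.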
|
363 |
Fatigue effect on task performance in haptic virtual environment for home-based rehabilitation
Yang, Chun 11 July 2011
Stroke rehabilitation aims to train the motor function of a patient's limb. In this process, functional assessment is important, and it is primarily based on a patient's task performance. The context of the rehabilitation discussed in this thesis is such that functional assessment is conducted through a computer system and the Internet. In particular, a patient performs the task at home in a haptic virtual environment, and the task performance is transmitted to the therapist over the Internet. One problem with this approach to functional assessment is that the therapist knows little about the patient's mind state. This immediately raises one question: does an elevated mind state have a significant effect on the patient's task performance? If so, this approach can result in considerable error.
The overall objective of this thesis study was to answer this question. The study focused on a patient's elevated fatigue state. The specific objectives of the study were: (i) developing a haptic virtual environment prototype system for functional assessment, (ii) developing a physiology-based inference system for fatigue state, and (iii) performing an experiment to generate knowledge regarding the effect of fatigue on task performance. Because resources for recruiting patients were limited, the study conducted a few experiments on patients but most on healthy subjects.
The study concluded that: (1) the proposed haptic virtual environment system is effective for the wrist coordination task and is likely to be promising for other tasks, (2) the proposed fatigue inference system achieves an accuracy of 89.54% for two levels of fatigue state, which is promising, (3) the elevated fatigue state significantly affects task performance in the context of the wrist coordination task, and (4) the accuracy of the individual-based inference approach is significantly higher than that of the group-based inference approach.
The main contributions of the thesis are (1) new knowledge regarding the effect of fatigue on task performance in the context of home-based rehabilitation, (2) a new fatigue inference system with the highest accuracy among the existing approaches in the literature, and (3) new knowledge regarding the difference between the individual-based and group-based inference approaches.
|
364 |
A study on machine learning algorithms for fall detection and movement classification
Ralhan, Amitoz Singh 04 January 2010
Falls among the elderly are an important health issue. Fall detection and movement tracking techniques are therefore instrumental in dealing with this issue. This thesis responds to the challenge of classifying different movement types as part of a system designed to fulfill the need for a wearable device to collect data for fall and near-fall analysis. Four different fall activities (forward, backward, left and right), three normal activities (standing, walking and lying down) and near-fall situations are identified and detected. Different machine learning algorithms are compared and the best one is used for real-time classification. The comparison is made using the Waikato Environment for Knowledge Analysis (WEKA). The system can also adapt to the different gaits of different people. A feature selection algorithm is also introduced to reduce the number of features required for the classification problem.
|
365 |
Investigation of automatic construction of reactive controllers
Westerberg, Caryl J. 21 May 1993
In real-time control systems, the value of a control decision depends not
only on the correctness of the decision but also on the time when that decision
is available. Recent work in real-time decision making has used machine learning
techniques to automatically construct reactive controllers, that is, controllers
with little or no internal state and low time complexity pathways between sensors
and effectors. This paper presents research on 1) how a problem representation
affects the trade-offs between space and performance, and 2) off-line versus on-line
approaches for collecting training examples when using machine learning techniques
to construct reactive controllers. Empirical results show that for a partially
observable problem both the inclusion of history information in the problem representation
and the use of on-line rather than off-line learning can improve the
performance of the reactive controller. / Graduation date: 1994
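The idea of including history information in the problem representation can be sketched as follows, assuming a simple fixed-length window of past observations. The class name `HistoryObservation`, the window length and the fill value are hypothetical and are not taken from the paper.

```python
from collections import deque


class HistoryObservation:
    """Augment the current observation with the previous ones, so that a
    controller with no internal state of its own can still tell apart
    situations that look identical in a single observation."""

    def __init__(self, length, fill=0):
        # A bounded deque drops the oldest observation automatically.
        self.buffer = deque([fill] * length, maxlen=length)

    def update(self, observation):
        """Record a new observation and return the history-augmented state."""
        self.buffer.append(observation)
        return tuple(self.buffer)


history = HistoryObservation(length=3)
states = [history.update(obs) for obs in [1, 2, 3, 4]]
```

The returned tuple can serve directly as the key of a lookup table from states to actions, at the cost of a state space that grows with the window length. This is one concrete form of the space versus performance trade-off mentioned in the abstract.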
|
366 |
Machine Learning Approaches to Biological Sequence and Phenotype Data Analysis
Min, Renqiang 17 February 2011
To understand biology at a system level, in this thesis I present novel machine learning algorithms to reveal the underlying mechanisms of how genes and their products function at different biological levels. Specifically, at the sequence level, based on kernel support vector machines (SVMs), I propose a learned random-walk kernel and a learned empirical-map kernel to identify protein remote homology solely from sequence data, and I propose a discriminative motif discovery algorithm to identify sequence motifs that characterize protein sequences' remote homology membership. The proposed approaches significantly outperform previous methods, especially on some challenging protein families. At the expression and protein level, using hierarchical Bayesian graphical models, I develop the first high-throughput computational predictive model to filter sequence-based predictions of microRNA targets by incorporating the proteomic data of putative microRNA target genes, and I propose another probabilistic model to explore the underlying mechanisms of microRNA regulation by combining the expression profile data of messenger RNAs and microRNAs. At the cellular level, I further investigate how yeast genes manifest their functions in cell morphology by performing gene function prediction from the morphology data of yeast temperature-sensitive alleles. The developed prediction models enable biologists to choose interesting yeast essential genes and study their predicted novel functions.
|
367 |
Cooperative and intelligent control of multi-robot systems using machine learning
Wang, Ying 05 1900
This thesis investigates cooperative and intelligent control of autonomous multi-robot systems in a dynamic, unstructured and unknown environment and makes significant original contributions with regard to self-deterministic learning for robot cooperation, evolutionary optimization of robotic actions, improvement of system robustness, vision-based object tracking, and real-time performance.
A distributed multi-robot architecture is developed which will facilitate operation of a cooperative multi-robot system in a dynamic and unknown environment in a self-improving, robust, and real-time manner. It is a fully distributed and hierarchical architecture with three levels. By combining several popular AI, soft computing, and control techniques such as learning, planning, reactive paradigm, optimization, and hybrid control, the developed architecture is expected to facilitate effective autonomous operation of cooperative multi-robot systems in a dynamically changing, unknown, and unstructured environment.
A machine learning technique is incorporated into the developed multi-robot system for self-deterministic and self-improving cooperation and coping with uncertainties in the environment. A modified Q-learning algorithm termed Sequential Q-learning with Kalman Filtering (SQKF) is developed in the thesis, which can provide fast multi-robot learning. By arranging the robots to learn according to a predefined sequence, modeling the effect of the actions of other robots in the work environment as Gaussian white noise and estimating this noise online with a Kalman filter, the SQKF algorithm seeks to solve several key problems in multi-robot learning.
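For orientation, the standard tabular Q-learning update that SQKF is described as modifying can be sketched as below. The sequential scheduling of robots and the Kalman-filter noise estimate are not shown, since the abstract gives no details of them; the state names, action names and parameter values are hypothetical.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.5, gamma=0.9):
    """One step of standard tabular Q-learning:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    """
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)          # unseen (state, action) pairs start at 0.0
actions = ["push", "wait"]      # hypothetical robot actions

# Apply the same rewarded transition twice and watch the estimate grow.
first = q_update(q, "s0", "push", 1.0, "s1", actions)
second = q_update(q, "s0", "push", 1.0, "s1", actions)
```

In a multi-robot setting the reward and next state seen by one robot also depend on what the other robots do, which is the disturbance the abstract says SQKF models as Gaussian white noise.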
As a part of low-level sensing and control in the proposed multi-robot architecture, a fast computer vision algorithm for color-blob tracking is developed to track multiple moving objects in the environment. By removing the brightness and saturation information in an image and filtering unrelated information based on statistical features and domain knowledge, the algorithm solves the problems of uneven illumination in the environment and improves real-time performance.
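The effect of discarding brightness and saturation can be illustrated with a hue-only color test, assuming an RGB to HSV conversion is an acceptable stand-in for the color transform used in the thesis (the abstract does not name one). The helper names, the tolerance and the sample pixels are hypothetical.

```python
import colorsys


def hue_of(pixel):
    """Return the hue (0.0 to 1.0) of an 8-bit RGB pixel, discarding
    saturation and value (brightness)."""
    r, g, b = (c / 255.0 for c in pixel)
    h, _s, _v = colorsys.rgb_to_hsv(r, g, b)
    return h


def matches_hue(pixel, target_hue, tolerance=0.05):
    """True if the pixel's hue is within tolerance of target_hue.
    Hue is circular, so the distance wraps around at 1.0."""
    diff = abs(hue_of(pixel) - target_hue)
    return min(diff, 1.0 - diff) <= tolerance


bright_red = (255, 0, 0)   # well-lit part of a red blob
dim_red = (90, 10, 10)     # the same color in shadow
green = (0, 200, 0)

red_hue = 0.0
results = [matches_hue(p, red_hue) for p in (bright_red, dim_red, green)]
```

Both red pixels pass the same test despite very different brightness, which is why a hue-only comparison tolerates uneven illumination. Note that `colorsys` reports a hue of 0 for gray pixels, so a practical tracker would need some extra filtering for achromatic regions.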
In order to validate the developed approaches, a Java-based simulation system and a physical multi-robot experimental system are developed to successfully transport an object of interest to a goal location in a dynamic and unknown environment with complex obstacle distribution. The developed approaches in this thesis are implemented in the prototype system and rigorously tested and validated through computer simulation and experimentation.
|
368 |
Classification of leakage detections acquired by airborne thermography of district heating networks
Berg, Amanda January 2013
In Sweden and many other northern countries, it is common for heat to be distributed to homes and industries through district heating networks. Such networks consist of pipes buried underground carrying hot water or steam with temperatures in the range of 90-150 °C. Due to bad insulation or cracks, heat or water leakages might appear. A system for large-scale monitoring of district heating networks through remote thermography has been developed and is in use at the company Termisk Systemteknik AB. Infrared images are captured from an aircraft and analysed to find and indicate the areas where the ground temperature is higher than normal. During the analysis, however, many warm areas other than true water or energy leakages are marked as detections. Objects or phenomena that can cause false alarms are those that, for some reason, are warmer than their surroundings, for example chimneys, cars and heat leakages from buildings. During the last couple of years, the system has been used in a number of cities, so a fair number of examples of different types of detections exist. The purpose of the present master's thesis is to evaluate how far the false alarms of the existing analysis can be reduced with the use of a learning system, i.e. a system that can learn to recognize different types of detections. A labelled data set for training and testing was acquired through contact with customers. Furthermore, a number of features describing the intensity difference within the detection, its shape and propagation, as well as proximity information, were found, implemented and evaluated. Finally, four different classifiers and other methods for classification were evaluated. The method that obtained the best results consists of two steps. In the initial step, all detections that lie on top of a building are removed from the data set of labelled detections. The second step consists of classification using a Random forest classifier. Using this two-step method, the number of false alarms is reduced by 43% while the percentage of water and energy detections correctly classified is 99%.
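The two-step structure and the two reported metrics can be sketched as follows. The sketch assumes each detection is a small record with an `on_building` flag, and it uses a simple threshold on a made-up `contrast` feature as a stand-in for the Random forest classifier. The field names, the threshold and the toy data are hypothetical, and the resulting numbers are not the thesis's results.

```python
def two_step_filter(detections, classify):
    """Step 1 drops detections lying on top of a building; step 2 keeps
    only those that the classifier labels as leaks."""
    off_building = [d for d in detections if not d["on_building"]]
    return [d for d in off_building if classify(d) == "leak"]


def evaluate(detections, kept):
    """Return (false alarm reduction, fraction of true leaks retained)."""
    false_before = sum(1 for d in detections if d["label"] != "leak")
    false_after = sum(1 for d in kept if d["label"] != "leak")
    leaks_before = sum(1 for d in detections if d["label"] == "leak")
    leaks_after = sum(1 for d in kept if d["label"] == "leak")
    return 1 - false_after / false_before, leaks_after / leaks_before


def threshold_classifier(detection):
    # Stand-in for the Random forest classifier: a single-feature rule.
    return "leak" if detection["contrast"] > 5.0 else "other"


# Toy labelled detections (made-up values).
detections = [
    {"on_building": True, "contrast": 9.0, "label": "chimney"},
    {"on_building": False, "contrast": 8.0, "label": "leak"},
    {"on_building": False, "contrast": 2.0, "label": "car"},
    {"on_building": False, "contrast": 7.0, "label": "leak"},
    {"on_building": False, "contrast": 6.5, "label": "car"},
]

kept = two_step_filter(detections, threshold_classifier)
reduction, retained = evaluate(detections, kept)
```

The building filter removes the warm chimney before classification, so the classifier never has to learn to reject it, and the warm car that slips through remains as the one residual false alarm.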
|
369 |
Probabilistic Graphical Models and Algorithms for
Jiao, Feng January 2008
In this thesis I present research in two fields: machine learning and computational biology.
First, I develop new machine learning methods for graphical models that can be applied to protein problems. Then I apply graphical model algorithms to protein problems, obtaining improvements in protein structure prediction and protein structure alignment. In the machine learning work, I focus on a special kind of graphical model---conditional random fields (CRFs). Here, I present a new semi-supervised training procedure for CRFs that can be used to train sequence segmentors and labellers from a combination of labeled and unlabeled training data. Such learning algorithms can be applied to protein and gene named entity recognition problems. This work provides one of the first semi-supervised discriminative training methods for structured classification.
Second, in my computational biology work, I focus mainly on protein problems. In particular, I first propose a tree decomposition method for solving the protein structure prediction and protein structure alignment problems. In so doing, I reveal why tree decomposition is a good method for many protein problems. Then, I propose a computational framework for detection of similar structures of a target protein with sparse NMR data, which can help to predict protein structure using experimental data.
Finally, I propose a new machine learning approach---LS_Boost---to solve the protein fold recognition problem, which is one of the key steps in protein structure prediction. A thorough comparison shows the algorithm to be both more accurate and more efficient than the traditional z-Score method and other machine learning methods.
|
370 |
Supervised Methods for Fault Detection in Vehicle
Xiang, Gao, Nan, Jiang January 2010
Uptime and maintenance planning are important issues for vehicle operators (e.g. operators of bus fleets). Unplanned downtime can cause a bus operator to be fined if the vehicle is not on time. Supervised classification methods for detecting faults in vehicles are compared in this thesis. Data collected by a vehicle manufacturer include three kinds of faulty states in vehicles (i.e. charge air cooler leakage, radiator and air filter clogging). The problem consists of differentiating between the normal data and the three different categories of faulty data. Evaluated methods include a linear model, a neural network model, 1-nearest neighbor and a random forest model. For every kind of model, a variable selection method should be used. In this thesis we try to find the best model for this problem and also to select the most important input signals. Comparing these four models, we found that the best accuracy (96.9% correct classifications) was achieved with the random forest model.
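Of the four evaluated methods, 1-nearest neighbor is the simplest to illustrate. A minimal sketch follows, assuming two numeric input signals per sample and Euclidean distance. The feature values are made up, the class names are simplified from the fault categories in the abstract, and the abstract does not say which signals or distance measure were actually used.

```python
import math


def nearest_neighbor(train, query):
    """Return the label of the training sample closest to the query
    (1-nearest neighbor with Euclidean distance)."""
    _, label = min(train, key=lambda item: math.dist(item[0], query))
    return label


# Hypothetical training samples: (signal_1, signal_2) -> state.
train = [
    ((0.0, 0.0), "normal"),
    ((5.0, 5.0), "radiator clogging"),
    ((0.0, 6.0), "air filter clogging"),
    ((6.0, 0.0), "charge air cooler leakage"),
]

predictions = [nearest_neighbor(train, q) for q in [(4.5, 5.2), (0.2, -0.1)]]
```

Because every input signal contributes to the distance, irrelevant signals can swamp the informative ones, which is one reason variable selection matters for this kind of model.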
|