Global ETD Search

511	HaMMLeT: An Infinite Hidden Markov Model with Local Transitions Dawson, Colin Reimer January 2017 In classical mixture modeling, each data point is modeled as arising i.i.d. (typically) from a weighted sum of probability distributions. When data arises from different sources that may not give rise to the same mixture distribution, a hierarchical model can allow the source contexts (e.g., documents, sub-populations) to share components while assigning different weights across them (while perhaps coupling the weights to "borrow strength" across contexts). The Dirichlet Process (DP) Mixture Model (e.g., Rasmussen (2000)) is a Bayesian approach to mixture modeling which models the data as arising from a countably infinite number of components: the Dirichlet Process provides a prior on the mixture weights that guards against overfitting. The Hierarchical Dirichlet Process (HDP) Mixture Model (Teh et al., 2006) employs a separate DP Mixture Model for each context, but couples the weights across contexts. This coupling is critical to ensure that mixture components are reused across contexts. An important application of HDPs is to time series models, in particular Hidden Markov Models (HMMs), where the HDP can be used as a prior on a doubly infinite transition matrix for the latent Markov chain, giving rise to the HDP-HMM (first developed, as the "Infinite HMM", by Beal et al. (2001), and subsequently shown to be a case of an HDP by Teh et al. (2006)). There, the hierarchy is over rows of the transition matrix, and the distributions across rows are coupled through a top-level Dirichlet Process. In the first part of the dissertation, I present a formal overview of Mixture Models and Hidden Markov Models.
I then turn to a discussion of Dirichlet Processes and their various representations, as well as associated schemes for tackling the problem of doing approximate inference over an infinitely flexible model with finite computational resources. I will then turn to the Hierarchical Dirichlet Process (HDP) and its application to an infinite state Hidden Markov Model, the HDP-HMM. These models have been widely adopted in Bayesian statistics and machine learning. However, a limitation of the vanilla HDP is that it offers no mechanism to model correlations between mixture components across contexts. This is limiting in many applications, including topic modeling, where we expect certain components to occur or not occur together. In the HMM setting, we might expect certain states to exhibit similar incoming and outgoing transition probabilities; that is, for certain rows and columns of the transition matrix to be correlated. In particular, we might expect pairs of states that are "similar" in some way to transition frequently to each other. The HDP-HMM offers no mechanism to model this similarity structure. The central contribution of the dissertation is a novel generalization of the HDP-HMM which I call the Hierarchical Dirichlet Process Hidden Markov Model With Local Transitions (HDP-HMM-LT, or HaMMLeT for short), which allows for correlations between rows and columns of the transition matrix by assigning each state a location in a latent similarity space and promoting transitions between states that are near each other. I present a Gibbs sampling scheme for inference in this model, employing auxiliary variables to simplify the relevant conditional distributions, which have a natural interpretation after re-casting the discrete time Markov chain as a continuous time Markov Jump Process where holding times are integrated out, and where some jump attempts "fail". I refer to this novel representation as the Markov Process With Failed Jumps.
I test this model on several synthetic and real data sets, showing that for data where transitions between similar states are more common, the HaMMLeT model more effectively finds the latent time series structure underlying the observations. Bayesian statistics Machine learning Time series modeling
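The idea in entry 511 of promoting transitions between states that lie near each other can be sketched in a few lines. This is only an illustration under stated assumptions, not the dissertation's model: the exponential similarity kernel, the `decay` parameter, the one-dimensional locations, and the function name `local_transition_matrix` are all hypothetical, and a finite number of states stands in for the infinite state space.

```python
import math

def local_transition_matrix(base_weights, locations, decay=1.0):
    """Rescale shared base weights by a similarity kernel, then normalize each row.

    base_weights: shared (top-level) weight of each destination state.
    locations: 1-D latent location of each state (hypothetical similarity space).
    decay: assumed kernel parameter; larger values penalize distant states more.
    """
    n = len(base_weights)
    matrix = []
    for i in range(n):
        # Similarity lies in (0, 1] and equals 1 when the two locations coincide.
        row = [
            base_weights[j] * math.exp(-decay * abs(locations[i] - locations[j]))
            for j in range(n)
        ]
        total = sum(row)
        matrix.append([w / total for w in row])
    return matrix

# States 0 and 1 are close together; state 2 is far from both.
P = local_transition_matrix([0.5, 0.3, 0.2], [0.0, 0.1, 3.0])
```

With `decay=0` the kernel is constant and every row reduces to the normalized base weights, so the similarity term is what couples rows and columns of the matrix in this sketch.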
512	Assisting bug report triage through recommendation Anvik, John 05 1900 A key collaborative hub for many software development projects is the issue tracking system, or bug repository. The use of a bug repository can improve the software development process in a number of ways including allowing developers who are geographically distributed to communicate about project development. However, reports added to the repository need to be triaged by a human, called the triager, to determine if reports are meaningful. If a report is meaningful, the triager decides how to organize the report for integration into the project's development process. We call triager decisions with the goal of determining if a report is meaningful, repository-oriented decisions, and triager decisions that organize reports for the development process, development-oriented decisions. Triagers can become overwhelmed by the number of reports added to the repository. Time spent triaging also typically diverts valuable resources away from the improvement of the product to the managing of the development process. To assist triagers, this dissertation presents a machine learning approach to create recommenders that assist with a variety of development-oriented decisions. In this way, we strive to reduce human involvement in triage by moving the triager's role from having to gather information to make a decision to that of confirming a suggestion. This dissertation introduces a triage-assisting recommender creation process that can create a variety of different development-oriented decision recommenders for a range of projects. The recommenders created with this approach are accurate: recommenders for which developer to assign a report have a precision of 70% to 98% over five open source projects, recommenders for which product component the report is for have a recall of 72% to 92%, and recommenders for who to add to the cc: list of a report have a recall of 46% to 72%.
We have evaluated recommenders created with our triage-assisting recommender creation process using both an analytic evaluation and a field study. In addition, we present in this dissertation an approach to assist project members to specify the project-specific values for the triage-assisting recommender creation process, and show that such recommenders can be created with a subset of the repository data. / Science, Faculty of / Computer Science, Department of / Graduate bug report triage machine learning recommender
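Entry 512 reports recommender quality as precision and recall. The sketch below shows the standard set-based definitions of the two measures for a single report; the abstract does not say exactly how the dissertation computed or averaged them, so the function name `precision_recall` and the example developer names are hypothetical.

```python
def precision_recall(recommended, relevant):
    """Set-based precision and recall for one list of recommendations."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    # Precision: share of the recommendations that were correct.
    precision = hits / len(recommended) if recommended else 0.0
    # Recall: share of the correct answers that were recommended.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Three suggested names for a cc: list, of which two are among the four
# people who actually belonged on it.
p, r = precision_recall(["alice", "bob", "carol"],
                        ["bob", "carol", "dave", "erin"])
```

Precision suits the developer-assignment case, where a wrong suggestion is costly, while recall suits the cc: list case, where missing an interested person is the main concern.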
513	Design of a self-paced brain computer interface system using features extracted from three neurological phenomena Fatourechi, Mehrdad 05 1900 Self-paced brain computer interface (SBCI) systems allow individuals with motor disabilities to use their brain signals to control devices, whenever they wish. These systems are required to identify the user’s “intentional control (IC)” commands and they must remain inactive during all periods in which users do not intend control (called “no control (NC)” periods). This dissertation addresses three issues related to the design of SBCI systems: 1) their presently high false positive (FP) rates, 2) the presence of artifacts and 3) the identification of a suitable evaluation metric. To improve the performance of SBCI systems, the following are proposed: 1) a method for the automatic user-customization of a 2-state SBCI system; 2) a two-stage feature reduction method for selecting wavelet coefficients extracted from movement-related potentials (MRP); 3) an SBCI system that classifies features extracted from three neurological phenomena: MRPs, and changes in the power of the Mu and Beta rhythms; 4) a novel method that effectively combines the methods developed in 2) and 3); and 5) a generalization of the system developed in 3) from detecting a right index finger flexion to detecting a right hand extension. Results of these studies using actual movements show an average true positive (TP) rate of 56.2% at the FP rate of 0.14% for the finger flexion study and an average TP rate of 33.4% at the FP rate of 0.12% for the hand extension study. These FP results are significantly lower than those achieved in other SBCI systems, where FP rates vary between 1% and 10%. We also conduct a comprehensive survey of the BCI literature. We demonstrate that many BCI papers do not properly deal with artifacts. We show that the proposed BCI achieves a good performance of TP=51.8% and FP=0.4% in the presence of eye movement artifacts.
Further tests of the performance of the proposed system in a pseudo-online environment show an average TP rate of 48.8% at the FP rate of 0.8%. Finally, we propose a framework for choosing a suitable evaluation metric for SBCI systems. This framework shows that the Kappa coefficient is more suitable than other metrics in evaluating the performance during the model selection procedure. / Applied Science, Faculty of / Electrical and Computer Engineering, Department of / Graduate brain computer interface pattern recognition machine learning
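The TP rate, FP rate and Kappa coefficient mentioned in entry 513 can all be read off a two-class confusion matrix. The sketch below uses the standard definition of Cohen's kappa; the function name `rates_and_kappa` and the example counts are hypothetical and are not taken from the dissertation, which may compute its metrics over a different unit (for example per sample or per epoch).

```python
def rates_and_kappa(tp, fn, fp, tn):
    """Return (TP rate, FP rate, Cohen's kappa) for a two-class confusion matrix.

    Assumes both classes occur in the data, i.e. tp + fn > 0 and fp + tn > 0.
    """
    n = tp + fn + fp + tn
    tp_rate = tp / (tp + fn)   # fraction of IC samples that were detected
    fp_rate = fp / (fp + tn)   # fraction of NC samples that triggered an output
    observed = (tp + tn) / n   # plain accuracy
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (observed - expected) / (1.0 - expected)
    return tp_rate, fp_rate, kappa

# Made-up counts in which NC samples vastly outnumber IC samples.
tp_rate, fp_rate, kappa = rates_and_kappa(tp=50, fn=50, fp=10, tn=9890)
```

For these made-up counts the plain accuracy is 0.994 even though half of the IC samples are missed, whereas kappa is about 0.62. That sensitivity to class imbalance is one plausible reason a chance-corrected metric is attractive for self-paced systems.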
514	Data analysis in proteomics: novel computational strategies for modeling and interpreting complex mass spectrometry data Sniatynski, Matthew John 11 1900 Contemporary proteomics studies require computational approaches to deal with both the complexity and the volume of the data generated. The amalgamation of mass spectrometry -- the analytical tool of choice in proteomics -- with the computational and statistical sciences is still recent, and several avenues of exploratory data analysis and statistical methodology remain relatively unexplored. The current study focuses on three broad analytical domains, and develops novel exploratory approaches and practical tools in each. Data transform approaches are the first explored. These methods re-frame data, allowing for the visualization and exploitation of features and trends that are not immediately evident. An exploratory approach making use of the correlation transform is developed, and is used to identify mass-shift signals in mass spectra. This approach is used to identify and map post-translational modifications on individual peptides, and to identify SILAC modification-containing spectra in a full-scale proteomic analysis. Secondly, matrix decomposition and projection approaches are explored; these use an eigen-decomposition to extract general trends from groups of related spectra. A data visualization approach is demonstrated using these techniques, capable of visualizing trends in large numbers of complex spectra, and a data compression and feature extraction technique is developed suitable for use in spectral modeling. Finally, a general machine learning approach is developed based on conditional random fields (CRFs). These models are capable of dealing with arbitrary sequence modeling tasks, similar to hidden Markov models (HMMs), but are far more robust to interdependent observational features, and do not require limiting independence assumptions to remain tractable.
The theory behind this approach is developed, and a simple machine learning fragmentation model is developed to test the hypothesis that reproducible sequence-specific intensity ratios are present within the distribution of fragment ions originating from a common peptide bond breakage. After training, the model shows very good performance associating peptide sequences and fragment ion intensity information, lending strong support to the hypothesis. / Medicine, Faculty of / Medicine, Department of / Experimental Medicine, Division of / Graduate Proteomics Bioinformatics Machine learning Mass spectrometry
515	Computational approaches to predicting drug induced toxicity Marchese Robinson, Richard Liam January 2013 Novel approaches and models for predicting drug induced toxicity in silico are presented. Typically, these were based on Quantitative Structure-Activity Relationships (QSAR). The following endpoints were modelled: mutagenicity, carcinogenicity, inhibition of the hERG ion channel and the associated arrhythmia - Torsades de Pointes. A consensus model was developed based on Derek for Windows™ and Toxtree and used to filter compounds as part of a collaborative effort resulting in the identification of potential starting points for anti-tuberculosis drugs. Based on the careful selection of data from the literature, binary classifiers were generated for the identification of potent hERG inhibitors. These were found to perform competitively with, or better than, those computational approaches previously presented in the literature. Some of these models were generated using Winnow, in conjunction with a novel proposal for encoding molecular structures as required by this algorithm. The Winnow models were found to perform comparably to models generated using the Support Vector Machine and Random Forest algorithms. These studies also emphasised the variability in results which may be obtained when applying the same approaches to different train/test combinations. Novel approaches to combining chemical information with Ultrafast Shape Recognition (USR) descriptors are introduced: Atom Type USR (ATUSR) and a combination between a proposed Atom Type Fingerprint (ATFP) and USR (USR-ATFP). These were applied to the task of predicting protein-ligand interactions - including the prediction of hERG inhibition. Whilst, for some of the datasets considered, either ATUSR or USR-ATFP was found to perform marginally better than all other descriptor sets to which they were compared, most differences were statistically insignificant.
Further work is warranted to determine the advantages which ATUSR and USR-ATFP might offer with respect to established descriptor sets. The first attempts to construct QSAR models for Torsades de Pointes using predicted cardiac ion channel inhibitory potencies as descriptors are presented, along with the first evaluation of experimentally determined inhibitory potencies as an alternative, or complement to, standard descriptors. No (clear) evidence was found that 'predicted' ('experimental') 'IC-descriptors' improve performance. However, their value may lie in the greater interpretability they could confer upon the models. Building upon the work presented in the preceding chapters, this thesis ends with specific proposals for future research directions. 540
516	Bayesian methods for gravitational waves and neural networks Graff, Philip B. January 2012 Einstein’s general theory of relativity has withstood 100 years of testing and will soon be facing one of its toughest challenges. In a few years we expect to be entering the era of the first direct observations of gravitational waves. These are tiny perturbations of space-time that are generated by accelerating matter and affect the measured distances between two points. Observations of these using the laser interferometers, which are the most sensitive length-measuring devices in the world, will allow us to test models of interactions in the strong field regime of gravity and eventually general relativity itself. I apply the tools of Bayesian inference for the examination of gravitational wave data from the LIGO and Virgo detectors. This is used for signal detection and estimation of the source parameters. I quantify the ability of a network of ground-based detectors to localise a source position on the sky for electromagnetic follow-up. Bayesian criteria are also applied to separating real signals from glitches in the detectors. These same tools and lessons can also be applied to the type of data expected from planned space-based detectors. Using simulations from the Mock LISA Data Challenges, I analyse our ability to detect and characterise both burst and continuous signals. The two seemingly different signal types will be overlapping and confused with one another for a space-based detector; my analysis shows that we will be able to separate and identify many signals present. Data sets and astrophysical models are continuously increasing in complexity. This will create an additional computational burden for performing Bayesian inference and other types of data analysis. I investigate the application of the MOPED algorithm for faster parameter estimation and data compression.
I find that its shortcomings make it a less favourable candidate for further implementation. The framework of an artificial neural network is a simple model for the structure of a brain which can “learn” functional relationships between sets of inputs and outputs. I describe an algorithm developed for the training of feed-forward networks on pre-calculated data sets. The trained networks can then be used for fast prediction of outputs for new sets of inputs. After demonstrating capabilities on toy data sets, I apply the network to classifying handwritten digits from the MNIST database and measuring ellipticities of galaxies in the Mapping Dark Matter challenge. The power of neural networks for learning and rapid prediction is also useful in Bayesian inference where the likelihood function is computationally expensive. The new BAMBI algorithm is detailed, in which our network training algorithm is combined with the nested sampling algorithm MULTINEST to provide rapid Bayesian inference. Using samples from the normal inference, a network is trained on the likelihood function and eventually used in its place. This is able to provide a significant increase in the speed of Bayesian inference while returning identical results. The trained networks can then be used for extremely rapid follow-up analyses with different priors, obtaining orders of magnitude of speed increase. Learning how to apply the tools of Bayesian inference for the optimal recovery of gravitational wave signals will provide the most scientific information when the first detections are made. Complementary to this, the improvement of our analysis algorithms to provide the best results in less time will make analysis of larger and more complicated models and data sets practical. 530
517	A Location-Aware Social Media Monitoring System Ji, Liu January 2014 Social media users generate a large volume of data, which can contain meaningful and useful information. One such example is information about locations, which may be useful in applications such as marketing and security monitoring. There are two types of locations: location entities mentioned in the text of the messages and the physical locations of users. Extracting the first type of locations is not trivial because the location entities in the text are often ambiguous. In this thesis, we implement a sequential classification model with conditional random fields followed by a rule-based disambiguation model; we apply them to Twitter messages (tweets) and show that they handle the ambiguous location entities in our dataset reasonably well. Only very few users disclose their physical locations; in order to automatically detect their locations, many approaches have been proposed using various types of information, including the tweets posted by the users. It is not easy to infer the original locations from text data, because text tends to be noisy, particularly in social media. Recently, deep learning techniques have been shown to reduce the error rate of many machine learning tasks, due to their ability to learn meaningful representations of input data. We investigate the potential of building a deep-learning architecture to infer the location of Twitter users based merely on their tweets. We find that stacked denoising auto-encoders are well suited for this task, with results comparable to state-of-the-art models. Finally, we combine the two models above with a third-party sentiment analysis tool and obtain an intelligent social media monitoring system. We present a demo of the system and show that it is able to predict and visualize the locations and sentiments contained in a stream of tweets related to mobile phone brands - a typical real world e-business application.
Natural Language Processing Machine Learning Social Media
518	Learning the Sub-Conceptual Layer: A Framework for One-Class Classification Sharma, Shiven January 2016 In the realm of machine learning research and application, binary classification algorithms, i.e. algorithms that attempt to induce discriminant functions between two categories of data, reign supreme. Their fundamental property is the reliance on the availability of data from all known categories in order to induce functions that can offer acceptable levels of accuracy. Unfortunately, data from so-called "real-world" domains sometimes do not satisfy this property. In order to tackle this, researchers focus on methods such as sampling and cost-sensitive classification to make the data more conducive for binary classifiers. However, as this thesis shall argue, there are scenarios in which even such explicit methods to rectify distributions fail. In such cases, one-class classification algorithms become a practical alternative. Unfortunately, if the domain is inherently complex, the advantage that they offer over binary classifiers becomes diminished. The work in this thesis addresses this issue, and builds a framework that allows for one-class algorithms to build efficient classifiers. In particular, this thesis introduces the notion of learning along the lines of sub-concepts in the domain; the complexity in domains arises due to the presence of sub-concepts, and by learning over them explicitly rather than on the entire domain as a whole, we can produce powerful one-class classification systems. The level of knowledge regarding these sub-concepts will naturally vary by domain, and thus we develop three distinct frameworks that take the amount of domain knowledge available into account. We demonstrate these frameworks over three real-world domains. The first domain we consider is that of biometric authentication via a user's swipe on a smartphone.
We identify sub-concepts based on a user's motion, and given that modern smartphones employ sensors that can identify motion, during learning as well as application, sub-concepts can be identified explicitly, and novel instances can be processed by the appropriate one-class classifier. The second domain is that of invasive isotope detection via gamma-ray spectra. The sub-concepts are based on environmental factors; however, the hardware employed cannot detect such concepts, and the precise source that creates these sub-concepts is difficult to ascertain. To remedy this, we introduce a novel framework in which we employ a sub-concept detector by means of a multi-class classifier, which pre-processes novel instances in order to send them to the correct one-class classifier. The third domain is that of compliance verification of the Comprehensive Test Ban Treaty (CTBT) through Xenon isotope measurements. This domain presents the worst case where sub-concepts are not known. To this end, we employ a generic version of our framework in which we simply cluster the domain and build classifiers over each cluster. In all cases, we demonstrate that learning in the context of domain concepts greatly improves the performance of one-class classifiers. machine learning one-class classification artificial intelligence
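The generic variant described in entry 518, clustering the domain and building a one-class classifier over each cluster, can be sketched as follows. The abstract does not name the clustering method or the one-class algorithm, so everything here is a stand-in chosen for brevity: plain k-means with caller-supplied starting centroids, a centroid-plus-radius acceptance rule, and the hypothetical names `kmeans`, `fit_one_class` and `is_target`.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means with caller-supplied initial centroids (deterministic)."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda k: math.dist(p, centroids[k]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[k]
            for k, cl in enumerate(clusters)
        ]
    return centroids, clusters

def fit_one_class(cluster, centroid):
    """Stand-in one-class model: accept anything within the largest training distance."""
    radius = max(math.dist(p, centroid) for p in cluster)
    return centroid, radius

def is_target(x, models):
    """A novel instance is accepted if any per-cluster model accepts it."""
    return any(math.dist(x, c) <= r for c, r in models)

# Target-class data containing two well-separated sub-concepts.
train = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
centroids, clusters = kmeans(train, [(0.0, 0.0), (5.0, 5.0)])
models = [fit_one_class(cl, c) for cl, c in zip(clusters, centroids) if cl]
```

In this toy data a single sphere fitted to all six points would also accept a point such as `(2.5, 2.5)`, which lies between the two groups, whereas the per-cluster models reject it. That gap between sub-concepts is the kind of region the per-cluster approach is meant to exclude.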
519	Exploring Mediatoil Imagery: A Content-Based Approach Saroop, Sahil January 2016 The future of Alberta’s bitumen sands, also known as “oil sands” or “tar sands,” and their place in Canada’s energy future has become a topic of much public debate. Within this debate, the print, television, and social media campaigns of those who both support and oppose developing the oil sands are particularly visible. As such, campaigns around the oil sands may be seen as influencing audience perceptions of the benefits and drawbacks of oil sands production. There is consequently a need to study the media materials of various tar sands stakeholders and explore how they differ. In this setting, it is essential to gather documents and identify content within images, which requires the use of an image retrieval technique such as a content-based image retrieval (CBIR) system. In a CBIR system, images are represented by low-level features (i.e. specific structures in the image such as points, edges, or objects), which are used to distinguish pictures from one another. The oil sands domain has to date not been mapped using CBIR systems. The research thus focuses on creating an image retrieval system, namely Mediatoil-IR, for exploring documents related to the oil sands. Our aim is to evaluate various low-level representations of the images within this context. To this end, our experimental framework employs LAB color histogram (LAB) and speeded up robust features (SURF) in order to typify the imagery. We further use machine learning techniques to improve the quality of retrieval (in terms of both accuracy and speed). To achieve this aim, the extracted features from each image are encoded in the form of vectors and used as a training set for learning classification models to organize pictures into different categories. Different algorithms were considered such as Linear SVM, Quadratic SVM, Weighted KNN, Decision Trees, Bagging, and Boosting on trees.
It was shown that the Quadratic SVM algorithm trained on SURF features is a good approach for building a CBIR system, and it is used in building Mediatoil-IR. Finally, with the help of the resulting CBIR system, we were able to extract similar documents and explore the different types of imagery used by different stakeholders. Our experimental evaluation shows that our Mediatoil-IR system is able to accurately explore the imagery used by different stakeholders. Machine Learning Image Processing CBIR Mediatoil
520	Supervised Machine Learning on a Network Scale: Application to Seismic Event Detection and Classification Reynen, Andrew January 2017 A new method using a machine learning technique is applied to event classification and detection at seismic networks. This method is applicable to a variety of network sizes and settings. The algorithm makes use of a small catalogue of known observations across the entire network. Two attributes, the polarization and frequency content, are used as input to regression. These attributes are extracted at predicted arrival times for P and S waves using only an approximate velocity model, as attributes are calculated over large time spans. This method of waveform characterization is shown to be able to distinguish between blasts and earthquakes with 99 percent accuracy using a network of 13 stations located in Southern California. The combination of machine learning with generalized waveform features is further applied to event detection in Oklahoma, United States. The event detection algorithm makes use of a pair of unique seismic phases to locate events, with a precision directly related to the sampling rate of the generalized waveform features. Over a week of data from 30 stations in Oklahoma, United States is used to automatically detect 25 times more events than the catalogue of the local geological survey, with a false detection rate of less than 2 percent. This method provides a highly confident way of detecting and locating events. Furthermore, a large number of seismic events can be automatically detected with a low false alarm rate, allowing for a larger automatic event catalogue with a high degree of trust. Machine Learning Seismology Data mining Earthquakes
