About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
481

Scaling up support vector machines /

Tsang, Wai-Hung. January 2007 (has links)
Thesis (Ph.D.)--Hong Kong University of Science and Technology, 2007. / Includes bibliographical references (leaves 89-96). Also available in electronic version.
482

Knowledge transfer techniques for dynamic environments

Rajan, Suju, January 1900 (has links) (PDF)
Thesis (Ph. D.)--University of Texas at Austin, 2006. / Vita. Includes bibliographical references.
483

Conjunctive and disjunctive version spaces with instance-based boundary sets

Smirnov, Evgueni Nikolaevich. January 1900 (has links)
Doctoral thesis (Proefschrift), Universiteit Maastricht. / Includes bibliographical references. - With a summary in Dutch.
484

Adaptive game AI

Spronck, Pieter Hubert Marie. January 2005 (has links)
Doctoral thesis (Proefschrift), Universiteit Maastricht. / Includes index and bibliographical references. - With a summary in Dutch.
485

Machine Learning Approaches to Modeling the Physiochemical Properties of Small Peptides

Jensen, Kyle, Styczynski, Mark, Stephanopoulos, Gregory 01 1900 (has links)
Peptide and protein sequences are most commonly represented as strings: a series of letters selected from the twenty-character alphabet of abbreviations for the naturally occurring amino acids. Here, we experiment with representations of small peptide sequences that incorporate more physiochemical information. Specifically, we develop three different physiochemical representations for a set of roughly 700 HIV-1 protease substrates. These different representations are used as input to an array of six different machine learning models that predict whether or not a given peptide is likely to be an acceptable substrate for the protease. Our results show that, in general, higher-dimensional physiochemical representations tend to have better performance than representations incorporating fewer dimensions selected on the basis of high information content. We contend that such representations are more biologically relevant than simple string-based representations and are likely to more accurately capture peptide characteristics that are functionally important. / Singapore-MIT Alliance (SMA)
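A minimal sketch of the representation idea described above, assuming illustrative per-residue property values (hydrophobicity, side-chain volume, charge) and scikit-learn; the peptides, labels, and the random-forest model are toy stand-ins for the thesis's curated HIV-1 substrate set and its six models.

```python
# Sketch: encode short peptides as physiochemical feature vectors and train a
# substrate / non-substrate classifier. The property values below are rough,
# illustrative numbers, not the representations developed in the thesis.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Per-residue (hydrophobicity, side-chain volume, net charge) -- illustrative only.
PROPS = {
    "A": (1.8,  88.6, 0), "R": (-4.5, 173.4, 1), "N": (-3.5, 114.1, 0),
    "D": (-3.5, 111.1, -1), "C": (2.5, 108.5, 0), "E": (-3.5, 138.4, -1),
    "Q": (-3.5, 143.8, 0), "G": (-0.4,  60.1, 0), "H": (-3.2, 153.2, 0),
    "I": (4.5, 166.7, 0), "L": (3.8, 166.7, 0), "K": (-3.9, 168.6, 1),
    "M": (1.9, 162.9, 0), "F": (2.8, 189.9, 0), "P": (-1.6, 112.7, 0),
    "S": (-0.8,  89.0, 0), "T": (-0.7, 116.1, 0), "W": (-0.9, 227.8, 0),
    "Y": (-1.3, 193.6, 0), "V": (4.2, 140.0, 0),
}

def encode(peptide: str) -> np.ndarray:
    """Concatenate per-residue physiochemical properties into one feature vector."""
    return np.concatenate([PROPS[aa] for aa in peptide])

# Toy data: fixed-length 8-mers labelled 1 (acceptable substrate) or 0 (not).
peptides = ["SQNYPIVQ", "ARVLAEAM", "GGGGGGGG", "KKKKRRRR"]
labels = [1, 1, 0, 0]
X = np.stack([encode(p) for p in peptides])
y = np.array(labels)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=2)  # cross-validated accuracy on the toy set
print("CV accuracy:", scores.mean())
```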
486

Delving deep into fetal neurosonography : an image analysis approach

Huang, Ruobing January 2017 (has links)
Ultrasound screening has been used for decades as the main modality to examine fetal brain development and to diagnose possible anomalies. However, basic clinical ultrasound examination of the fetal head is limited to axial planes of the brain and linear measurements, which may have restrained its potential and efficacy. The recent introduction of three-dimensional (3D) ultrasound provides the opportunity to navigate to different anatomical planes and to evaluate structures in 3D within the developing brain. Regardless of acquisition method, interpreting 2D/3D ultrasound fetal brain images requires considerable skill and time. In this thesis, a series of automatic image analysis algorithms is proposed that exploit the rich sonographic patterns captured by the scans and help to simplify clinical examination. The original contributions include: 1. An original skull detection method for 3D ultrasound images, which achieves a mean accuracy of 2.2 ± 1.6 mm compared to the ground truth (GT). In addition, the algorithm is utilised for accurate automated measurement of essential biometry in standard examinations: biparietal diameter (mean accuracy: 2.1 ± 1.4 mm) and head circumference (mean accuracy: 4.5 ± 3.7 mm). 2. A plane detection algorithm that automatically extracts the mid-sagittal plane, providing visualization of midline structures, which are crucial for assessing central nervous system malformations. The automated planes are in accordance with manual ones (within 3.0 ± 3.5°). 3. A general segmentation framework for delineating fetal brain structures in 2D images. The automatically generated predictions are found to agree with the manual delineations (mean Dice similarity coefficient: 0.79 ± 0.07). As a by-product, the algorithm generates automated biometry. The results might be further utilized for morphological evaluation in future research. 4. An efficient localization model that is able to pinpoint the 3D locations of five key brain structures examined in a routine clinical examination. The predictions correlate with the ground truth: the average centre deviation is 1.8 ± 1.4 mm, and the size difference between them is 1.9 ± 1.5 mm. The application of this model may greatly reduce the time required for routine examination in clinical practice. 5. A 3D affine registration pipeline. Leveraging the power of convolutional neural networks, the model takes raw 3D brain images as input and geometrically transforms fetal brains into a unified coordinate system (proposed as a Fetal Brain Talairach system). The integration of these algorithms into computer-assisted analysis tools may greatly reduce the time and effort required of clinicians to evaluate 3D fetal neurosonography. Furthermore, they will assist understanding of fetal brain maturation by distilling 2D/3D information directly from the uterus.
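For reference, the Dice similarity coefficient quoted for the segmentation framework can be computed as sketched below; this is the standard overlap metric on binary masks, not code from the thesis, and NumPy plus the toy masks are assumptions.

```python
# Sketch: Dice similarity coefficient between a predicted and a manual
# (ground-truth) binary segmentation mask -- the overlap metric quoted above.
import numpy as np

def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """2 * |A intersection B| / (|A| + |B|) for boolean masks of equal shape."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    return 2.0 * intersection / denom if denom else 1.0

# Toy 2D masks standing in for a fetal-brain structure delineation.
pred = np.zeros((64, 64), dtype=bool);  pred[20:40, 20:40] = True
truth = np.zeros((64, 64), dtype=bool); truth[22:42, 22:42] = True
print(f"Dice = {dice(pred, truth):.2f}")
```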
487

Local learning by partitioning

Wang, Joseph 12 March 2016 (has links)
In many machine learning applications data is assumed to be locally simple, where examples near each other have similar characteristics such as class labels or regression responses. Our goal is to exploit this assumption to construct locally simple yet globally complex systems that improve performance or reduce the cost of common machine learning tasks. To this end, we address three main problems: discovering and separating local non-linear structure in high-dimensional data, learning low-complexity local systems to improve performance of risk-based learning tasks, and exploiting local similarity to reduce the test-time cost of learning algorithms. First, we develop a structure-based similarity metric, where low-dimensional non-linear structure is captured by solving a non-linear, low-rank representation problem. We show that this problem can be kernelized, has a closed-form solution, naturally separates independent manifolds, and is robust to noise. Experimental results indicate that incorporating this structural similarity in well-studied problems such as clustering, anomaly detection, and classification improves performance. Next, we address the problem of local learning, where a partitioning function divides the feature space into regions where independent functions are applied. We focus on the problem of local linear classification using linear partitioning and local decision functions. Under an alternating minimization scheme, learning the partitioning functions can be reduced to solving a weighted supervised learning problem. We then present a novel reformulation that yields a globally convex surrogate, allowing for efficient, joint training of the partitioning functions and local classifiers. We then examine the problem of learning under test-time budgets, where acquiring sensors (features) for each example during test-time has a cost. Our goal is to partition the space into regions, with only a small subset of sensors needed in each region, reducing the average number of sensors required per example. Starting with a cascade structure and expanding to binary trees, we formulate this problem as an empirical risk minimization and construct an upper-bounding surrogate that allows for sequential decision functions to be trained jointly by solving a linear program. Finally, we present preliminary work extending the notion of test-time budgets to the problem of adaptive privacy.
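A minimal sketch of the "locally simple, globally complex" idea, assuming scikit-learn and a toy two-moons dataset; a fixed k-means partition stands in for the jointly learned partitioning functions and convex surrogate described above.

```python
# Sketch: partition the feature space into regions and fit an independent
# linear classifier in each region. The thesis learns the partition and the
# local classifiers jointly; here a fixed k-means partition stands in for that.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=400, noise=0.2, random_state=0)

k = 4
partition = KMeans(n_clusters=k, random_state=0).fit(X)   # region assignment
regions = partition.labels_

# One simple (linear) classifier per region.
local_models = {}
for r in range(k):
    idx = regions == r
    if len(np.unique(y[idx])) < 2:          # degenerate region: store majority label
        local_models[r] = int(y[idx].mean() > 0.5)
    else:
        local_models[r] = LogisticRegression().fit(X[idx], y[idx])

def predict(X_new: np.ndarray) -> np.ndarray:
    """Route each example to its region's local classifier."""
    out = np.empty(len(X_new), dtype=int)
    for i, r in enumerate(partition.predict(X_new)):
        model = local_models[r]
        out[i] = model if isinstance(model, int) else model.predict(X_new[i:i + 1])[0]
    return out

print("Training accuracy:", (predict(X) == y).mean())
```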
488

Crystallization properties of molecular materials : prediction and rule extraction by machine learning

Wicker, Jerome January 2017 (has links)
Crystallization is an increasingly important process in a variety of applications from drug development to single crystal X-ray diffraction structure determination. However, while there is a good deal of research into prediction of molecular crystal structure, the factors that cause a molecule to be crystallizable have so far remained poorly understood. The aim of this project was to answer the seemingly straightforward question: can we predict how easily a molecule will crystallize? The Cambridge Structural Database contains almost a million examples of materials from the scientific literature that have crystallized. Models for the prediction of crystallization propensity of organic molecular materials were developed by training machine learning algorithms on carefully curated sets of molecules which are either observed or not observed to crystallize, extracted from a database of commercially available molecules. The models were validated computationally and experimentally, while feature extraction methods and high resolution powder diffraction studies were used to understand the molecular and structural features that determine the ease of crystallization. This led to the development of a new molecular descriptor which encodes information about the conformational flexibility of a molecule. The best models gave error rates of less than 5% for both cross-validation data and previously-unseen test data, demonstrating that crystallization propensity can be predicted with a high degree of accuracy. Molecular size, flexibility and nitrogen atom environments were found to be the most influential factors in determining the ease of crystallization, while microstructural features determined by powder diffraction showed almost no correlation with the model predictions. Further predictions on co-crystals show scope for extending the methodology to other relevant applications.
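A hedged sketch of the prediction pipeline described above: a handful of RDKit descriptors (an assumed stand-in for the curated descriptors used in the thesis, including its conformational-flexibility descriptor) feed a random-forest classifier evaluated by cross-validation; the SMILES strings and labels are toy data.

```python
# Sketch: predict crystallization propensity from simple molecular descriptors.
# RDKit and this descriptor choice are assumptions for illustration; the thesis
# trains on carefully curated molecule sets with its own descriptors.
import numpy as np
from rdkit import Chem
from rdkit.Chem import Descriptors
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def featurise(smiles: str) -> list:
    mol = Chem.MolFromSmiles(smiles)
    return [
        Descriptors.MolWt(mol),                 # molecular size
        Descriptors.NumRotatableBonds(mol),     # crude flexibility proxy
        Descriptors.NumHDonors(mol),
        Descriptors.NumHAcceptors(mol),
        Descriptors.TPSA(mol),
    ]

# Toy data: SMILES labelled 1 = observed to crystallize, 0 = not observed.
smiles = ["c1ccccc1C(=O)O", "CC(=O)Nc1ccc(O)cc1", "CCCCCCCCCCCCCCCC", "CCOC(=O)CC(=O)OCC"]
labels = [1, 1, 0, 0]

X = np.array([featurise(s) for s in smiles])
y = np.array(labels)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("CV accuracy:", cross_val_score(clf, X, y, cv=2).mean())
```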
489

Bayesian matrix factorisation : inference, priors, and data integration

Brouwer, Thomas Alexander January 2017 (has links)
In recent years the amount of biological data has increased exponentially. Most of these data can be represented as matrices relating two different entity types, such as drug-target interactions (relating drugs to protein targets), gene expression profiles (relating drugs or cell lines to genes), and drug sensitivity values (relating drugs to cell lines). Not only is the size of these datasets increasing, but so is the number of different entity types that they relate. Furthermore, not all values in these datasets are typically observed, and some are very sparse. Matrix factorisation is a popular group of methods that can be used to analyse these matrices. The idea is that each matrix can be decomposed into two or more smaller matrices, such that their product approximates the original one. This factorisation of the data reveals patterns in the matrix, and gives us a lower-dimensional representation. Not only can we use this technique to identify clusters and other biological signals, we can also predict the unobserved entries, allowing us to prune biological experiments. In this thesis we introduce and explore several Bayesian matrix factorisation models, focusing on how best to use them for predicting these missing values in biological datasets. Our main hypothesis is that matrix factorisation methods, and in particular Bayesian variants, are an extremely powerful paradigm for predicting values in biological datasets, as well as other applications, and especially for sparse and noisy data. We demonstrate the competitiveness of these approaches compared to other state-of-the-art methods, and explore the conditions under which they perform best. We consider several aspects of the Bayesian approach to matrix factorisation. Firstly, we study the effect on predictive performance of the inference approaches used to find the factorisation. Secondly, we identify different likelihood and Bayesian prior choices that we can use for these models, and explore when they are most appropriate. Finally, we introduce a Bayesian matrix factorisation model that can be used to integrate multiple biological datasets, and hence improve predictions. This model combines different matrix factorisation models and Bayesian priors in a hybrid fashion. Through these models and experiments we support our hypothesis and provide novel insights into the best ways to use Bayesian matrix factorisation methods for predictive purposes.
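To make the decomposition idea concrete, here is a minimal non-Bayesian sketch in NumPy: a toy matrix with missing entries is factorised into two low-rank matrices by gradient descent on the observed entries only, and their product then provides predictions for the unobserved cells. The thesis instead places priors on the factor matrices and infers them with Bayesian methods; the toy matrix and hyperparameters below are assumptions for illustration.

```python
# Sketch of matrix factorisation for missing-value prediction: decompose an
# (n x m) matrix R into U (n x k) and V (m x k) so that U @ V.T approximates R
# on the observed entries, then read off predictions for unobserved ones.
import numpy as np

rng = np.random.default_rng(0)

# Toy "drug x cell line" matrix with missing entries marked by NaN.
R = np.array([[5.0, 3.0, np.nan, 1.0],
              [4.0, np.nan, np.nan, 1.0],
              [1.0, 1.0, np.nan, 5.0],
              [np.nan, 1.0, 5.0, 4.0]])
observed = ~np.isnan(R)

n, m, k = R.shape[0], R.shape[1], 2
U = 0.1 * rng.standard_normal((n, k))
V = 0.1 * rng.standard_normal((m, k))

lr, reg = 0.02, 0.05
for _ in range(2000):
    E = np.where(observed, R - U @ V.T, 0.0)      # error on observed entries only
    U += lr * (E @ V - reg * U)                    # gradient step for U
    V += lr * (E.T @ U - reg * V)                  # gradient step for V

prediction = U @ V.T
print(np.round(prediction, 2))    # unobserved cells are now filled in
```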
490

Learning natural coding conventions

Allamanis, Miltiadis January 2017 (has links)
Coding conventions are ubiquitous in software engineering practice. Maintaining a uniform coding style allows software development teams to communicate through code by making the code clear and, thus, readable and maintainable—two important properties of good code since developers spend the majority of their time maintaining software systems. This dissertation introduces a set of probabilistic machine learning models of source code that learn coding conventions directly from source code written in a mostly conventional style. This alleviates the coding convention enforcement problem, where conventions need to first be formulated clearly into unambiguous rules and then be coded in order to be enforced, a tedious and costly process. First, we introduce the problem of inferring a variable’s name given its usage context and address this problem by creating Naturalize — a machine learning framework that learns to suggest conventional variable names. Two machine learning models, a simple n-gram language model and a specialized neural log-bilinear context model, are trained to understand the role and function of each variable and suggest new stylistically consistent variable names. The neural log-bilinear model can even suggest previously unseen names by composing them from subtokens (i.e. sub-components of code identifiers). The suggestions of the models achieve 90% accuracy when suggesting variable names at the top 20% most confident locations, rendering the suggestion system usable in practice. We then turn our attention to the significantly harder method naming problem. Learning to name methods, by looking only at the code tokens within their body, requires a good understanding of the semantics of the code contained in a single method. To achieve this, we introduce a novel neural convolutional attention network that learns to generate the name of a method by sequentially predicting its subtokens. This is achieved by focusing on different parts of the code and potentially directly using body (sub)tokens even when they have never been seen before. This model achieves an F1 score of 51% on the top five suggestions when naming methods of real-world open-source projects. Learning about naming code conventions uses the syntactic structure of the code to infer names that implicitly relate to code semantics. However, syntactic similarities and differences obscure code semantics. Therefore, to capture features of semantic operations with machine learning, we need methods that learn semantic continuous logical representations. To achieve this ambitious goal, we focus our investigation on logic and algebraic symbolic expressions and design a neural equivalence network architecture that learns semantic vector representations of expressions in a syntax-driven way, while solely retaining semantics. We show that equivalence networks learn significantly better semantic vector representations compared to other existing neural network architectures. Finally, we present an unsupervised machine learning model for mining syntactic and semantic code idioms. Code idioms are conventional “mental chunks” of code that serve a single semantic purpose and are commonly used by practitioners. To achieve this, we employ Bayesian nonparametric inference on tree substitution grammars.
We present a wide range of evidence that the resulting syntactic idioms are meaningful, demonstrating that they do indeed recur across software projects and that they occur more frequently in illustrative code examples collected from a Q&A site. These syntactic idioms can be used as a form of automatic documentation of the coding practices of a programming language or an API. We also mine semantic loop idioms, i.e. highly abstracted but semantics-preserving idioms of loop operations. We show that semantic idioms provide data-driven guidance during the creation of software engineering tools by mining common semantic patterns, such as candidate refactoring locations. This gives tool, API and language designers data-based evidence about general, domain-specific and project-specific coding patterns; instead of relying solely on their intuition, they can use semantic idioms to achieve greater coverage of their tool or of a new API or language feature. We demonstrate this by creating a tool that suggests loop refactorings into functional constructs in LINQ. Semantic loop idioms also provide data-driven evidence for introducing new APIs or programming language features.
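As a rough illustration of the n-gram approach mentioned above, here is a toy trigram model over code tokens that ranks candidate variable names by how well they fit a usage context; the corpus, smoothing constant, and candidate set are all made up for illustration, and this is not the Naturalize implementation.

```python
# Sketch: train a trigram model over code token sequences, then rank candidate
# variable names by how probable the surrounding context becomes when each
# candidate is substituted in.
from collections import Counter

def trigrams(tokens):
    padded = ["<s>", "<s>"] + tokens + ["</s>"]
    return zip(padded, padded[1:], padded[2:])

# "Training corpus": tokenised statements written in a conventional style.
corpus = [
    "for i in range ( n ) :".split(),
    "for i in range ( len ( items ) ) :".split(),
    "for j in range ( m ) :".split(),
    "total = total + x".split(),
]

counts = Counter(t for toks in corpus for t in trigrams(toks))
context_counts = Counter((a, b) for (a, b, _) in counts.elements())

def score(tokens):
    """Smoothed product of trigram probabilities (toy add-one smoothing)."""
    p = 1.0
    for a, b, c in trigrams(tokens):
        p *= (counts[(a, b, c)] + 1) / (context_counts[(a, b)] + 50)
    return p

# Suggest a name for the loop variable in: "for ? in range ( n ) :"
template = "for {} in range ( n ) :"
candidates = ["i", "j", "total", "items"]
ranked = sorted(candidates, key=lambda name: score(template.format(name).split()), reverse=True)
print("Suggested names, best first:", ranked)
```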
